PyTorch migration: Remove tensorflow components, add FATE estimators

timokau commented 4 years ago

Description

See this comment for a description of the current status.

Motivation and Context

Tensorflow 1 is deprecated and we need to move away from it. This PR is an attempt to evaluate pytorch as an alternative. ~For now I don't try to fit the existing API (at least not yet).~

How Has This Been Tested?

Lints & tests.

Does this close/impact existing issues?

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds functionality)
[x] Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

[x] My code follows the code style of this project.
[x] My change requires a change to the documentation.
[x] I have updated the documentation accordingly.
[x] I have added tests to cover my changes.

kiudee commented 4 years ago

Already looks very clean - well done!

timokau commented 4 years ago

Just as a little status update, since this has been going on for a while: I think this is turning out quite nicely. I have implemented FETA ranking and (nearly, not using the proper loss function yet) FETA discrete choice. That shows flexibility on one axis (result type). I plan to also implement the same for FATE to show the flexibility on the other axis and finish the proof of concept. At that point we could evaluate and see where to take it from there.

So in summary, things are moving along but are not quite ready for review/discussion yet. Hopefully soon-ish.

timokau commented 4 years ago

Okay, I think this is sufficient as a proof of concept now. I have implemented FATE and FETA, each in the ranking and discrete choice variant.

I replaced a lot of the inheritance in the current tensorflow implementation with composition. I have split the code into "scoring modules" and estimators. The scoring modules are themselves composed of smaller modules, which makes them easier to reuse/understand/test.

I have based the estimator implementation on skorch, which takes care of a lot of the boilerplate for us. We no longer have to care about training loops, instantiating optimizers or passing the parameters to uninitialized classes. We get #116 basically for free.

The actual "heavy lifing" of the computation (the pairwise utilities) is disentangled from the FETA/FATE architecture (the "data flow" part), so its easy to modify or replace. For now its just a simple 1-layer linear network. This decomposition of scorer/estimator/utility removes a lot of duplication. It would be very easy to add a new scorer (for example based on graph neural networks) and "throw" it at the existing Ranking/Discrete choice estimators. It would also be very easy to derive a new utility function architecture and "throw" that at the FATE module.

If you want to look at the implementation, here are the most interesting files:

modules/scoring.py takes care of the high level assembly for the FATE and FETA scoring modules.
estimators.py derives ranking and discrete choice estimators from the scorers.
*_losses.py, *_datasets.py defines some losses and test datasets. modules/feta_support.py and modules/embedding.py are contain the more low-level aspects of FETA and FATE.

What do you think @kiudee? There are still things to improve of course, but I think its sufficient as a proof of concept.

timokau commented 4 years ago

Also CC @prithagupta if you are interested in this.

timokau commented 4 years ago

What is your general verdict @kiudee? Should I continue down this path, implementing more of the existing learners and functionality and eventually replacing the current implementation? Or rather try something else?

timokau commented 4 years ago

Another status update: I'm experimenting with experiments. We should be able to reproduce the experiments of the main papers with the new implementation, and I'd like to be able to do that in an easily reproducible way (that could possibly be repeated on each release). I'm trying to use Sacred for the purpose. I'm abusing the "named configuration" system a bit, but currently you can do things like

python3 experiment.py -m sacred with feta_variable_choice_estimator pareto_choice_problem dataset_params.n_instances=10000

you can pick "named configurations" for an estimator and a dataset. You can then overwrite all parameters on the command line. Sacred will run the experiment, store everything that is needed to reproduce it and also store the results in a database: 2020-11-24-212138_1918x1058_scrot

timokau commented 3 years ago

Some more progress: I've added some metrics and played with the experiments a bit. Here I was trying to see how far I could push the current feta implementation with its defaults and just 1000 pareto instances (which was further than expected): training plot I ended up stopping the training even though the informedness still seemed to be rising very slightly. I ran the experiment with

python3 poc/experiment.py -m sacred with feta_variable_choice_estimator pareto_choice_problem dataset_params.n_instances=1000 dataset_params.n_objects=30

I also created an upstream PR for the Sacred logger for skorch: https://github.com/skorch-dev/skorch/pull/725

timokau commented 3 years ago

I have started to remove the tensorflow components. Next steps would include cleanup and integration of the pytorch components (currently not included in this branch anymore) into the main project. We could then merge this PR when it is ready, making the in-progress transition official. Many of the removed components could then be re-introduced / re-written in follow-up PRs.

timokau commented 3 years ago

I think this is ready for another pair of eyes now. In its current state, the PR

removes all tensorflow components (while keeping the testsuite and the linters happy),
adds the "structural components" for pytorch based estimators (using skorch as an intermediary) and
adds a pytorch implementation of FATE scoring and the derived estimators.

It basically "sets the stage" for the remaining estimators. I plan to add FETA, CmpNet and RankNet in follow-up PRs. The focus of this PR is on the structural and architectural aspects. The FATE estimators mostly serve as an example.

Many components that were part of the "proof of concept" at some point are removed now. That includes

the dataset (the current implementation re-uses the datasets that are already defined in cs-ranking),
the experiments (these may still be valuable at some point, but are not fleshed out enough to be included in the package in their current state) and
the FETA implementation.

Other things to note:

The choice function threshold is currently fixed. Optimizing the threshold is not supported.
The interactive notebooks were removed. We may want to adapt some of them to the new pytorch estimators at some point, but that is not currently part of this PR.

Open questions:

To which degree do we want to preserve the commit history? I have already done some rebasing and cleanup. Currently there is one commit that adds the "proof of concept implementations" (a slightly modified version of what I previously developed in this PR) and then a lot of other commits that integrate the pytorch parts into the library. Some parts are restructured, others are removed. For example the experiments and FETA are still in the commit history. We could restructure that a bit to get rid of the "unpolished" intermediate states. We would then have commits like "Add skorch based estimator superclasses", "Add FATE scoring module"). That should not be too much effort. We could also keep it as it is, which adds some context but keeps those unpolished intermediate states.
At what level of abstraction should the "specialized" estimator classes such as FATEDiscreteChoiceFunction sit? Currently they are very low-level, which gives the most flexibility (overriding the scoring module, overriding some aspect of the optimizer etc.) but makes it a bit inconvenient to specify common parameters such as n_hidden_set_layers. An alternative would be to add special parameters for those and configure the components accordingly. If somebody wanted full flexibility, they could then use the classes "up the chain" (SkorchDiscreteChoiceFunction and FATEScoring) to build their own estimator. I think I prefer that alternative now, but both have their benefits. If you agree then we could also make that change in a follow-up PR.

timokau commented 3 years ago

I noticed two more things (an issue with the documentation and a difference in test configuration) while working on FETA. I have pushed updates.

Please have a look at the test configuration commit. I have adjusted it to match the old behavior for now, but we may want to disable validation for tests instead. It is especially odd for the ranking test. I can see one argument in favor of the validation split: The split behavior is at least exercised in the tests. That could catch obvious exception-generating errors. Skorch and its test suite probably does a better job at testing this though.

Edit: It turns out that the ranking test fails with the validation split. I assumed that validation would simply be a no-op without any validation data. Apparently I forgot to run the tests.

timokau commented 3 years ago

Very good modular structure.

Thanks again :)

The only thing which should be changed is that all hyperparameters are available on the level of the learners (the level which implements the sklearn interface), as we talked about.

I have just pushed a change that should address this. I agree that the hyperparameters were very inconvenient to configure. It was technically possible, but not as easy as it should be. The new approach corresponds to the alternative I have mentioned in the second open question here.

timokau commented 3 years ago

I have fixed the copy and paste issue that you found.

Just for completeness, I'll summarize the results of our private discussions here too:

I have removed the poetry2nix specific workaround.
I have restructured and reduced the history to avoid adding components just to remove them again in the same PR (related to the first open question here). Sorry for all the email notifications and force pushes.
I have "lifted" the level of abstraction of the specialized estimator classes (related to the second open question here).
I have removed the train/validation split in the tests (related to this comment).

The PR is ready for review again.

Edit: I forgot to mention the module names. We also discussed alternatives for the names of the first_order and zeroth_order module. In the end I decided to go with instance_reduction and object_mapping.

timokau commented 3 years ago

Thank you for the reviews :)

timokau commented 3 years ago

I wanted to run the checks and lints once more before merging and noticed some formatting issues. I did not run black consistently because I had a newer version in my environment and that made a lot of unrelated formatting changes. I have fixed the formatting with the black version that is defined in the .pre-commt-config.yaml. The linters (including black) give a thumbs-up now and the test suite passes.

Please take another look.

timokau commented 3 years ago

Thanks again :rocket:

kiudee / cs-ranking