kiudee / cs-ranking

Context-sensitive ranking and choice in Python with PyTorch
https://cs-ranking.readthedocs.io
Apache License 2.0

PyTorch migration: Remove tensorflow components, add FATE estimators #164

Closed — timokau closed this 3 years ago

timokau commented 4 years ago

Description

See this comment for a description of the current status.

Motivation and Context

TensorFlow 1 is deprecated and we need to move away from it. This PR is an attempt to evaluate PyTorch as an alternative. ~~For now I am not trying to fit the existing API (at least not yet).~~

How Has This Been Tested?

Lints & tests.

Does this close/impact existing issues?

Types of changes

Checklist:

kiudee commented 4 years ago

Already looks very clean - well done!

timokau commented 4 years ago

Just as a little status update, since this has been going on for a while: I think this is turning out quite nicely. I have implemented FETA ranking and (nearly, not using the proper loss function yet) FETA discrete choice. That shows flexibility on one axis (result type). I plan to also implement the same for FATE to show the flexibility on the other axis and finish the proof of concept. At that point we could evaluate and see where to take it from there.

So in summary, things are moving along but are not quite ready for review/discussion yet. Hopefully soon-ish.

timokau commented 4 years ago

Okay, I think this is sufficient as a proof of concept now. I have implemented FATE and FETA, each in the ranking and discrete choice variant.

I replaced a lot of the inheritance in the current TensorFlow implementation with composition. I have split the code into "scoring modules" and estimators. The scoring modules are themselves composed of smaller modules, which makes them easier to reuse, understand and test.
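To make the decomposition concrete, here is a minimal, hypothetical sketch of the kind of structure I mean (class names, shapes and layer sizes are illustrative, not the actual code in this PR):

```python
import torch
import torch.nn as nn


class PairwiseUtility(nn.Module):
    """One small building block: scores a concatenated pair of objects."""

    def __init__(self, n_features, n_hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * n_features, n_hidden),
            nn.ReLU(),
            nn.Linear(n_hidden, 1),
        )

    def forward(self, pairs):
        return self.net(pairs)


class FETAScorer(nn.Module):
    """Composes (rather than inherits from) the pairwise utility module."""

    def __init__(self, n_features, utility=None):
        super().__init__()
        self.utility = utility if utility is not None else PairwiseUtility(n_features)

    def forward(self, instances):
        # instances: (batch, n_objects, n_features)
        n = instances.shape[1]
        a = instances.unsqueeze(2).expand(-1, -1, n, -1)
        b = instances.unsqueeze(1).expand(-1, n, -1, -1)
        pairs = torch.cat([a, b], dim=-1)        # (batch, n, n, 2 * n_features)
        utils = self.utility(pairs).squeeze(-1)  # (batch, n, n)
        # Score of object i: mean pairwise utility over its pairings.
        return utils.mean(dim=-1)                # (batch, n_objects)
```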

I have based the estimator implementation on skorch, which takes care of a lot of the boilerplate for us. We no longer have to worry about training loops, instantiating optimizers, or passing parameters to uninitialized classes. We get #116 basically for free.
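For illustration, this is roughly what the skorch wrapping buys us, reusing the hypothetical `FETAScorer` from the sketch above (the criterion and hyperparameter values are assumptions, not the PR's actual choices):

```python
import torch
from skorch import NeuralNet

# skorch owns the training loop, optimizer instantiation and parameter
# routing; "module__" arguments are forwarded to FETAScorer.__init__.
ranker = NeuralNet(
    module=FETAScorer,
    module__n_features=5,
    criterion=torch.nn.BCEWithLogitsLoss,  # illustrative loss choice
    optimizer=torch.optim.Adam,
    lr=1e-3,
    max_epochs=20,
)
# ranker.fit(X, Y) / ranker.predict(X) then follow the sklearn conventions.
```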

The actual "heavy lifting" of the computation (the pairwise utilities) is disentangled from the FETA/FATE architecture (the "data flow" part), so it's easy to modify or replace. For now it's just a simple 1-layer linear network. This decomposition into scorer/estimator/utility removes a lot of duplication. It would be very easy to add a new scorer (for example one based on graph neural networks) and "throw" it at the existing ranking/discrete choice estimators. It would be just as easy to derive a new utility function architecture and "throw" that at the FATE module.
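As a hypothetical example of that flexibility, swapping a deeper utility into the sketch above is a one-liner:

```python
import torch.nn as nn

# Illustrative only: a deeper utility "thrown at" the same scorer.
deep_utility = nn.Sequential(
    nn.Linear(2 * 5, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
scorer = FETAScorer(n_features=5, utility=deep_utility)
```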

If you want to look at the implementation, here are the most interesting files:

What do you think @kiudee? There are still things to improve, of course, but I think it's sufficient as a proof of concept.

timokau commented 4 years ago

Also CC @prithagupta if you are interested in this.

timokau commented 4 years ago

What is your general verdict @kiudee? Should I continue down this path, implementing more of the existing learners and functionality and eventually replacing the current implementation? Or rather try something else?

timokau commented 4 years ago

Another status update: I'm experimenting with experiments. We should be able to reproduce the experiments of the main papers with the new implementation, and I'd like to be able to do that in an easily reproducible way (one that could possibly be repeated on each release). I'm trying to use Sacred for this purpose. I'm abusing its "named configuration" system a bit, but currently you can do things like

```sh
python3 experiment.py -m sacred with feta_variable_choice_estimator pareto_choice_problem dataset_params.n_instances=10000
```

You can pick "named configurations" for an estimator and a dataset, and then overwrite any parameter on the command line. Sacred runs the experiment, stores everything that is needed to reproduce it, and also stores the results in a database:

[screenshot of the experiment results stored in the database]
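For context, here is a minimal sketch of the Sacred "named configuration" mechanism the command relies on; the experiment name and config values are illustrative, not the contents of the actual experiment script:

```python
from sacred import Experiment

ex = Experiment("cs-ranking-poc")


@ex.config
def defaults():
    estimator_name = None
    dataset_params = {"n_instances": 1000, "n_objects": 10}


@ex.named_config
def feta_variable_choice_estimator():
    estimator_name = "feta_variable_choice"


@ex.named_config
def pareto_choice_problem():
    dataset_params = {"n_instances": 5000, "n_objects": 20}


@ex.automain
def run(estimator_name, dataset_params):
    # Sacred injects the merged configuration; `with <name> ...` on the
    # command line activates named configs, and dotted overrides win last.
    print(estimator_name, dataset_params)
```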

timokau commented 3 years ago

Some more progress: I've added some metrics and played with the experiments a bit. Here I was trying to see how far I could push the current FETA implementation with its defaults and just 1000 Pareto instances (which was further than expected):

[training plot]

I ended up stopping the training even though the informedness still seemed to be rising very slightly. I ran the experiment with

```sh
python3 poc/experiment.py -m sacred with feta_variable_choice_estimator pareto_choice_problem dataset_params.n_instances=1000 dataset_params.n_objects=30
```

I also created an upstream PR for the Sacred logger for skorch: https://github.com/skorch-dev/skorch/pull/725

timokau commented 3 years ago

I have started to remove the TensorFlow components. Next steps would include cleanup and integration of the PyTorch components (currently no longer included in this branch) into the main project. We could then merge this PR when it is ready, making the in-progress transition official. Many of the removed components could then be re-introduced or re-written in follow-up PRs.

timokau commented 3 years ago

I think this is ready for another pair of eyes now. In its current state, the PR

It basically "sets the stage" for the remaining estimators. I plan to add FETA, CmpNet and RankNet in follow-up PRs. The focus of this PR is on the structural and architectural aspects. The FATE estimators mostly serve as an example.

Many components that were part of the "proof of concept" at some point are removed now. That includes

Other things to note:

Open questions:

timokau commented 3 years ago

I noticed two more things (an issue with the documentation and a difference in test configuration) while working on FETA. I have pushed updates.

Please have a look at the test configuration commit. I have adjusted it to match the old behavior for now, but we may want to disable validation for tests instead. It is especially odd for the ranking test. I can see one argument in favor of the validation split: the split behavior is at least exercised in the tests, which could catch obvious exception-generating errors. Skorch and its test suite probably do a better job at testing this, though.

Edit: It turns out that the ranking test fails with the validation split. I assumed that validation would simply be a no-op without any validation data. Apparently I forgot to run the tests.
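For reference, disabling validation in the tests would be a one-parameter change in skorch; continuing the illustrative estimator from the earlier sketches:

```python
# train_split=None turns off skorch's internal validation split entirely.
ranker = NeuralNet(
    module=FETAScorer,
    module__n_features=5,
    criterion=torch.nn.BCEWithLogitsLoss,
    train_split=None,
    max_epochs=2,
)
```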

timokau commented 3 years ago

> Very good modular structure.

Thanks again :)

> The only thing which should be changed is that all hyperparameters are available on the level of the learners (the level which implements the sklearn interface), as we talked about.

I have just pushed a change that should address this. I agree that the hyperparameters were very inconvenient to configure. It was technically possible, but not as easy as it should be. The new approach corresponds to the alternative I mentioned in the second open question here.
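Sketched against the illustrative classes from the earlier comments, the idea is that the learner's `__init__` surfaces the module hyperparameters directly and forwards them through skorch's parameter routing (a simplification of the actual change, with hypothetical names):

```python
import torch
from skorch import NeuralNet


class FETARanker(NeuralNet):
    """Hypothetical learner: hyperparameters live on the sklearn-level class."""

    def __init__(self, n_features, n_hidden=16, **kwargs):
        # Forward the module hyperparameters via skorch's "module__" routing,
        # so users never have to touch the scoring module directly.
        super().__init__(
            module=FETAScorer,
            module__n_features=n_features,
            module__utility=PairwiseUtility(n_features, n_hidden=n_hidden),
            criterion=torch.nn.BCEWithLogitsLoss,
            **kwargs,
        )


# Everything is now configurable at the estimator level:
ranker = FETARanker(n_features=5, n_hidden=32, lr=1e-3, max_epochs=20)
```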

timokau commented 3 years ago

I have fixed the copy and paste issue that you found.

Just for completeness, I'll summarize the results of our private discussions here too:

The PR is ready for review again.

Edit: I forgot to mention the module names. We also discussed alternatives for the names of the `first_order` and `zeroth_order` modules. In the end I decided to go with `instance_reduction` and `object_mapping`.

timokau commented 3 years ago

Thank you for the reviews :)

timokau commented 3 years ago

I wanted to run the checks and lints once more before merging and noticed some formatting issues. I had not run black consistently because I had a newer version in my environment, which made a lot of unrelated formatting changes. I have now fixed the formatting with the black version that is pinned in `.pre-commit-config.yaml`. The linters (including black) give a thumbs-up and the test suite passes.

Please take another look.

timokau commented 3 years ago

Thanks again :rocket: