uclnlp / jack

Jack the Reader
MIT License
257 stars · 82 forks

Pytorch #301

Closed — dirkweissenborn closed this 6 years ago

dirkweissenborn commented 6 years ago


PyTorch integration. Main changes made:

TODO:

Example

The following training works:

$ bin/jack-train.py with config='./conf/fastqa.yaml' model=fastqa_reader_torch

It reuses the same input and output modules as fastqa_reader, and it is about 1.5x as fast as the TF implementation.

dirkweissenborn commented 6 years ago

@riedelcastro @TimDettmers could you take a look at this?

dirkweissenborn commented 6 years ago

@pminervini how can I add PyTorch as a requirement such that it plays well with setuptools? For PyTorch we only get a link to a whl at the moment here. I tried adding the following to requirements.txt (which works fine with pip install -r):

http://download.pytorch.org/whl/cu75/torch-0.2.0.post3-cp35-cp35m-manylinux1_x86_64.whl ; sys_platform == "linux"
http://download.pytorch.org/whl/torch-0.2.0.post3-cp35-cp35m-macosx_10_7_x86_64.whl ; sys_platform == "darwin"
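As a side note, the `; sys_platform == "…"` suffixes are PEP 508 environment markers, which let pip pick the right wheel per platform. The same dispatch can be sketched in plain Python (the helper name `torch_wheel_url` is hypothetical, not part of Jack):

```python
import sys

# The wheel URLs from requirements.txt, keyed by the sys_platform prefix
# used in the environment markers above.
WHEELS = {
    "linux": "http://download.pytorch.org/whl/cu75/torch-0.2.0.post3-cp35-cp35m-manylinux1_x86_64.whl",
    "darwin": "http://download.pytorch.org/whl/torch-0.2.0.post3-cp35-cp35m-macosx_10_7_x86_64.whl",
}

def torch_wheel_url(platform=None):
    """Return the PyTorch wheel URL matching the given (or current) platform."""
    platform = platform or sys.platform
    for prefix, url in WHEELS.items():
        if platform.startswith(prefix):
            return url
    raise RuntimeError("no prebuilt PyTorch wheel for platform %r" % platform)
```

This mirrors what pip does when it evaluates the markers; it does not solve the setuptools integration question itself.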
codecov-io commented 6 years ago

Codecov Report

Merging #301 into master will decrease coverage by 7.45%. The diff coverage is 24.19%.


@@            Coverage Diff             @@
##           master     #301      +/-   ##
==========================================
- Coverage   60.77%   53.32%   -7.46%     
==========================================
  Files          75       85      +10     
  Lines        4352     4923     +571     
==========================================
- Hits         2645     2625      -20     
- Misses       1707     2298     +591
Impacted Files | Coverage Δ
--- | ---
jack/core/model_module.py | 75.75% <ø> (+0.34%) :arrow_up:
jack/torch_util/segment.py | 0% <0%> (ø)
jack/torch_util/highway.py | 0% <0%> (ø)
jack/core/torch.py | 0% <0%> (ø)
jack/torch_util/xqa.py | 0% <0%> (ø)
jack/readers/extractive_qa/torch/fastqa.py | 0% <0%> (ø)
jack/torch_util/embedding.py | 0% <0%> (ø)
jack/torch_util/rnn.py | 0% <0%> (ø)
jack/torch_util/misc.py | 0% <0%> (ø)
jack/readers/extractive_qa/tensorflow/fastqa.py | 91.17% <100%> (ø)
... and 26 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 28c1d13...0f03066.

TimDettmers commented 6 years ago

I skimmed the code and it looks okay. It would be nice if you worked more with the template method pattern; it seems made for the job: implement abstract classes that implement the algorithm and defer the tensorflow/pytorch specifics to their subclasses. This would mean having no framework-specific ModelModule etc., but only one general one. If we add another model, we would also only need to implement it once instead of twice for tensorflow and pytorch, which would be a big bonus.

This would be much more work, but once it stands you never have to change the abstract classes, and it is not necessary to implement all the pytorch code up front. This way you have stubs like conv_char_embed(...) or any other function, which people can implement if they want their favorite model in pytorch.

This way none of us needs to learn the pytorch abstractions; you just look at which stub throws NotImplementedError and implement that stub.

This would also make it trivial to extend the framework to more than two backends, since you do not have to duplicate code, ModelModules, etc.
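The suggested design could be sketched roughly like this (all class and method names here are illustrative, not Jack's actual API):

```python
from abc import ABC, abstractmethod

class ModelModule(ABC):
    """Framework-agnostic algorithm; backend specifics live in subclasses."""

    def build(self, inputs):
        # Template method: the fixed skeleton shared by all backends.
        embedded = self.conv_char_embed(inputs)
        return self.encode(embedded)

    @abstractmethod
    def conv_char_embed(self, inputs):
        """Backend-specific char embedding stub."""

    @abstractmethod
    def encode(self, embedded):
        """Backend-specific encoder stub."""

class TorchModelModule(ModelModule):
    # A PyTorch backend only fills in the stubs; the algorithm in
    # build() is inherited unchanged. (Toy string ops stand in for
    # real tensor ops here.)
    def conv_char_embed(self, inputs):
        return ["embed(%s)" % x for x in inputs]

    def encode(self, embedded):
        return "encoded:" + ",".join(embedded)
```

Anyone wanting their model in a new backend would subclass once and implement only the stubs that the abstract base leaves open.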

On the other hand, we want to push this to the public soon, and the added time might prevent this from having pytorch support at all if we choose the template-pattern approach. However, if we do not do this now, it will be almost impossible to refactor later and might hinder progress in this area in the future.

dirkweissenborn commented 6 years ago

@TimDettmers Thanks for the input!

Do I understand correctly that you want to implement our models independently of the framework? If so, I think this is not a good idea, because we would have to generalize all the functionality of TF and PyTorch in jack, right? This is actually really problematic, because in some cases they are not easy to align with each other. It is also hard to generalize the two workflows more than we do at the moment in the ModelModules and JTReaders, because TF is static and PyTorch is dynamic.

I think the big plus is having the option to use either PyTorch or TF to implement your model, not being able to run your model in either PyTorch or TF. Right now we have two implementations of FastQA as an example, but normally we should have only one implementation of a model, in one of the supported frameworks, because that is enough to use Jack for training/evaluation/application of that reader.

TimDettmers commented 6 years ago

Indeed, it would be an issue because you would have to unify many operations. However, I do not think we would need to do this for every operation: things like LSTMs, char embeddings, and self-attention are well-defined building blocks. For things like attention and biattention there are more variants, but if implemented at this level it might still be possible. Overall, though, I agree this would be a mess.

One issue might also be how to access the input variables with this approach, since you would have to unify pytorch and tensorflow variables.

Thinking about it again, this approach feels more like reimplementing Keras than an easy abstraction for our framework. But maybe it still makes sense in some places.

pminervini commented 6 years ago

> @pminervini how can I add pytorch as a requirement such that it plays well together with setuptools? For pytorch we only get a link to whl at the moment here.

@dirkweissenborn, what about adding torch to requirements.txt?

https://github.com/pytorch/examples/blob/master/imagenet/requirements.txt

dirkweissenborn commented 6 years ago

@pminervini there is no support for this yet.