neulab / xnmt

eXtensible Neural Machine Translation

Switchable PyTorch backend #581

Closed msperber closed 4 years ago

msperber commented 4 years ago

This addresses #420 and implements switchable DyNet / PyTorch backends for XNMT. Different backends have different advantages: DyNet offers autobatching, while PyTorch offers multi-GPU support, mixed-precision training, and CTC training, each of which can be critical in certain situations. Another motivation is that it can be easier to replicate prior work when using the same deep learning framework.

All technical details are described in the updated doc, so please take a look there. I did my best to keep the changes as unobtrusive as possible, which was relatively easy given the similar design principles of DyNet and PyTorch. Switchable backends do imply somewhat increased maintenance effort for some of the core modeling code, although that code is fairly stable now, so I think things should be fine in this respect. For advanced features, I don't think we need to aim for keeping both backends in parallel.

The status is as follows:

There is one minor breaking change: saved model files now use a dash instead of a period, e.g. “Linear.9c2beb79” -> “Linear-9c2beb79”. This is because PyTorch complains when model names contain a period. Old saved models need to be renamed manually before use.
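
For reference, the renaming could be scripted along these lines. This is a hypothetical helper, not part of this PR, and it assumes the saved model is a directory containing files named like `Linear.9c2beb79` (component name, period, hex suffix):

```python
# Hypothetical migration sketch: rename old saved-model files from
# "Name.hexsuffix" to "Name-hexsuffix". Adjust the path and pattern
# to your actual saved-model layout.
import os
import re
import sys

def rename_saved_components(model_dir: str) -> None:
    # Match names like "Linear.9c2beb79"; only the separating period
    # is replaced with a dash.
    pattern = re.compile(r"^(\w+)\.([0-9a-f]+)$")
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match:
            new_name = f"{match.group(1)}-{match.group(2)}"
            os.rename(os.path.join(model_dir, name),
                      os.path.join(model_dir, new_name))
            print(f"renamed {name} -> {new_name}")

if __name__ == "__main__":
    rename_saved_components(sys.argv[1])
```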

One potential question about the chosen design is why DyNet and PyTorch code are mixed in the same Python modules, as opposed to having cleanly separated modules for each backend. The main reason is to allow a clean implementation of default components. For example, DefaultTranslator is backend-independent and uses bare(embedders.SimpleWordEmbedder) as the default for its src_embedder init argument. embedders.SimpleWordEmbedder has two implementations, embedders.SimpleWordEmbedderDynet and embedders.SimpleWordEmbedderTorch, and embedders.SimpleWordEmbedder points to the appropriate one given the active backend (see the sketch below). Moving the two implementations into separate modules would require importing things from the base module, leading to circular imports (e.g., xnmt.embedders and xnmt.embedders_dynet would both import each other). Nevertheless, I did make sure that running with either backend works even when the other backend is not installed in the Python environment.
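
To illustrate, here is a minimal sketch of that dispatch pattern. It is not XNMT's actual code; the environment-variable switch and the stub classes are simplified stand-ins for the real backend-selection mechanism:

```python
# Sketch of backend dispatch: a backend-independent name that resolves
# to the implementation for the active backend at import time.
import os

# Hypothetical backend switch; XNMT's actual mechanism may differ.
BACKEND = os.environ.get("XNMT_BACKEND", "dynet")

class SimpleWordEmbedderDynet:
    """DyNet implementation (stub)."""

class SimpleWordEmbedderTorch:
    """PyTorch implementation (stub)."""

# Backend-independent alias: code such as DefaultTranslator can refer to
# SimpleWordEmbedder without knowing which backend is active. Because
# both classes live in the same module, no cross-module import is needed.
if BACKEND == "torch":
    SimpleWordEmbedder = SimpleWordEmbedderTorch
else:
    SimpleWordEmbedder = SimpleWordEmbedderDynet
```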

There are a few extra changes and fixes that are not central to the PyTorch backend, but were very helpful for debugging and unit testing:

— Matthias

neubig commented 4 years ago

Thanks so much for this! Could you check the integration tests?

msperber commented 4 years ago

Ah, sure. I think I know where these come from and will try to fix them.

msperber commented 4 years ago

I've pushed the fix, sorry for the delay.

neubig commented 4 years ago

Thanks a bunch! I am not going to be able to review this in detail in a timely manner, but I've looked at the overall structure and it looks good. I'll just go ahead and merge, and we can iterate on any further improvements.