neulab / xnmt

eXtensible Neural Machine Translation

Switchable PyTorch backend #581

Closed msperber closed 4 years ago

msperber commented 4 years ago

This addresses #420 and implements switchable DyNet / PyTorch backends for XNMT. Different backends have different advantages: DyNet offers autobatching, while PyTorch offers multi-GPU support, mixed-precision training, and CTC training, each of which can be critical in certain situations. Another motivation is that it can be easier to replicate prior work when using the same deep learning framework.

All technical details are described in the updated doc, so please take a look there. I did my best to keep the changes as unobtrusive as possible, which was relatively easy given the similar design principles of DyNet and PyTorch. Switchable backends do imply somewhat increased maintenance effort for some of the core modeling code, although that code is fairly stable now, so I think things should be fine in this respect. For advanced features, I don't think we need to aim for keeping both backends in parallel.

The status is as follows:

There is one minor breaking change: saved model files now use a dash instead of a period, e.g. “Linear.9c2beb79” -> “Linear-9c2beb79”. This is because PyTorch complains when model names contain a period. Old saved models need to be renamed manually before use.
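
For reference, the renaming could be scripted along these lines. This is a hypothetical helper, not part of this PR, and it assumes the saved model is a directory containing files named like `Linear.9c2beb79` (component name, period, hex suffix):

```python
# Hypothetical migration sketch: rename old saved-model files from
# "Name.hexsuffix" to "Name-hexsuffix". Adjust the path and pattern
# to your actual saved-model layout.
import os
import re
import sys

def rename_saved_components(model_dir: str) -> None:
    # Match names like "Linear.9c2beb79"; only the separating period
    # is replaced with a dash.
    pattern = re.compile(r"^(\w+)\.([0-9a-f]+)$")
    for name in os.listdir(model_dir):
        match = pattern.match(name)
        if match:
            new_name = f"{match.group(1)}-{match.group(2)}"
            os.rename(os.path.join(model_dir, name),
                      os.path.join(model_dir, new_name))
            print(f"renamed {name} -> {new_name}")

if __name__ == "__main__":
    rename_saved_components(sys.argv[1])
```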

One potential question about the chosen design is why DyNet and PyTorch code are mixed in the same Python modules, as opposed to having cleanly separated modules for each backend. The main reason is to allow a clean implementation of default components. For example, DefaultTranslator is backend-independent and uses bare(embedders.SimpleWordEmbedder) as the default for its src_embedder init argument. embedders.SimpleWordEmbedder has two implementations, embedders.SimpleWordEmbedderDynet and embedders.SimpleWordEmbedderTorch, and embedders.SimpleWordEmbedder points to the appropriate one given the active backend (see the sketch below). Moving the two implementations into separate modules would require importing things from the base module, leading to circular imports (e.g., xnmt.embedders and xnmt.embedders_dynet would both import each other). Nevertheless, I did make sure that running with either backend works even when the other backend is not installed in the Python environment.
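
To illustrate, here is a minimal sketch of that dispatch pattern. It is not XNMT's actual code; the environment-variable switch and the stub classes are simplified stand-ins for the real backend-selection mechanism:

```python
# Sketch of backend dispatch: a backend-independent name that resolves
# to the implementation for the active backend at import time.
import os

# Hypothetical backend switch; XNMT's actual mechanism may differ.
BACKEND = os.environ.get("XNMT_BACKEND", "dynet")

class SimpleWordEmbedderDynet:
    """DyNet implementation (stub)."""

class SimpleWordEmbedderTorch:
    """PyTorch implementation (stub)."""

# Backend-independent alias: code such as DefaultTranslator can refer to
# SimpleWordEmbedder without knowing which backend is active. Because
# both classes live in the same module, no cross-module import is needed.
if BACKEND == "torch":
    SimpleWordEmbedder = SimpleWordEmbedderTorch
else:
    SimpleWordEmbedder = SimpleWordEmbedderDynet
```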

There are a few extra changes and fixes that are not central to the PyTorch backend, but were very helpful for debugging and unit testing:

— Matthias

neubig commented 4 years ago

Thanks so much for this! Could you check the integration tests?

msperber commented 4 years ago

Ah, sure. I think I know where these come from and will try to fix them.

msperber commented 4 years ago

I've pushed the fix, sorry for the delay.

neubig commented 4 years ago

Thanks a bunch! I am not going to be able to review this in detail in a timely manner, but I've looked at the overall structure and it looks good. I'll just go ahead and merge, and we can iterate on any further improvements.