msperber closed this pull request 4 years ago
Thanks so much for this! Could you check the integration tests?
Ah, sure. I think I know where these come from & will try to fix.
I've pushed the fix, sorry for the delay.
Thanks a bunch! I am not going to be able to review this in detail in a timely manner, but I've looked at the overall structure and it looks good. I'll just go ahead and merge, and we can iterate on any further improvements.
This addresses #420 and implements switchable DyNet / Pytorch backends for XNMT. The two backends have different advantages, such as autobatching in DyNet vs. multi-GPU support, mixed-precision training, and CTC training in Pytorch, any of which can be critical in certain situations. Another motivation is that it can be easier to replicate prior work when using the same deep learning framework.
All technical details are described in the updated doc, so please take a look there. I did my best to keep the changes as unobtrusive as possible, which was relatively easy given the similar design principles of DyNet and Pytorch. Switchable backends do imply somewhat increased maintenance effort for some of the core modeling code, although this code is fairly stable now, so I think things should be fine in this respect. For advanced features, I don't think we need to aim at keeping the two backends in parallel.
The status is as follows:
There is one minor breaking change: saved model files now use a dash instead of a period, e.g. “Linear.9c2beb79” -> “Linear-9c2beb79”. This is because Pytorch complains when model names contain a period. When using old saved models, the saved files would need to be renamed manually.
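For anyone migrating old models, here is a rough sketch of the rename. It assumes the saved component files sit together in one directory and follow the `<Component>.<hexid>` naming pattern; the helper name and layout are illustrative, not part of XNMT:

```python
import os
import re

def rename_saved_components(model_dir: str) -> None:
  """Rename files like 'Linear.9c2beb79' to 'Linear-9c2beb79' in model_dir."""
  for name in os.listdir(model_dir):
    # Only touch names that match the "<Component>.<hex id>" pattern.
    new_name = re.sub(r"^(\w+)\.([0-9a-f]+)$", r"\1-\2", name)
    if new_name != name:
      os.rename(os.path.join(model_dir, name), os.path.join(model_dir, new_name))
```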
One potential question about the chosen design is why DyNet and Pytorch code are mixed in the same Python modules, as opposed to keeping clean, separate modules for each backend. The main reason is to allow for a clean implementation of default components. For example, `DefaultTranslator` is backend-independent and uses `bare(embedders.SimpleWordEmbedder)` as the default for its `src_embedder` init argument. `embedders.SimpleWordEmbedder` has two different implementations, `embedders.SimpleWordEmbedderDynet` and `embedders.SimpleWordEmbedderTorch`; `embedders.SimpleWordEmbedder` will point to the appropriate one given the active backend. Moving the two implementations into separate modules would require importing from the base module, leading to circular imports (e.g., `xnmt.embedders` and `xnmt.embedders_dynet` would both import each other). Nevertheless, I did make sure that running with either backend works even without the other backend installed in the Python environment.
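To make this concrete, here is a minimal, self-contained sketch of the aliasing pattern. Only the module layout is the point being illustrated; the `ACTIVE_BACKEND` variable, how it gets set, and the empty class bodies are simplified placeholders rather than the actual XNMT code:

```python
# Hypothetical sketch: both backend implementations live in the same module,
# and the generic name is bound to whichever one matches the active backend.

ACTIVE_BACKEND = "dynet"  # assumed to be set once at startup, e.g. from a CLI flag


class SimpleWordEmbedderDynet:
  """DyNet implementation (details omitted)."""


class SimpleWordEmbedderTorch:
  """Pytorch implementation (details omitted)."""


# The generic name resolves to the implementation matching the active backend,
# so backend-independent components such as DefaultTranslator can refer to
# SimpleWordEmbedder without knowing which backend is in use, and no
# cross-module (circular) imports are needed.
SimpleWordEmbedder = (
    SimpleWordEmbedderDynet if ACTIVE_BACKEND == "dynet" else SimpleWordEmbedderTorch
)
```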
There are a few extra changes and fixes that are not central to the PyTorch backend, but were very helpful for debugging and unit testing:
— Matthias