There is some benefit to maintaining compatibility with AllenNLP models: it lets us quickly take advantage of any new tasks/models that the AllenAI group builds.
It is hard to connect AllenNLP-specific models to jiant. For example, in the forward function we require the output to contain out["logits"] for the update/scoring methods, which AllenNLP forward functions do not provide. Thus, we would at the very least need a wrapper around AllenNLP models, but at that point it might be easier to just copy-paste the code from AllenNLP and modify it to be compatible with jiant.
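Such a wrapper might look like the sketch below. This is a minimal illustration of renaming an AllenNLP-style output dict to the out["logits"] convention described above; `AllenNLPWrapper`, `FakeQAModel`, and the key name `answer_logits` are all hypothetical stand-ins, not real jiant or AllenNLP API.

```python
# Minimal sketch of an adapter that renames an AllenNLP-style forward
# output dict to the out["logits"] convention jiant's scoring code expects.
# AllenNLPWrapper and the key names are hypothetical, not real library API.

class AllenNLPWrapper:
    def __init__(self, allennlp_model, logits_key="answer_logits"):
        self.model = allennlp_model
        self.logits_key = logits_key  # which model-specific key holds the scores

    def forward(self, *args, **kwargs):
        out = self.model.forward(*args, **kwargs)
        # jiant's update/scoring methods look for out["logits"],
        # so copy the model-specific key over to that name.
        out["logits"] = out[self.logits_key]
        return out


class FakeQAModel:
    """Stand-in for an AllenNLP model whose forward returns no "logits" key."""

    def forward(self, tokens):
        return {"answer_logits": [0.1, 0.9], "loss": 0.5}


wrapped = AllenNLPWrapper(FakeQAModel())
print(wrapped.forward(tokens=None)["logits"])  # [0.1, 0.9]
```

The catch, as noted above, is that a real AllenNLP forward signature and output dict vary per model, so a single generic wrapper may not be achievable without per-model glue.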
On the other hand, it is much easier to take advantage of metrics/modules from AllenNLP that do not require a full model.
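Part of why the metrics are easy to reuse standalone is that they follow a small stateful interface: call the metric on batches to accumulate state, then read the value with `get_metric(reset=...)`. The sketch below imitates that interface shape in plain Python rather than importing the library; `SimpleAccuracy` is a hand-rolled stand-in, not an actual AllenNLP class.

```python
# Sketch of the small stateful interface AllenNLP-style metrics follow:
# __call__ accumulates statistics over batches, get_metric(reset=...) reads
# out the value. SimpleAccuracy is a stand-in, not allennlp library code.

class SimpleAccuracy:
    def __init__(self):
        self.correct = 0
        self.total = 0

    def __call__(self, predictions, gold_labels):
        # Accumulate counts over one batch of label ids.
        for pred, gold in zip(predictions, gold_labels):
            self.correct += int(pred == gold)
            self.total += 1

    def get_metric(self, reset=False):
        value = self.correct / self.total if self.total else 0.0
        if reset:
            self.correct = 0
            self.total = 0
        return value


metric = SimpleAccuracy()
metric([1, 0, 1], [1, 1, 1])   # batch 1: 2/3 correct
metric([0, 0], [0, 1])         # batch 2: 1/2 correct
print(metric.get_metric(reset=True))  # 0.6
```

Because the state lives entirely inside the metric object, nothing here depends on a surrounding Model, which is what makes importing (or copying) just the metrics tractable.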
Case Study #1: SQuAD / QA Model
It is not immediately clear how to integrate a full-fledged AllenNLP model into the current jiant super-model. An AllenNLP model is more or less defined end-to-end (including the text_field_embedder), so using it will require some tricky wrapping of both the inputs (from us) and the outputs (to our loss/metric processing). See above.
In this case, the model head itself is fairly trivial (just a linear layer), but the forward method is complex. Part of the forward method also includes the metric computation, which we may not use directly (since our metrics belong to the Tasks, whereas theirs belong to the model).
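One way to reconcile the two designs is sketched below: keep only the trivial head in the jiant model, have its forward return logits and nothing else, and let the Task own the metric, instead of computing the metric inside forward as AllenNLP does. All names (`QAHead`, `SpanTask`) are hypothetical, and the "linear layer" is faked with plain Python to keep the sketch dependency-free.

```python
# Sketch of separating concerns jiant-style: the model head only produces
# logits; the Task owns the metric bookkeeping. (In AllenNLP, by contrast,
# forward() would compute the metric itself.) QAHead and SpanTask are
# hypothetical names; no torch, just plain Python.

class QAHead:
    """Stand-in for a trivial linear head producing span-start logits."""

    def forward(self, encoded):
        # Pretend each encoder state maps to a start-position score.
        return {"logits": [sum(vec) for vec in encoded]}


class SpanTask:
    """Stand-in Task that owns its own metric, jiant-style."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update_metrics(self, logits, gold_index):
        pred_index = max(range(len(logits)), key=lambda i: logits[i])
        self.correct += int(pred_index == gold_index)
        self.total += 1

    def get_metrics(self):
        return {"accuracy": self.correct / self.total if self.total else 0.0}


head = QAHead()
task = SpanTask()
out = head.forward(encoded=[[0.1, 0.2], [0.9, 0.8], [0.3, 0.1]])
task.update_metrics(out["logits"], gold_index=1)
print(task.get_metrics())  # {'accuracy': 1.0}
```

The cost of this layout is exactly the copy-paste-and-modify work discussed above: the complex parts of an AllenNLP forward (span decoding, loss computation, metric calls) would have to be pulled apart and reassigned between the head and the Task.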
Questions:
How do we feel about the trade-off between the benefit of using AllenNLP models and the downside of needing to maintain compatibility with their input/output structure?
Can we feasibly create a sufficiently usable wrapper around AllenNLP models, paying a one-time engineering cost that gives us the benefit of the models? Or are there too many unknowns/complexities involved?
For the AllenNLP metrics/modules, do we prefer to copy/adapt code as needed (downsides: manual, fragile), or to continue importing the AllenNLP library (downside: dependency management)?
Issue by zphang Monday Mar 23, 2020 at 17:31 GMT Originally opened as https://github.com/nyu-mll/jiant/issues/1043