Closed dcfidalgo closed 4 years ago
Hi, sorry for the late response. I totally missed it.
As for the question, I didn't do that comparison. But usually, when there is enough data, the more information you include in the representation, the better the results. This is also the case in other models, for example, when you use BERT as features: the BERT model has many layers, and using more of them tends to give better results.
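A minimal sketch of the idea being discussed: the matching layer receives not only the final context representation but also the intermediate encoder output and the raw word embeddings, concatenated along the feature dimension. All dimensions and module names here are illustrative assumptions, not taken from the paper or the implementation.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the BiMPM paper).
embedding_dim, hidden_dim = 8, 6
batch, seq_len = 2, 5

embedder = nn.Embedding(100, embedding_dim)
encoder1 = nn.LSTM(embedding_dim, hidden_dim, batch_first=True, bidirectional=True)
encoder2 = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True, bidirectional=True)

tokens = torch.randint(0, 100, (batch, seq_len))
embedded = embedder(tokens)           # (batch, seq, embedding_dim)
intermediate, _ = encoder1(embedded)  # (batch, seq, 2 * hidden_dim)
final, _ = encoder2(intermediate)     # (batch, seq, 2 * hidden_dim)

# Matching-layer input: all three representations stacked feature-wise,
# instead of only the final context representation.
matching_input = torch.cat([embedded, intermediate, final], dim=-1)
print(matching_input.shape)  # torch.Size([2, 5, 32])
```

The feature dimension grows from `2 * hidden_dim` to `embedding_dim + 4 * hidden_dim`, which is the "include more in the representation" trade-off mentioned above.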
@handsomezebra Sorry to bother you via this channel, I thought it would be more appropriate to open an issue here than to email you directly.
We are starting to use your BiMPM implementation in AllenNLP. Compared to the paper (https://arxiv.org/pdf/1702.03814.pdf), you pass not only the final context representation to the matching layer, but also the "intermediate context representation" (`encoder1`) and the word embeddings. What was your motivation? Did you do some comparison between these settings? Sorry again to bother you with this "old" stuff. Have a great day!