hzeng-otterai / nlp

Deep NLP algorithms based on PyTorch and AllenNLP
MIT License

BIMPM ELMo implementation for quora #1

Closed: kalyangvs closed this issue 6 years ago

kalyangvs commented 6 years ago

Please find the attached files: train_elmo.txt, train.txt, BIMPM.txt

hzeng-otterai commented 6 years ago

@gvskalyan let me understand more about the problem. What accuracy do you get without ELMo? And after you include ELMo, how slow is training (minutes per epoch?), and what accuracy do you get (from the first several epochs)?

kalyangvs commented 6 years ago

[screenshot of results for the SNLI task]

When we used the ElmoEmbedder command, wrote the static embeddings to files, and did the scalar mix ourselves, it took around 40 min per epoch; all of the above results are for this case.

But using the Elmo class and computing the representations dynamically in the model costs 3 hr 18 min per epoch in the early epochs.
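For reference, a minimal sketch of the two modes described above using the AllenNLP ELMo APIs; the file paths and hyperparameters are illustrative placeholders, not the actual values we used:

import torch
from allennlp.commands.elmo import ElmoEmbedder        # static, precomputed mode
from allennlp.modules.elmo import Elmo, batch_to_ids   # dynamic, in-model mode
from allennlp.modules.scalar_mix import ScalarMix

options_file = "elmo_options.json"   # placeholder path
weight_file = "elmo_weights.hdf5"    # placeholder path

# Mode 1 (~40 min/epoch): precompute the three ELMo layers once,
# write them to disk, and learn the scalar mix ourselves.
embedder = ElmoEmbedder(options_file, weight_file)
layers = embedder.embed_sentence(["What", "is", "NLP", "?"])   # numpy, (3, 4, 1024)
mix = ScalarMix(mixture_size=3)                                # learned layer weights
mixed = mix([torch.from_numpy(layer) for layer in layers])     # (4, 1024)

# Mode 2 (~3 hr 18 min/epoch): run ELMo dynamically inside the model.
elmo = Elmo(options_file, weight_file, num_output_representations=1, dropout=0.5)
char_ids = batch_to_ids([["What", "is", "NLP", "?"]])          # (1, 4, 50)
reps = elmo(char_ids)["elmo_representations"][0]               # (1, 4, 1024)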

As mentioned in the paper, the accuracy curve reaches about 80 percent. [screenshot of training curve]

Please suggest the necessary corrections.

hzeng-otterai commented 6 years ago

@gvskalyan I am not an expert on the ELMo part, but I think the slowness of training is expected. My training on Quora with ELMo embeddings also takes more than 2 hours per epoch.

There could be many reasons why your experiments don't get good results. One of them is how paddings are handled in the algorithm. In the LSTM, those paddings not only introduce unnecessary computation but also add noise to the results. In the BiMPM matching, the last-token matching should likewise exclude the padded tokens, and the max/average calculations should be adjusted accordingly.
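For concreteness, here is a sketch of padding-aware handling in PyTorch; the function names and shapes are illustrative, not taken from this repository:

import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

def run_lstm(lstm, x, lengths):
    # lstm: nn.LSTM built with batch_first=True
    # x: (batch, max_len, dim) padded inputs; lengths: true length per sequence
    # Packing lets the LSTM skip padded timesteps instead of processing them.
    packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)
    out, _ = lstm(packed)
    out, _ = pad_packed_sequence(out, batch_first=True)
    return out  # (batch, max_len, hidden)

def last_token_states(out, lengths):
    # Select the state of the last *real* token, not the last padded position.
    idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, out.size(-1))
    return out.gather(1, idx).squeeze(1)  # (batch, hidden)

def masked_max_mean(out, mask):
    # mask: (batch, max_len), 1 for real tokens and 0 for padding.
    mask = mask.unsqueeze(-1).float()
    max_pooled = (out + (1.0 - mask) * -1e9).max(dim=1).values
    mean_pooled = (out * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    return max_pooled, mean_pooled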

Other than padding, all I can think of is to try different dropout; the ELMo embeddings introduce more complexity into the model, so more dropout might be needed.

My suggestion is to use my code directly for the experiments. My code includes many small improvements to the basic algorithm that galsang's version didn't have. The configuration file nlp/experiments/quora_bimpm_elmo.json is the one I used for Quora with ELMo (where I got about 87% accuracy), and nlp/experiments/snli_bimpm_word_char.json is the config for SNLI without ELMo.
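For readers who just want the general shape of such a config, an AllenNLP experiment file with an ELMo token embedder looks roughly like the sketch below. The field values are illustrative assumptions, not the actual contents of quora_bimpm_elmo.json, and required fields such as train_data_path and the iterator are omitted:

{
  "dataset_reader": {
    "type": "quora_paraphrase",
    "token_indexers": {
      "elmo": {"type": "elmo_characters"}
    }
  },
  "model": {
    "type": "bimpm",
    "text_field_embedder": {
      "elmo": {
        "type": "elmo_token_embedder",
        "options_file": "(ELMo options.json path)",
        "weight_file": "(ELMo weights.hdf5 path)",
        "dropout": 0.5
      }
    }
  },
  "trainer": {
    "num_epochs": 20,
    "optimizer": {"type": "adam"}
  }
}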

kalyangvs commented 6 years ago

Thanks a lot!

kalyangvs commented 6 years ago

usage: allennlp train [-h] -s SERIALIZATION_DIR [-r] [-o OVERRIDES]
                      [--file-friendly-logging]
                      [--include-package INCLUDE_PACKAGE]
                      param_path
allennlp train: error: the following arguments are required: param_path

Can you please specify the command to train the model? Is param_path the .json config file? If not, please specify what it should be.

I need the command for training on the Quora dataset with the ELMo embeddings.

hzeng-otterai commented 6 years ago

First, make sure you have installed allennlp from their master branch. Then clone my repository and enter it. Then run the following:

allennlp train experiments/quora_bimpm_elmo.json -s <SERIALIZATION_DIR> --include-package hznlp
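
To answer the earlier question: param_path is the positional argument, and it is the JSON config file (experiments/quora_bimpm_elmo.json here). The -s flag names the serialization directory where checkpoints and logs are written, and --include-package registers the custom classes in the hznlp package with allennlp.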
kalyangvs commented 6 years ago

Can you please share the trained weights of the Quora paraphrase model with ELMo? If you used the Elmo class, did you soft-tune the model?

hzeng-otterai commented 6 years ago

Sorry, I didn't keep the trained model of Quora BiMPM with ELMo since it's not among the best-performing ones. If you can tune the model and get better accuracy, please let me know. Thanks!