Closed: kalyangvs closed this issue 6 years ago
@gvskalyan let me understand more about the problem. What's the accuracy you can get without Elmo? And then after you include Elmo, how slow it is (minutes per epoch?), and what's the accuracy you can get (from the first several epochs)?
For the SNLI task:
When we used the elmo embedder command to write static embeddings to files and did the scalar mix ourselves, it took around 40 min per epoch; all the above results are for this case.
But using the Elmo class and computing the representations dynamically in the model costs 3 hr 18 min per epoch in the early epochs.
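The static-embedding route above requires doing the scalar mix over the biLM layers yourself. A minimal sketch of such a learned layer mix (my own code under my own naming, not the actual module used in the thread) might look like:

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Learned weighted average of L layers: gamma * sum_k softmax(w)_k * h_k."""

    def __init__(self, num_layers: int):
        super().__init__()
        # One learnable scalar per layer, plus a global scale gamma.
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layers):
        # layers: list of (batch, seq_len, dim) tensors, one per biLM layer.
        norm = torch.softmax(self.weights, dim=0)
        return self.gamma * sum(w * h for w, h in zip(norm, layers))
```

With the weights initialized to zero, the softmax gives each layer an equal share, so the initial output is just the plain average of the layers scaled by gamma.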
As mentioned in the paper, the accuracy curve only reaches about 80 percent.
Please suggest the necessary corrections.
@gvskalyan I am not an expert on the Elmo part, but I think the slowness of training is expected. My training on Quora with Elmo embeddings also takes more than 2 hours per epoch.
There could be many possible reasons why your experiments do not get good results. One of them is how paddings are handled in the algorithm. In an LSTM, padded positions not only introduce unnecessary computation but also add noise to the results. In the BiMPM matchings, the last-token matching should also exclude padded tokens, and the max/average calculations should be adjusted accordingly.
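A minimal PyTorch sketch of the masked max/average pooling described above (function name and shapes are my own assumptions, not code from either repository):

```python
import torch

def masked_max_mean(x, mask):
    """Max and mean over the time dimension, ignoring padded positions.

    x:    (batch, seq_len, dim) tensor of per-token values
    mask: (batch, seq_len) tensor, 1 for real tokens, 0 for padding
    """
    m = mask.unsqueeze(-1).float()                         # (batch, seq_len, 1)
    # For max: push padded positions to -inf so they can never win.
    neg_inf = torch.finfo(x.dtype).min
    x_max = x.masked_fill(m == 0, neg_inf).max(dim=1).values
    # For mean: zero out padding and divide by the true lengths, not seq_len.
    lengths = m.sum(dim=1).clamp(min=1)                    # (batch, 1)
    x_mean = (x * m).sum(dim=1) / lengths
    return x_max, x_mean
```

Without the mask, a large value sitting in a padded position would leak into the max, and the mean would be diluted by the zeros in the padding.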
Other than padding, all I can think of is to try different dropout rates; the Elmo embeddings add complexity to the model, so more dropout might be needed.
My suggestion is to use my code directly for the experiments. My code includes many small improvements to the basic algorithm that galsang's version doesn't have. The configuration file nlp/experiments/quora_bimpm_elmo.json is the one I used for Quora with Elmo (where I got about 87% accuracy), and nlp/experiments/snli_bimpm_word_char.json is the config for SNLI without Elmo.
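For reference, a hypothetical fragment of such a config showing where ELMo dropout can be set in an AllenNLP text field embedder (the paths and values are placeholders, not the actual contents of quora_bimpm_elmo.json):

```json
"text_field_embedder": {
  "token_embedders": {
    "elmo": {
      "type": "elmo_token_embedder",
      "options_file": "...",
      "weight_file": "...",
      "do_layer_norm": false,
      "dropout": 0.5
    }
  }
}
```

The "dropout" here is applied to the mixed ELMo representation, so it is a natural knob to turn when ELMo makes the model overfit.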
thanks a lot
usage: allennlp train [-h] -s SERIALIZATION_DIR [-r] [-o OVERRIDES] [--file-friendly-logging] [--include-package INCLUDE_PACKAGE] param_path
allennlp train: error: the following arguments are required: param_path
Can you please specify the command to train the model? Is param_path the .json config file? If not, please specify what it is.
That is, the command for training on the Quora dataset with the ELMo embeddings.
First make sure you installed allennlp from its master branch. Then clone my repository and enter it, and run the following:
allennlp train experiments/quora_bimpm_elmo.json -s <SERIALIZATION_DIR> --include-package hznlp
Can you please share the trained weights of the Quora paraphrase model with ELMo? If you used the Elmo class, did you soft-tune the model?
Sorry I didn't keep the trained model of Quora BiMPM with ELMo since it's not among the best performing ones. If you can tune the model and get better accuracy please also let me know. Thanks!
Please find the attached files: train_elmo.txt, train.txt, BIMPM.txt