nusnlp / mlconvgec2018

Code and model files for the paper: "A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction" (AAAI-18).
GNU General Public License v3.0

Accuracy of trained model? #22

Closed theincluder closed 5 years ago

theincluder commented 5 years ago

I've trained the mlconv model using train_embed.sh, with hyperparameters in the script. The training ended without error in 5 epochs.

But I can't reproduce the F0.5 score from the paper. My model achieved an F0.5 score of 0.18 (far from the reported 0.45), and the output (output.tok.txt) was also poor.

Has anyone else run into this problem?

Here's my training log: (log attachment not shown in this export)

shamilcm commented 5 years ago

What training data did you use and how did you process your training data?


theincluder commented 5 years ago

I downloaded two training datasets (nucle and lang8v2) and ran prepare_data.sh and preprocess.sh; both ran without error.

shamilcm commented 5 years ago

Can you share your train log?

theincluder commented 5 years ago

> Can you share your train log?

I have one in the original post. I'll upload another after retraining.

shamilcm commented 5 years ago

Sorry, I missed the log file.

You seem to be using a newer version of PyTorch than what we used for this project. We used an old fork of Fairseq (https://github.com/shamilcm/fairseq-py), which required PyTorch 0.2.0 compiled from source.

If you want to use a later version of Fairseq (v0.5), you can use the scripts in the fairseq0.5 branch of our repository (https://github.com/nusnlp/mlconvgec2018/tree/fairseq0.5). This has been tested to work with PyTorch 0.4.1 (no compilation from source needed; it can be installed via conda).

theincluder commented 5 years ago


Thank you for your help! I had trouble running the original branch (I had to test multiple PyTorch and Fairseq versions and modify some code).

I'll test the new version and post the result.

theincluder commented 5 years ago

Haha, I found the problem, and it was a trivial mistake.

I should have run training/run_trained_model.sh, but I ran run.sh instead.

Sorry for bothering you over my own mistake.

(Anyway, the fairseq0.5 branch worked well.)

NikhilCherian commented 4 years ago

@theincluder @shamilcm @gurunath-p

I am having trouble getting the M2 score. I ran ./run_trained_model.sh and got output.bpe.nbest.txt, output.bpe.txt, and output.tok.txt, but I could not get the M2 score.

Note: I did not train the reranker; I ran the pipeline without it. Can you tell me what I am missing? Any help would be appreciated.

shamilcm commented 4 years ago

If you have decoded the CoNLL-2014 test set, you need the reference M2 file from https://www.comp.nus.edu.sg/~nlp/conll14st.html. Download the annotated test data; the reference M2 file for the competition is the official-2014.combined.m2 file in the no-alt/ directory. Download the official M2 scorer from the same page, then run it with: ./m2scorer output.tok.txt /path/to/official-2014.combined.m2
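For reference, the M2 scorer reports precision, recall, and F0.5 over edits. The metric itself is simple; here is a minimal sketch of the F0.5 computation using hypothetical edit counts (this is not the scorer, which also has to align system edits against gold annotations):

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta=0.5 weights precision twice as much as recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical example: 30 correct edits out of 50 proposed, 100 gold edits
p = 30 / 50   # precision = 0.6
r = 30 / 100  # recall = 0.3
print(f_beta(p, r))  # 0.5
```

This is why GEC systems favor precision: with beta=0.5, missing an error costs less than proposing a wrong correction.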

NikhilCherian commented 4 years ago

@shamilcm Thanks for the link. I ran the M2 scorer, but the problem was a mismatch between output.tok.txt and conll14-test.m2.

(screenshot omitted) So I followed another issue, https://github.com/nusnlp/mlconvgec2018/issues/2, and did as you suggested there:

  1. Used interactive.py instead of generate.py with --interactive.
  2. Tried to preprocess again with --testpref set to conll14-test.tok.src. Now the error is that I don't have a conll14-test.tok.trgt target file. (screenshot omitted)

I would be very thankful if you could help me here. Thanks in advance.

NikhilCherian commented 4 years ago

@shamilcm
I have another doubt regarding the accuracy of the model. I got the results for one model. (screenshot omitted)

Can you describe more about the Wikipedia corpora mentioned in the paper to bolster the F0.5 score? Could you also share how you created the ensemble with different initializations? I would like to know that too.

Thanks a lot in advance.

shamilcm commented 4 years ago

In the paper, a Wikipedia corpus was used to train fastText embeddings, which initialize the model's word embeddings before training. Ensembling was done by training 4 separate models with different random seeds; all four models were used simultaneously during decoding. Fairseq's generate script can take multiple models as arguments for ensemble decoding.
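Conceptually, ensemble decoding averages the models' per-step output distributions and picks the best token from the averaged scores. Here is a toy sketch of that idea (the model names and fixed probability tables are made up for illustration; Fairseq's real implementation averages log-probabilities inside beam search):

```python
import math

VOCAB = ["the", "a", "cat"]

# Hypothetical next-token distributions from 3 independently seeded models
model_probs = [
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.7, 0.2, 0.1],
]

def ensemble_next_token(dists):
    """Average log-probs across models (a geometric-mean ensemble),
    then return the vocabulary item with the highest averaged score."""
    n = len(dists)
    avg_logp = [
        sum(math.log(d[i]) for d in dists) / n
        for i in range(len(dists[0]))
    ]
    return VOCAB[max(range(len(avg_logp)), key=avg_logp.__getitem__)]

print(ensemble_next_token(model_probs))  # "the"
```

Because each model starts from a different random seed, their individual errors tend to be uncorrelated, and averaging suppresses them.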
