nusnlp / mlconvgec2018

Code and model files for the paper: "A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction" (AAAI-18).

About testing models #2

Closed renhongkai closed 6 years ago

renhongkai commented 6 years ago

I used the trained model to decode the CoNLL-2014 test set, and it produced four output files: input.bpe.txt, output.bpe.nbest.txt, output.bpe.txt, and output.tok.txt. Which file should I use for evaluation, and which script should I use? Thank you very much.

shamilcm commented 6 years ago

Use the output.tok.txt file. We use the M2scorer, which is the standard scorer used for evaluating the CoNLL-2014 shared task systems. Note that evaluation of some sentences can take a long time with the standard scorer.
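
For reference, an M2scorer invocation on this output might look like the lines below; the paths to the scorer and to the official CoNLL-2014 combined gold annotation file are only examples and depend on where you keep them:

# score the tokenized system output against the official gold M2 annotations
./m2scorer/m2scorer output.tok.txt official-2014.combined.m2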

renhongkai commented 6 years ago

Thank you very much. I have encountered a new problem: the number of sentences in output.tok.txt differs from the number of sentences in conll14st-test.tok.src. output.tok.txt contains 5458 sentences, the same as the validation set. Can you help me? I would be obliged if you could reply at your earliest convenience. Thanks a lot in advance for your time and attention.

renhongkai commented 6 years ago

I used the command:

./run.sh ./data/test/conll14st-test/conll14st-test.tok.src ./data/test/conll14st-test/output 0 ./training/models/mlconv/model1000

which applies BPE to the input and then runs fairseq on the test data:

$SCRIPTS_DIR/apply_bpe.py -c $TRAINING_DIR/models/bpe_model/train.bpe.model < $input_file > $output_dir/input.bpe.txt

CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/generate.py --no-progress-bar --path $models --beam $beam --nbest $beam --workers $threads $TRAINING_DIR/processed/bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt --skip-invalid-size-inputs-valid-test

shamilcm commented 6 years ago

The flag --interactive is necessary while running fairseq on a custom input test set.

CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/generate.py --no-progress-bar --path $models --beam $beam --nbest $beam --interactive --workers $threads $MODEL_DIR/data_bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt

renhongkai commented 6 years ago

Thanks a lot for your time and attention. To summarize the problems I encountered, I think this is a version issue.

First, I downloaded fairseq-py with the download.sh file in the software directory (github: https://github.com/shamilcm/fairseq-py), but when I ran "python setup.py build" I got the error: cffi.error.VerificationError: CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1. So I switched to the upstream fairseq-py (github: https://github.com/facebookresearch/fairseq-py.git), and that error went away.

However, I then found that the parameters no longer correspond. When I ran "./run.sh ./data/test/conll14st-test/conll14st-test.tok.src ./data/test/conll14st-test/output 0 ./training/models/mlconv/model1000", there were two errors. The first was "generate.py: error: unrecognized arguments: --interactive", so I removed the --interactive flag. The second was "Exception: Sample #10 has size (src=1, dst=1) but max size is 1022. Skip this example with --skip-invalid-size-inputs-valid-test", so I added the --skip-invalid-size-inputs-valid-test flag. After that the command ran successfully, but the number of sentences in output.tok.txt differs from the number of sentences in conll14st-test.tok.src. Can you help me? Thank you very much.

shamilcm commented 6 years ago

Oh ok. The version of Fairseq-py in the download.sh script compiles only against an older version of PyTorch (0.2.0) that is built from source.

In the recent version of fairseq-py, the developers have replaced generate.py --interactive with a separate script, interactive.py:

https://github.com/facebookresearch/fairseq-py/blob/master/interactive.py
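
For reference, a decoding call with interactive.py would look roughly like the sketch below; the exact set of accepted flags depends on the fairseq-py revision you have checked out, and the paths and variables are the same ones used in run.sh above:

# decode the BPE-segmented input with the newer interactive.py instead of generate.py --interactive
CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/interactive.py --no-progress-bar --path $models --beam $beam --nbest $beam $MODEL_DIR/data_bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt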

renhongkai commented 6 years ago

  1. So you mean I can use PyTorch 0.3.0 and remove the --interactive flag? How should I fix the mismatch between the number of sentences in output.tok.txt and in conll14st-test.tok.src?

  2. I also tested with the pre-trained models by running "./run.sh ./data/test/conll14st-test/conll14st-test.m2 ./log/ 0 ./models/mlconv_embed/ eolm", and I got the same error: the number of sentences in output.tok.txt differs from the number of sentences in conll14st-test.tok.src. Thank you very much.

shamilcm commented 6 years ago
  1. If you use the recent version of Fairseq-py (which uses PyTorch 0.3.0), you should use the script interactive.py (https://github.com/facebookresearch/fairseq-py/blob/master/interactive.py) instead of generate.py.

  2. If you run run.sh with the recent version of Fairseq-py and not the one mentioned in the download.sh script, you may encounter this error, because generate.py no longer has the --interactive flag. I believe it will then use the test set within the processed/bin directory and not the one provided through standard input. In our training script, we pass the development data itself to the --testpref flag. See:

https://github.com/nusnlp/mlconvgec2018/blob/3f270bcdeac5044eaaf5c551136489b166df08d3/training/preprocess.sh#L41
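
As an alternative to interactive.py, one could in principle re-run the preprocessing with the BPE-segmented CoNLL-2014 test data as --testpref, so that generate.py picks it up from processed/bin. The sketch below assumes a processed/conll14st-test.src file (and a placeholder processed/conll14st-test.trg, e.g. a copy of the source side, since this preprocess.py version binarizes both sides); the file names are illustrative:

# rebuild processed/bin with the CoNLL-2014 test data bound to --testpref
python3.5 $FAIRSEQPY/preprocess.py --source-lang src --target-lang trg --trainpref processed/train --validpref processed/dev --testpref processed/conll14st-test --nwordssrc 30000 --nwordstgt 30000 --destdir processed/bin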

Btw, where did you obtain the 5458-sentence development set from? Did you download and process the training data yourself?

renhongkai commented 6 years ago

  1. The data set was provided by my teacher and includes Lang-8 and NUCLE (version 3.2); 5458 sentence pairs from NUCLE were taken out to be used as the development data. The training data includes 132M sentence pairs.

  2. I will try using interactive.py instead of generate.py.

  3. Do you mean I need to turn the test set (the conll14st-test.tok.src file) into the --testpref by running: python3.5 $FAIRSEQPY/preprocess.py --source-lang src --target-lang trg --trainpref processed/train --validpref processed/dev --testpref processed/dev --nwordssrc 30000 --nwordstgt 30000 --destdir processed/bin ?

  4. Can you explain what the /training/processed/bin directory is for? [screenshot]

  5. If I use the version of Fairseq-py that requires PyTorch 0.2.0, do I need to compile and install PyTorch from source instead of installing via pip? And do any other parameters need to be changed?

shamilcm commented 6 years ago

Use interactive.py instead of generate.py to decode the test set if you are using the latest Fairseq-py version. I was saying that, alternatively, you could use generate.py itself if you had used conll14st-test for --testpref while doing preprocessing. The reason, I believe, is that in the current Fairseq-py, generate.py automatically uses the test.src-trg.{src,trg}.{bin,idx} files within the processed/bin directory to perform decoding, whereas interactive.py decodes whatever input is passed through standard input.

  1. The training/processed/bin directory contains the binarized and indexed versions of the training, development, and test datasets for faster loading during training, validation, and testing. It also contains the vocabulary files (dict.src.txt and dict.trg.txt); a sketch of its typical contents follows this list.

  2. Yes, I had to compile PyTorch from source, since the Fairseq-py version I used required the ATen library, which was only available in the GitHub version of PyTorch and not in the official release at the time.
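
For illustration, the contents of training/processed/bin typically look something like the listing below; the file names follow the <split>.src-trg.{src,trg}.{bin,idx} pattern mentioned above, so take this as a sketch rather than an exact listing:

$ ls training/processed/bin
dict.src.txt  dict.trg.txt
train.src-trg.src.bin  train.src-trg.src.idx  train.src-trg.trg.bin  train.src-trg.trg.idx
valid.src-trg.src.bin  valid.src-trg.src.idx  valid.src-trg.trg.bin  valid.src-trg.trg.idx
test.src-trg.src.bin   test.src-trg.src.idx   test.src-trg.trg.bin   test.src-trg.trg.idx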

NikhilCherian commented 4 years ago

@renhongkai @shamilcm

Hello again. I am also trying to test the models using run.sh, but I ran into the same problem. I want to get the M2 scores, which run.sh does not compute; the only output I get is output.bpe.nbest.txt. How can I get those scores with the trained models? I will follow the new fairseq implementation.

Any help is appreciated. Thanks
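
One way to go from the n-best file to something the M2scorer can consume is sketched below. This assumes the usual fairseq output format (hypothesis lines start with H-<sentence id>, followed by tab-separated score and text) and the standard @@ BPE continuation marker; note that with a feature string such as eolm, run.sh additionally reranks the n-best list before producing output.tok.txt, so taking the top-1 hypothesis directly is only an approximation of the full pipeline:

# keep the hypothesis lines, restore sentence order, take the best hypothesis per sentence,
# extract the text field, and undo the BPE segmentation
grep '^H-' output.bpe.nbest.txt | sed 's/^H-//' | sort -s -n -k1,1 | awk -F'\t' '!seen[$1]++' | cut -f3 | sed 's/@@ //g' > output.tok.txt

# then score output.tok.txt with the M2scorer as described earlier in this thread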

YoonJeongLulu commented 3 years ago

Thank you for the wonderful source code. I have a favor to ask of you. The only GPU I can use is the Colab GPU, so I could not pretrain the model myself and would like to use the pre-trained one.

https://tinyurl.com/yd6wvhgw/mlconvgec2018/models

Can I download test.src-trg.src.bin, test.src-trg.src.idx, etc., in addition to the dict.src.txt that is published at the link above?

I am referring to https://github.com/kanekomasahiro/bert-gec.