neulab / external-knowledge-codegen

Code and data for ACL20 paper "Incorporating External Knowledge through Pre-training for Natural Language to Code Generation"

Reproducing results #3

Open carlos-gemmell opened 4 years ago

carlos-gemmell commented 4 years ago

Hi

I am trying to reproduce the numbers reported in the paper so that I can make a fair comparison in a paper I am writing, but when I run the following command I only get a corpus BLEU score of 30.69:

. scripts/conala/test.sh ../external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin 2>&1
load model from [../external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin]
Decoding: 100%|██████| 500/500 [02:39<00:00,  3.13it/s]
{'corpus_bleu': 0.30694588794625494, 'oracle_corpus_bleu': 0.4181369862278688, 'avg_sent_bleu': 0.2376696401071103, 'oracle_avg_sent_bleu': 0.3983062032090926, 'exact_match': 0.028, 'oracle_exact_match': 0.084}

I am guessing the reranker is not being applied when these results are generated.

To work around this, I called the testing function directly to generate hypotheses and evaluated them with the same BLEU functions:

model_file = 'external_repos/external-knowledge-codegen/best_pretrained_models/finetune.mined.retapi.distsmpl.dr0.3.lr0.001.lr_de0.5.lr_da15.beam15.seed0.mined_100000.intent_count100k_topk1_temp5.bin'
reranker_file = 'external_repos/external-knowledge-codegen/best_pretrained_models/reranker.conala.vocab.src_freq3.code_freq3.mined_100000.intent_count100k_topk1_temp5.bin'

# Build a standalone parser with the reranker attached (this runs inside my test harness class)
self.parser = StandaloneParser('default_parser',
                               model_file,
                               'conala_example_processor',
                               beam_size=15,
                               cuda=True,
                               reranker_path=reranker_file)
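
I then decode each test intent with this parser. The sketch below is only an illustration of my harness: it assumes StandaloneParser exposes a parse() method that returns reranked hypotheses with a .code attribute, as in the tranX demo server, and test_intents is a placeholder name.

# Hypothetical decoding loop (parse() / .code assumed from the tranX demo server; names are illustrative)
hypotheses = []
for intent in test_intents:  # the 500 CoNaLa test intents
    hyps = self.parser.parse(intent, debug=False)
    hypotheses.append(hyps[0].code if hyps else '')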

This gives me a similar corpus BLEU score of 30.078, and an average sentence-level BLEU score of 25.295 when computed with NLTK using smoothing function 3 (smooth_fn3).
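
For reference, the sentence-level number is computed roughly as follows (a minimal sketch; the repo's own evaluator may tokenize differently, and the variable names here are illustrative):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth_fn3 = SmoothingFunction().method3  # the "smooth_fn3" mentioned above

def avg_sentence_bleu(references, hypotheses):
    # references / hypotheses: lists of token lists for the 500 test examples
    scores = [sentence_bleu([ref], hyp, smoothing_function=smooth_fn3)
              for ref, hyp in zip(references, hypotheses)]
    return sum(scores) / len(scores)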

What is the exact sequence of commands needed to reproduce the score reported in the paper?

frankxu2004 commented 3 years ago

Sorry for the late reply. The 30.69 result you get is correct; it matches the number reported in the paper for the model without reranking.

To perform reranking, follow this part of the README: https://github.com/neulab/external-knowledge-codegen#reranking

Thanks!