neulab / knn-transformers

PyTorch + HuggingFace code for RetoMaton: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022), including an implementation of kNN-LM and kNN-MT
MIT License
269 stars, 22 forks

Mismatch on KNN-MT result on README #9

Closed FYYFU closed 1 year ago

FYYFU commented 1 year ago

Hi. Thanks for your awesome project! For t5-small, I got the following MT result on the validation set:

"eval_bleu": 26.1472, "eval_gen_len": 42.1916, "eval_loss": 1.4190454483032227, "eval_runtime": 216.1581, "eval_samples": 1999, "eval_samples_per_second": 9.248, "eval_steps_per_second": 2.313

However, for kNN-MT, I got a different result:

"eval_bleu": 32.0026, "eval_gen_len": 42.1126, "eval_loss": 0.40791189670562744, "eval_runtime": 4053.3114, "eval_samples": 1999, "eval_samples_per_second": 0.493, "eval_steps_per_second": 0.123

Also, the speed is so slow that I wonder whether there is something wrong in my shell script. The kNN-MT shell is:

```shell
meta_path=path_to_project
model_name=t5-small
model_path=path_to_all_model/${model_name}

python -u $meta_path/knn-transformers/run_translation.py \
  --model_name_or_path ${model_path} \
  --dataset_name wmt16 --dataset_config_name ro-en \
  --per_device_eval_batch_size=4 \
  --output_dir $meta_path/checkpoints-translation/$model_name-datastore \
  --source_lang en --target_lang ro \
  --do_eval \
  --predict_with_generate \
  --source_prefix "translate English to Romanian: " \
  --dstore_dir $meta_path/checkpoints-translation/$model_name-datastore \
  --knn_temp 50 --k 32 --lmbda 0.25 \
  --knn
```
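For context, here is a minimal sketch of how flags like `--knn_temp`, `--k`, and `--lmbda` typically combine in the standard kNN-LM/kNN-MT interpolation (a simplified illustration, not the project's actual implementation — function and variable names are my own):

```python
import numpy as np

def knn_interpolate(lm_probs, neighbor_dists, neighbor_ids, vocab_size,
                    knn_temp=50.0, lmbda=0.25):
    """Interpolate base-LM probabilities with a kNN distribution.

    lm_probs:       (vocab_size,) next-token probabilities from the base model
    neighbor_dists: (k,) distances of the k retrieved datastore keys
    neighbor_ids:   (k,) target-token ids stored with those keys
    """
    # Softmax over negative distances, scaled by the kNN temperature.
    weights = np.exp(-neighbor_dists / knn_temp)
    weights /= weights.sum()

    # Scatter-add neighbor weights into a vocabulary-sized distribution
    # (neighbors sharing a token id accumulate).
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, neighbor_ids, weights)

    # Final distribution: lambda * p_kNN + (1 - lambda) * p_LM.
    return lmbda * knn_probs + (1.0 - lmbda) * lm_probs
```

With `--k 32`, 32 neighbors would be retrieved per decoding step; this per-step retrieval during generation is also the usual reason kNN decoding is much slower than the base model.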

The original MT shell is:

```shell
meta_path=path_to_project
model_name=t5-small
model_path=path_to_all_model/${model_name}

python -u ${meta_path}/knn-transformers/run_translation.py \
  --model_name_or_path ${model_path} \
  --dataset_name wmt16 --dataset_config_name ro-en \
  --per_device_eval_batch_size=4 \
  --output_dir $meta_path/checkpoints-translation/$model_name \
  --source_lang en --target_lang ro \
  --do_eval \
  --predict_with_generate \
  --source_prefix "translate English to Romanian: "
```

I noticed that if I delete `predict_with_generate` from the kNN-MT shell, the speed becomes the same as the original MT, and the eval_loss is also the same as the original MT. But then I cannot get eval_bleu:

```
eval_loss               = 0.4079
eval_runtime            = 0:02:16.85
eval_samples            = 1999
eval_samples_per_second = 14.606
eval_steps_per_second   = 3.653
```

However, setting `predict_with_generate` does not affect the speed of the original MT. Could you please give some guidance on how to solve this problem?

Thanks!

urialon commented 1 year ago

Hi @FYYFU , Thank you for your interest in our work! I see that you closed the issue, but I will answer in case it still helps.

As far as I understand from the Hugging Face code, `predict_with_generate` is crucial for generating outputs for machine translation.

Otherwise, the model just predicts a single token at a time and uses a kind of "teacher forcing" at test time. That is useful only for measuring perplexity, not for evaluating long outputs with metrics such as BLEU.
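The distinction can be illustrated with a toy decoder (a purely hypothetical stand-in, not the Hugging Face implementation): under teacher forcing, each position is predicted from the gold prefix, which yields a well-defined per-token loss/perplexity; under generation, the model conditions on its own previous outputs, which is what produces a complete hypothesis that can be scored with BLEU.

```python
def toy_next_token(prefix):
    # Stand-in for a model's next-token prediction: here it simply
    # returns the last token plus one (purely illustrative).
    return (prefix[-1] + 1) % 100

def teacher_forced_predictions(gold):
    # Each position is predicted from the GOLD prefix, as during
    # loss/perplexity computation (no predict_with_generate).
    return [toy_next_token(gold[:i]) for i in range(1, len(gold))]

def generate(bos, length):
    # Each position is predicted from the model's OWN prefix, as
    # autoregressive generation does; this yields a full hypothesis
    # suitable for BLEU scoring.
    seq = [bos]
    for _ in range(length):
        seq.append(toy_next_token(seq))
    return seq
```

This also explains the speed gap observed above: generation runs one forward pass (and, with `--knn`, one datastore lookup) per produced token, whereas teacher-forced scoring can process all positions against the gold prefixes.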

Let me know if anything is still unclear! Uri

FYYFU commented 1 year ago

Thanks for your reply! Sorry, this was my mistake in setting up the eval shell. Since I did not set `overwrite_cache` in the kNN-MT eval shell, it used the cached dataset instead of regenerating it. As a result, the keys saved in the datastore came from the validation set, which ultimately led to the higher BLEU and the lower speed on the validation set.

Again, thanks for your awesome work! :)