Where is the init_embd_file './tools/numberbatch-en-19.08.txt' ? And how is the file keyword.vocab generated?

li3cmz / GRADE

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems


Where is the init_embd_file './tools/numberbatch-en-19.08.txt' ? And how is the file keyword.vocab generated? #2

Closed LiuChang97 closed 3 years ago

LiuChang97 commented 3 years ago

Hi, I'm having trouble running ./script/inference.sh (using the DailyDialog dataset).

When running main_for_metric_grade.py, there is no such file: ./data/DailyDialog/keyword.vocab. How is this file generated?
When running main_for_metric_grade.py, the init_embd_file ./tools/numberbatch-en-19.08.txt is also missing. Where can I get it?

li3cmz commented 3 years ago

Sorry for the misleading README; we have updated it now. Regarding your questions:

For now, the "keyword.vocab" file is part of the processed training data, so for inference you need to either download the provided data or generate it from scratch. Thank you for pointing this out; we will fix it later.
"numberbatch-en-19.08.txt" is included in the tools we provide. Please download and unzip them before using GRADE.
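
For anyone hitting the same errors, a small pre-flight check can confirm that both files are in place before running inference.sh. This is only a sketch and not part of GRADE: the two paths are copied from the error messages above, while the helper name `find_missing` and its `root` parameter are made up for illustration.

```python
from pathlib import Path

# Paths taken from the error messages in this thread; adjust them if your
# checkout uses a different layout.
REQUIRED_FILES = [
    "./data/DailyDialog/keyword.vocab",
    "./tools/numberbatch-en-19.08.txt",
]


def find_missing(paths, root="."):
    """Return the entries of `paths` that are not existing files under `root`.

    Hypothetical helper for illustration; it is not provided by GRADE.
    """
    return [p for p in paths if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_FILES)
    if missing:
        print("Missing files (download and unzip the provided data and tools first):")
        for path in missing:
            print("  " + path)
    else:
        print("All required files found.")
```

Run it from the repository root, so that the relative paths resolve the same way they do for main_for_metric_grade.py.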

LiuChang97 commented 3 years ago

Thanks a lot! That's very helpful.

I have one more question. If I want to adapt GRADE to other dialog datasets such as ConvAI2, do I only need to generate 4 new files from the new dataset: (1) original_dialog_merge.keyword, (2) original_dialog_merge.ctx_keyword, (3) original_dialog_merge.rep_keyword, and (4) test_text.pkl? And can the following files you provided in ./data/DailyDialog be reused: (1) the provided GRADE checkpoint, (2) keyword.vocab, (3) dialog_keyword_tuples_multiGraph.hop, (4) 1st_hop_nr10.embedding, and (5) 2nd_hop_nr10.embedding?

li3cmz commented 3 years ago

The 4 new files are generated by running "inference.sh", so you don't need to write extra code to generate them. What you need to do is update the "load_dataset" function in "./preprocess/extract_keywords.py" and provide your own dataset in the format described in the README.
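
As a rough illustration of that last step, the sketch below shows the kind of reader a modified "load_dataset" could call for a new dataset. Everything in it is an assumption: the helper name `read_dialog_pairs` is hypothetical, and the tab-separated `context<TAB>response` layout is only a placeholder, not the format GRADE expects. The real input format and the actual signature of "load_dataset" are defined by the README and by "./preprocess/extract_keywords.py", so check those before adapting this.

```python
def read_dialog_pairs(path):
    """Read (context, response) pairs from a tab-separated text file.

    Hypothetical helper with a placeholder format: one example per line,
    written as `context<TAB>response`. This is NOT the format specified in
    the GRADE README; replace the parsing with whatever your data needs.
    """
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            context, response = parts[0].strip(), parts[1].strip()
            if context and response:
                pairs.append((context, response))
    return pairs
```

Keeping the dataset-specific parsing in one small function like this means only that function has to change when switching from DailyDialog to, say, ConvAI2.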