sheffieldnlp / deepQuest

Framework for neural-based Quality Estimation
https://sheffieldnlp.github.io/deepQuest/
BSD 3-Clause "New" or "Revised" License

bug for eval_word_qe #3

Closed: cocaer closed this issue 5 years ago

cocaer commented 6 years ago

In evaluation.py, `eval_word_qe` is called with four arguments, but it is defined to take only three.

fredblain commented 6 years ago

Hi @cocaer, could you provide us with more information about this problem? What data and format are you using, what are the steps to reproduce, etc., so we can better understand its origins. Best,

cocaer commented 6 years ago

Problem 1: `final_scores = eval_word_qe(ref, pred_list[0], ds.vocabulary['word_qe'], 'Word')` should be replaced with `final_scores = eval_word_qe(ref, pred_list[0], ds.vocabulary['word_qe'])`, because `eval_word_qe` is defined as `def eval_word_qe(gt_list, pred_list, vocab):`.
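To illustrate the mismatch, here is a minimal sketch (a stub, not the actual deepQuest function) with the same three-parameter signature as `eval_word_qe` in evaluation.py; the extra `'Word'` argument makes the call raise a `TypeError`:

```python
# Stub with the same signature as eval_word_qe(gt_list, pred_list, vocab).
# The real function computes word-level QE scores; this placeholder only
# demonstrates why passing a fourth argument fails.
def eval_word_qe(gt_list, pred_list, vocab):
    return {'n_examples': len(gt_list)}

gt = [['OK', 'BAD']]
pred = [['OK', 'OK']]
vocab = {0: 'OK', 1: 'BAD'}

scores = eval_word_qe(gt, pred, vocab)        # three arguments: works

try:
    eval_word_qe(gt, pred, vocab, 'Word')     # four arguments: TypeError
except TypeError as err:
    print('call with four arguments fails:', err)
```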

Problem 2: how can this system be applied to the WMT18 word-level task? The new data has a new format, which introduces the 'tag' flag. @fredblain

cocaer commented 6 years ago

By the way, it seems that deepQuest puts all train, dev, and test data into a single Dataset_*.pkl. If I want to try other test data, what should I do? And does deepQuest support multi-GPU?

Thanks a lot!

julia-ive commented 5 years ago

Hi @cocaer, thanks a lot for your feedback. I am fixing this WordQE issue in my commit as shown above. Testing on new data is currently implemented only for Sentence QE (see the documentation: Tutorial-Scoring). The procedure is unfortunately not tested for WordQE yet (ongoing). For the WMT2018 data, please filter the tags (take only the target word tags, for example). And no, we do not support multi-GPU for now. I am closing this issue. Feel free to contact me directly by email with any other questions.
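For readers wondering what "filter the tags" means here: a possible preprocessing sketch, assuming the WMT18 word-level convention where gap tags and word tags alternate on each line (`GAP w1 GAP w2 ... wN GAP`, i.e. 2*N + 1 tags for N target words, with word tags at odd 0-based positions). The helper name is hypothetical, not part of deepQuest:

```python
# Hypothetical helper: keep only the target word tags from a WMT18
# word-level tag line, dropping the interleaved gap tags.
def keep_word_tags(tags):
    # Word tags sit at odd 0-based indices when gaps alternate with words.
    return tags[1::2]

line = 'OK OK OK BAD OK OK OK'          # 3 target words -> 7 tags
word_tags = keep_word_tags(line.split())
print(word_tags)                         # ['OK', 'BAD', 'OK']
```

Applying this line by line to the WMT18 tags file yields a per-word tag sequence in the pre-2018 format that the WordQE pipeline expects.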