Evaluating custom dataset with dsEM metric

salesforce / decaNLP

The Natural Language Decathlon: A Multitask Challenge for NLP

BSD 3-Clause "New" or "Revised" License

Evaluating custom dataset with dsEM metric #48

Closed ashleyyy94 closed 5 years ago

ashleyyy94 commented 5 years ago

Hi, is there a way to evaluate a custom dataset using the dsEM metric? Currently, the additional command-line flags only support --bleu and --rouge.

Thank you.

keskarnitish commented 5 years ago

In conjunction with https://github.com/salesforce/decaNLP/issues/47, I believe you can add your task (as an argument flag or directly) into:

https://github.com/salesforce/decaNLP/blob/1e9605f246b9e05199b28bde2a2093bc49feeeaa/validate.py#L78

Closing for now, feel free to reopen.

ashleyyy94 commented 5 years ago

Update: Adding a custom task as an argument flag does not work. I changed the argument to "dialogue='woz' in task or 'custom' in task". However, because my dataset is in the {Context, Question, Answer} format, computeDialogue in metrics.py processes the answer incorrectly and returns a wrong answer, usually consisting of a single letter.
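
One possible explanation for the single-letter output, offered as an assumption and not as a reading of the actual computeDialogue code: a function written for WOZ-style answers may index into each answer expecting a nested record, and the same indexing applied to a plain answer string returns only its first character. The sketch below uses a hypothetical helper, `first_field`, and made-up data to illustrate that effect; none of it is taken from metrics.py.

```python
def first_field(answer):
    """Hypothetical helper (not from decaNLP): return the first field of an answer.

    Assumes `answer` is a sequence of fields, e.g. (dialogue_id, state).
    """
    return answer[0]


# Made-up WOZ-like record: a tuple of fields.
structured = ("dlg-7", "food: thai;")
# Plain answer string, as in a {Context, Question, Answer} dataset.
plain = "thai food"

structured_result = first_field(structured)  # first field of the tuple
plain_result = first_field(plain)            # first character of the string

print(structured_result)
print(plain_result)
```

If this is what is happening, the dialogue branch is likely not suited to plain-string answers, and a custom task would need either answers in the structure that branch expects or a different metric path.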