neulab / compare-mt

A tool for holistic analysis of language generation systems

BSD 3-Clause "New" or "Revised" License


Added ability to bucket sentences by external label #94

Closed by zdou0830 5 years ago

zdou0830 commented 5 years ago

#89.

Sample usage:

compare-mt example/ted.ref.eng example/ted.sys1.eng example/ted.sys2.eng --compare_sentence_buckets 'bucket_type=label,out_labels=example/ted.sys1.eng.senttag;example/ted.sys2.eng.senttag,label_set=0+10+20+30+40+50+60+70+80+90+100,statistic_type=score,score_measure=bleu'

compare-mt example/ted.ref.eng example/ted.sys1.eng example/ted.sys2.eng --compare_sentence_buckets 'bucket_type=numlabel,out_labels=example/ted.sys1.eng.senttag;example/ted.sys2.eng.senttag,bucket_cutoffs=0:10:20:30:40:50:60:70:80:90:100,statistic_type=score,score_measure=bleu'

neubig commented 5 years ago

Thanks! Could you add the example here to the README as well?

zdou0830 commented 5 years ago

Thank you! Here I just label each sentence with its length. Is this suitable for the README, or is there a more informative labelling method?
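For illustration, length labelling of this kind could be sketched roughly as follows. This is not the script used in the PR: it assumes the label file holds one label per line, aligned with the sentences, and that "length" means the number of whitespace-separated tokens. `length_labels` and `write_labels` are hypothetical helpers, not part of compare-mt.

```python
def length_labels(sentences):
    """Return one label per sentence: its whitespace token count, as a string."""
    return [str(len(sentence.split())) for sentence in sentences]


def write_labels(in_path, out_path):
    """Read one sentence per line from in_path and write one length label per line."""
    # Assumption: the label file is line-aligned with the sentence file.
    with open(in_path, encoding="utf-8") as f:
        labels = length_labels(line.rstrip("\n") for line in f)
    with open(out_path, "w", encoding="utf-8") as f:
        for label in labels:
            f.write(label + "\n")
```

Calling `write_labels("example/ted.sys1.eng", "example/ted.sys1.eng.senttag")` would then produce a label file with the name used in the sample commands above.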

neubig commented 5 years ago

Yeah, that's fine for now. Other options would be something like per-word language model probability (for numlabel), or whether the reference sentence contains a question word, which would tell you how well a system does on questions (for label).
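A minimal sketch of the question-word idea, under stated assumptions: the word list is illustrative only, the label names `question` and `other` are made up for this example, and `question_label` is a hypothetical helper rather than anything in compare-mt.

```python
import string

# Illustrative English question words; a real label set would need more care.
QUESTION_WORDS = {"what", "who", "whom", "whose", "when", "where", "why", "which", "how"}


def question_label(sentence):
    """Return "question" if any token is a question word, otherwise "other"."""
    for token in sentence.lower().split():
        # Strip surrounding punctuation so that "Where?" still matches "where".
        if token.strip(string.punctuation) in QUESTION_WORDS:
            return "question"
    return "other"
```

Writing one such label per reference sentence would give a categorical label file of the kind the `label` bucket type is described as taking.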

zdou0830 commented 5 years ago

Thank you! I've updated the README!

neubig commented 5 years ago

Thanks a lot!

Also, one GitHub protip: if you write "Fixes #89" (instead of just "#89"), the issue will be closed automatically when the PR is merged, which can be convenient.