twairball / t2t_wmt_zhen

NMT for Chinese-English using tensor2tensor
MIT License

What do the ratios mean? #9

Open nyck33 opened 5 years ago

nyck33 commented 5 years ago

Remove sentences with source/target word ratio > 9
Remove sentences with source/target word ratio < 0.1111

I am not sure what this means. I am looking at your repo for ideas on a Japanese to English translation task for T2T. Thanks.

twairball commented 5 years ago

Let s = number of words in the source sentence x, and
let t = number of words in the target sentence y.

Remove sentence pairs (x, y) where s/t > 9 or s/t < 1/9 (≈ 0.1111).

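This is just a heuristic for filtering out bad training sentence pairs: when one side has more than 9 times as many words as the other, the two sentences are unlikely to be translations of each other.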
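
A minimal Python sketch of the ratio filter described above, not the repo's actual code. The names `keep_pair`, `filter_pairs`, and `max_ratio` are hypothetical, and counting words with `str.split()` assumes both sides are already tokenized into space-separated words. Raw Chinese or Japanese text has no spaces, so it would need a word segmenter first.

```python
from typing import Iterable, Iterator, Tuple


def keep_pair(source: str, target: str, max_ratio: float = 9.0) -> bool:
    """Return True if the source/target word ratio is within [1/max_ratio, max_ratio]."""
    s = len(source.split())  # number of words in the source sentence
    t = len(target.split())  # number of words in the target sentence
    if s == 0 or t == 0:
        # Assumption: empty sentences are treated as bad pairs and dropped.
        return False
    ratio = s / t
    # Drop the pair when s/t > 9 or s/t < 1/9.
    return 1.0 / max_ratio <= ratio <= max_ratio


def filter_pairs(pairs: Iterable[Tuple[str, str]],
                 max_ratio: float = 9.0) -> Iterator[Tuple[str, str]]:
    """Yield only the (source, target) pairs that pass the ratio check."""
    for source, target in pairs:
        if keep_pair(source, target, max_ratio):
            yield source, target
```

For example, `list(filter_pairs(corpus))` on a list of `(source, target)` string tuples would keep only the pairs whose word counts are within a factor of 9 of each other. The threshold of 9 comes from the thread; a different language pair may call for a different value.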