sanxing-chen / NMT2017-ZH-EN

Pre-processing and training scripts for WMT 2017 ZH-EN translation task

Lower performance in alignment compared to another preprocessing script. #5

Open haorannlp opened 3 years ago

haorannlp commented 3 years ago

Hi Sanxing, thank you for sharing this script!

I ran your preprocess.py (cleaning empty lines; I did not run the whole prepare.sh) and then used fast_align to learn an alignment model on the parallel corpus. I found that the perplexity of the alignments produced by this model is higher than what I get when the parallel corpus is preprocessed by another script, wmt.py. I guess this is because that script merges the blank lines. Could you possibly add this blank-line merging step to your script in the future? Thanks a lot!
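For concreteness, here is a minimal sketch of the comparison being run: pairing the two sides of the corpus into fast_align's `src ||| tgt` format (dropping pairs where either side is blank) and training a forward model. The file names (`train.zh`, `train.en`) are placeholders, not the actual outputs of preprocess.py or prepare.sh.

```python
#!/usr/bin/env python3
"""Sketch: build a fast_align input file and train a forward alignment model.
File names are assumptions, not what preprocess.py actually produces."""
import subprocess

SRC, TGT = "train.zh", "train.en"   # assumed tokenized parallel files
PAIRED = "train.zh-en"              # fast_align input: "src ||| tgt" per line

with open(SRC, encoding="utf-8") as fs, \
     open(TGT, encoding="utf-8") as ft, \
     open(PAIRED, "w", encoding="utf-8") as out:
    for zh, en in zip(fs, ft):
        zh, en = zh.strip(), en.strip()
        # Skip pairs where either side is blank; fast_align cannot use them.
        if not zh or not en:
            continue
        out.write(f"{zh} ||| {en}\n")

# Standard fast_align invocation (-d favor diagonal, -o optimize tension,
# -v variational Bayes); alignments go to stdout, per-iteration
# cross-entropy/perplexity is logged to stderr.
with open("forward.align", "w", encoding="utf-8") as aligned:
    subprocess.run(["fast_align", "-i", PAIRED, "-d", "-o", "-v"],
                   stdout=aligned, check=True)
```

The perplexity figure being compared in this issue is the one fast_align logs to stderr at each EM iteration.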