harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

preprocess fails on sample data #19

Closed gaphex closed 8 years ago

gaphex commented 8 years ago

Running

    python preprocess.py --srcfile data/src-train.txt --targetfile data/targ-train.txt

fails with the following output:

    Number of sentences in training: 10000
    Traceback (most recent call last):
      File "preprocess.py", line 343, in <module>
        sys.exit(main(sys.argv[1:]))
      File "preprocess.py", line 340, in main
        get_data(args)
      File "preprocess.py", line 257, in get_data
        args.seqlength, max_word_l, args.chars)
      File "preprocess.py", line 79, in make_vocab
        enumerate(itertools.izip(open(srcfile,'r'), open(targetfile,'r'))):
    TypeError: coercing to Unicode: need string or buffer, NoneType found

yoonkim commented 8 years ago

Hi, it seems like you are not specifying the srcvalfile and targetvalfile arguments?
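
For illustration, this kind of failure can be reproduced in isolation: an argparse option that is not passed on the command line defaults to None, and calling open on None raises a TypeError. This is a minimal sketch that assumes the validation-file options are declared without a default and without required=True. The parser below is a stand-in, not the actual preprocess.py.

```python
import argparse

# Stand-in parser (assumption): option names mirror the ones in this thread.
parser = argparse.ArgumentParser()
parser.add_argument("--srcfile")
parser.add_argument("--srcvalfile")  # assumed: no default, not required

# Only the training file is given, as in the reported command.
args = parser.parse_args(["--srcfile", "data/src-train.txt"])
print(args.srcvalfile)  # the omitted option is left as None

error = None
try:
    open(args.srcvalfile, "r")
except TypeError as err:
    # Python 2 reports this as "coercing to Unicode: need string or
    # buffer, NoneType found"; Python 3 uses different wording.
    error = err
    print("TypeError:", err)
```

Under that assumption, the traceback points at the open call rather than at the missing argument, which is why the message is hard to read.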

srush commented 8 years ago

Yoon, let's have this fail in a nicer way. You should be able to add required=True to those args.
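
A minimal sketch of that suggestion, assuming the script builds its options with argparse. The build_parser helper and the help strings are hypothetical, and only the four file options mentioned in this thread are shown.

```python
import argparse


def build_parser():
    """Hypothetical stand-in for the option setup in preprocess.py."""
    parser = argparse.ArgumentParser(description="Preprocess parallel text data.")
    # required=True makes argparse reject the command line up front
    # instead of passing None into open() later on.
    for flag in ("--srcfile", "--targetfile", "--srcvalfile", "--targetvalfile"):
        parser.add_argument(flag, required=True,
                            help="path to the file for " + flag.lstrip("-"))
    return parser


if __name__ == "__main__":
    # A complete command line parses normally.
    args = build_parser().parse_args([
        "--srcfile", "data/src-train.txt",
        "--targetfile", "data/targ-train.txt",
        "--srcvalfile", "data/src-val.txt",
        "--targetvalfile", "data/targ-val.txt",
    ])
    print(args.srcvalfile)
```

With this in place, omitting the validation files makes argparse print a usage message naming the missing options and exit with status 2, rather than raising a TypeError deep inside make_vocab. The data/src-val.txt and data/targ-val.txt paths above are placeholders.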

yoonkim commented 8 years ago

yep, made those changes--closing now