harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Error on training when using preprocess-shards #69

Closed · ghost closed this issue 8 years ago

ghost commented 8 years ago

Due to the large training dataset, I had to use preprocess-shards.py to split it. When running train.lua I get the following error:

    loading data...
    /home/sergio/torch/install/bin/luajit: /home/sergio/torch/install/share/lua/5.1/hdf5/group.lua:312: HDF5Group:read() - no such child 'num_source_features' for [HDF5Group 33554432 /]

It seems that 'num_source_features' is written by preprocess.py but not by the shards script. Could you please advise? Thanks
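For reference, a quick way to confirm what a shard file actually contains is to list its top-level children with h5py. This is a minimal sketch; the shard file name `train.1.hdf5` is hypothetical, so substitute one of your own shards:

```python
import h5py

# List the top-level datasets of a shard produced by preprocess-shards.py.
# The file name 'train.1.hdf5' is hypothetical; use your own shard file.
with h5py.File("train.1.hdf5", "r") as f:
    print(list(f.keys()))
    # If this prints False, train.lua will fail with the error above.
    print("num_source_features" in f)
```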

guillaumekln commented 8 years ago

Unfortunately, preprocess-shards.py still lags behind in terms of features due to heavy code duplication with preprocess.py. In the meantime, you can use the updated implementation from @mdasadul:

https://github.com/mdasadul/seq2seq-attn/blob/bcd899ec990da6b2c5c616aab5ac77b5c7760dc6/preprocess-shards.py

See #49.
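If regenerating the shards is not practical, one possible workaround is to copy the missing dataset from a file produced by the up-to-date preprocess.py into each shard. This is only a sketch under stated assumptions: the file names `reference.hdf5` and `train.*.hdf5` are hypothetical, and whether a plain copy is sufficient should be verified against what preprocess.py actually writes for your data:

```python
import glob

import h5py

# Read the dataset from a file written by the up-to-date preprocess.py.
# 'reference.hdf5' is a hypothetical name for that file.
with h5py.File("reference.hdf5", "r") as ref:
    value = ref["num_source_features"][...]

# Copy it into every shard that is missing it, so train.lua finds the
# child it expects. The 'train.*.hdf5' shard naming is an assumption;
# adjust the glob pattern to match your shard files.
for path in glob.glob("train.*.hdf5"):
    with h5py.File(path, "a") as shard:
        if "num_source_features" not in shard:
            shard["num_source_features"] = value
```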

ghost commented 8 years ago

Thanks!