nyu-dl / dl4mt-tutorial

BSD 3-Clause "New" or "Revised" License

How to build the dataset 'all.en.concat.gz.pkl' in session2/train_nmt_all.py? #70

Closed wead-hsu closed 8 years ago

wead-hsu commented 8 years ago

Sorry, this may not really be an issue, but could you give me an idea of how to build the dataset?

orhanf commented 8 years ago

Hi @wead-hsu, please use the provided preprocessing script here, which downloads and preprocesses the datasets needed for a sample run. Note that files with the *.pkl extension are vocabulary files, not training data.
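For readers who want to see what such a vocabulary file might contain, here is a minimal sketch. It is not the repository's preprocessing script. It assumes the vocabulary is a pickled word-to-index dictionary built from a whitespace-tokenized corpus, with more frequent words getting smaller indices. The function names `build_vocab` and `save_vocab`, the reserved tokens `eos` and `UNK`, and the `<corpus>.pkl` naming (guessed from the file name in the title) are all assumptions made for illustration.

```python
import gzip
import pickle
from collections import Counter


def build_vocab(lines, reserved=("eos", "UNK")):
    """Map each token to an integer id (hypothetical helper).

    Reserved tokens take the first ids; the remaining words follow in
    order of descending frequency, with ties broken alphabetically.
    """
    counts = Counter()
    for line in lines:
        counts.update(line.split())  # assumes whitespace-tokenized text

    vocab = {token: idx for idx, token in enumerate(reserved)}
    for word, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if word not in vocab:  # keep the reserved ids untouched
            vocab[word] = len(vocab)
    return vocab


def save_vocab(corpus_path, vocab_path=None):
    """Build a vocabulary from a plain or gzipped corpus and pickle it."""
    opener = gzip.open if corpus_path.endswith(".gz") else open
    with opener(corpus_path, "rt", encoding="utf-8") as corpus:
        vocab = build_vocab(corpus)

    # Assumed naming convention: "<corpus file>.pkl"
    vocab_path = vocab_path or corpus_path + ".pkl"
    with open(vocab_path, "wb") as out:
        pickle.dump(vocab, out)
    return vocab_path
```

Under these assumptions, calling `save_vocab("all.en.concat.gz")` would write `all.en.concat.gz.pkl` next to the corpus. Check the dictionary format against what the training script actually loads before relying on it.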