Hi, I think this is just a simple question. I'm new to dl4mt, and I wonder how I can run the neural language model of session0, since I can't find the code to download the wiki data needed. Thanks.
I had to download some wiki dump, extract the text and tokenise it. It'd be great if someone could put the data files online.

The dump is an `xxxx-abstract.xml` file in `.xml` format. I wrote https://gist.github.com/jli05/99741bd4ba6844acc627 and https://gist.github.com/jli05/5f18e6f29174e7f1d8a5 to extract the text. See `data/preprocess.sh` and `data/tokenize_all.sh` for the usage of `tokenizer.perl`.
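For anyone landing here later, a rough sketch of the whole pipeline under the assumptions above. The dump URL, the extraction script name, and the output file names are placeholders I've made up for illustration; the linked gists are the actual extraction code:

```sh
# Grab a Wikipedia abstract dump (URL and file name are examples;
# substitute the dump you actually want).
wget https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-abstract.xml.gz
gunzip enwiki-latest-abstract.xml.gz

# Extract plain text from the XML using the gists linked above
# (extract_abstracts.py is a placeholder name for that script).
python extract_abstracts.py enwiki-latest-abstract.xml > wiki.txt

# Tokenise with the Moses tokenizer, as data/preprocess.sh and
# data/tokenize_all.sh do; -l selects the language.
perl tokenizer.perl -l en < wiki.txt > wiki.tok.txt
```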
Thanks a lot for the reply, I shall have a try.