vocab.txt size doesn't match CNN dataset vocabulary size.

thomasmesnard / DeepMind-Teaching-Machines-to-Read-and-Comprehend

Implementation of "Teaching Machines to Read and Comprehend" proposed by Google DeepMind

Closed by webeng 8 years ago

webeng commented 8 years ago

Firstly, thank you for open sourcing this project.

How did you generate the vocab.txt file? It contains 29,406 words.

However, when I count the unique words in the CNN dataset, I get 119,567.
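For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how such a count could be done. It is not the script used for the numbers above. It assumes plain whitespace tokenization, that the dataset files match a `*.question` glob, and that vocab.txt has one entry per line with the word as the first field. The function names (`count_tokens`, `iter_lines`, `load_vocab`, `missing_from_vocab`) are hypothetical and not part of the repository.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, Set


def count_tokens(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens across an iterable of text lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def iter_lines(directory: str, pattern: str = "*.question") -> Iterator[str]:
    """Yield every line of every file under `directory` matching `pattern`.

    The "*.question" pattern is an assumption about the dataset layout.
    """
    for path in sorted(Path(directory).rglob(pattern)):
        with path.open(encoding="utf-8") as handle:
            yield from handle


def load_vocab(path: str) -> Set[str]:
    """Read a vocabulary file, taking the first field of each non-empty line.

    Assumes one entry per line; any extra columns (e.g. counts) are ignored.
    """
    vocab: Set[str] = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:
                vocab.add(fields[0])
    return vocab


def missing_from_vocab(counts: Counter, vocab: Set[str]) -> Set[str]:
    """Return the tokens seen in the data that the vocabulary does not contain."""
    return set(counts) - vocab
```

With these helpers, `len(count_tokens(iter_lines(data_dir)))` gives the number of unique tokens in the data, `len(load_vocab("vocab.txt"))` gives the vocabulary size, and `missing_from_vocab` shows which tokens account for the difference. A different tokenization or case handling would change the totals, so the result is only comparable to another count made the same way.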

webeng commented 8 years ago

According to the original authors, the vocabulary size should be ~120K.