macournoyer / neuralconvo

Neural conversational model in Torch
The wrong number of TOTAL_LINES in cornell_movie_dialogs.lua? #24

Closed by jamesweb1 8 years ago

jamesweb1 commented 8 years ago

I downloaded the dataset from the website and found that movie_lines.txt has only 304713 lines, but cornell_movie_dialogs.lua sets TOTAL_LINES to 387810. Why do the two numbers differ? Did I download the wrong version of the dataset, or has something changed?

macournoyer commented 8 years ago

IIRC that's because some lines appear in several conversations (movie_conversations.txt)

That number is only used for displaying progress anyways.

jamesweb1 commented 8 years ago

Yes, I know it's only for displaying progress, but I was worried about the integrity of my dataset. I found that 387810 is the sum of the line counts of movie_lines.txt (304713) and movie_conversations.txt (83097), because the program reads movie_lines first and then movie_conversations. Thanks anyway. 👍
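
For anyone who wants to repeat this check on their own copy of the dataset, here is a minimal sketch in Python (not part of neuralconvo, which is written in Lua). The helper names `count_lines` and `expected_total` are hypothetical, and it assumes both files sit in the same directory. It counts lines in binary mode, so the result does not depend on the text encoding of the files.

```python
from pathlib import Path


def count_lines(path):
    """Count the lines in a file, reading bytes so no decoding is needed."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def expected_total(data_dir):
    """Sum the line counts of the two files the loader reads, in order."""
    data_dir = Path(data_dir)
    return (count_lines(data_dir / "movie_lines.txt")
            + count_lines(data_dir / "movie_conversations.txt"))


# The counts reported in this thread add up to TOTAL_LINES.
assert 304713 + 83097 == 387810
```

If `expected_total("path/to/dataset")` returns 387810 for your download, the line counts match the ones reported above.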