jamesweb1 closed this 8 years ago
IIRC that's because some lines appear in several conversations (movie_conversations.txt)
That number is only used for displaying progress anyways.
Yes, I know it's only for displaying progress, but I was worried about the integrity of my dataset. I found that 387810 is the sum of the line counts of movie_lines.txt (304713) and movie_conversations.txt (83097), because the program reads movie_lines.txt and then movie_conversations.txt. Thanks anyway. 👍
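For anyone who wants to run the same sanity check on their own download, here is a minimal sketch in Python (not the project's Lua code). It assumes both files sit in one local directory; the helper names `count_lines` and `check_total` are made up for illustration. The only values taken from this thread are the three counts.

```python
import os

# Counts quoted in this thread.
MOVIE_LINES = 304713          # lines in movie_lines.txt
MOVIE_CONVERSATIONS = 83097   # lines in movie_conversations.txt
TOTAL_LINES = 387810          # value in cornell_movie_dialogs.lua


def count_lines(path):
    """Count lines in a file; reads bytes so odd encodings don't matter."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def check_total(data_dir, expected=TOTAL_LINES):
    """Return (actual_total, matches_expected) for the two dataset files.

    data_dir is a hypothetical local directory holding both files.
    """
    total = (count_lines(os.path.join(data_dir, "movie_lines.txt"))
             + count_lines(os.path.join(data_dir, "movie_conversations.txt")))
    return total, total == expected


# The arithmetic from the thread holds on its own:
assert MOVIE_LINES + MOVIE_CONVERSATIONS == TOTAL_LINES
```

Calling `check_total("path/to/dataset")` on a complete download should return a matching total; if it doesn't, compare each file's count against the numbers above to see which one differs.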
I downloaded the dataset from the website and found that movie_lines.txt contains only 304713 lines. But cornell_movie_dialogs.lua says TOTAL_LINES is 387810. Where does the difference come from? Did I download the wrong version of the dataset, or has something changed?