yoonkim / lstm-char-cnn

LSTM language model with CNN over characters

shuffling the data performs worse #24

Closed SwordYork closed 7 years ago

SwordYork commented 7 years ago

Hi,

I have found that the model performs much worse when trained on shuffled PTB data. For example, the final PPL of the large word-level model is 97.79. Do you have any idea why?

Thanks!

yoonkim commented 7 years ago

Sure. RNN-based language modeling on PTB usually treats the entire document as one long sentence. Hence, if the sentences are ordered (as is the case with Mikolov's version of the data), the model can use information from previous sentences in a meaningful way. Shuffling the sentences destroys that cross-sentence context, which is why perplexity degrades.
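
For concreteness, here is a minimal NumPy sketch (not code from this repo) of the usual continuous-stream batching for PTB: the corpus stays in document order, is reshaped into a few parallel streams, and the RNN hidden state is carried from one minibatch to the next, so each batch continues where the previous one left off. `make_stream_batches` is a hypothetical helper name introduced for illustration.

```python
import numpy as np

def make_stream_batches(token_ids, batch_size, seq_len):
    """Yield (input, target) minibatches from a corpus kept in its
    original order. Each of the batch_size rows is one contiguous
    stream of text, so carrying the RNN hidden state from batch to
    batch lets the model condition on previous sentences."""
    token_ids = np.asarray(token_ids)
    n_batches = (len(token_ids) - 1) // (batch_size * seq_len)
    usable = n_batches * batch_size * seq_len
    x = token_ids[:usable].reshape(batch_size, -1)       # inputs
    y = token_ids[1:usable + 1].reshape(batch_size, -1)  # next-token targets
    for i in range(0, x.shape[1], seq_len):
        yield x[:, i:i + seq_len], y[:, i:i + seq_len]

# Toy corpus: integers 0..99 standing in for word ids in document order.
corpus = np.arange(100)
for inputs, targets in make_stream_batches(corpus, batch_size=2, seq_len=5):
    pass  # train step here; do NOT reset the hidden state between batches
```

Shuffling the sentences before concatenation removes the cross-sentence dependencies that the carried-over hidden state would otherwise exploit, which is consistent with the higher PPL reported above.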

SwordYork commented 7 years ago

I see. Thank you very much!