keras-team / keras-preprocessing

Utilities for working with image data, text data, and sequence data.

Off-by-one error in text preprocessing (sequence_to_matrix) #78

Open dtieber opened 5 years ago

dtieber commented 5 years ago

It seems that the num_words property in text.py is not initialized with the correct length. I noticed this because I use this value to calculate the number of input/output neurons, which leads to issues when I train the model.

I think num_words should be initialized as num_words = len(self.word_index) when it is not set explicitly.
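To make the length question concrete, here is a minimal pure-Python sketch of how a binary sequences-to-matrix conversion can derive its width when num_words is not set. It assumes, from a reading of text.py rather than from the library's documentation, that word indices start at 1 (index 0 is reserved), that the fallback width is `len(word_index) + 1`, and that any index `>= num_words` is skipped. The helpers `build_word_index` and `sequences_to_binary_matrix` are hypothetical stand-ins, not part of the keras-preprocessing API.

```python
from collections import Counter
from typing import Dict, List, Optional


def build_word_index(texts: List[str]) -> Dict[str, int]:
    """Hypothetical helper: rank words by frequency, starting at index 1.

    Index 0 is left unused, mirroring the assumed reserved index.
    """
    counts = Counter(word for text in texts for word in text.lower().split())
    # most_common() sorts by count, descending; ties keep first-seen order.
    ranked = [word for word, _ in counts.most_common()]
    return {word: i + 1 for i, word in enumerate(ranked)}


def sequences_to_binary_matrix(
    sequences: List[List[int]],
    word_index: Dict[str, int],
    num_words: Optional[int] = None,
) -> List[List[int]]:
    """Hypothetical sketch of a binary-mode sequences-to-matrix conversion."""
    if not num_words:
        # Assumed fallback: one extra column so the largest index still fits.
        num_words = len(word_index) + 1
    matrix = [[0] * num_words for _ in sequences]
    for row, seq in zip(matrix, sequences):
        for j in seq:
            if j >= num_words:
                continue  # indices outside the matrix are silently dropped
            row[j] = 1
    return matrix


# Usage: two short texts with a four-word vocabulary.
texts = ["the cat sat", "the dog sat"]
word_index = build_word_index(texts)  # {'the': 1, 'sat': 2, 'cat': 3, 'dog': 4}
sequences = [[word_index[w] for w in t.split()] for t in texts]

default_matrix = sequences_to_binary_matrix(sequences, word_index)
# → [[0, 1, 1, 1, 0], [0, 1, 1, 0, 1]]  (width 5, column 0 always empty)

proposed_matrix = sequences_to_binary_matrix(
    sequences, word_index, num_words=len(word_index)
)
# → [[0, 1, 1, 1], [0, 1, 1, 0]]  (width 4, 'dog' at index 4 is dropped)
```

Under these assumptions, the fallback width is one larger than the vocabulary because of the unused column 0. Setting the width to len(self.word_index) alone would drop the highest-index word unless the indexing also shifted, which is one reason such a change touches more than a single line.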

Dref360 commented 5 years ago

To avoid breaking changes, we should document this behaviour.

dtieber commented 5 years ago

Did you see the PR by any chance?

Dref360 commented 5 years ago

Yup, I saw it. We plan to ship all the off-by-one fixes together in a major release. Merging this now would break a lot of projects.