keras-team / keras-preprocessing

Utilities for working with image data, text data, and sequence data.

Tokenizer.texts_to_matrix fails when the num_words argument of Tokenizer's constructor is a float (instead of an integer) #268

Open etiennekintzler opened 4 years ago

etiennekintzler commented 4 years ago

When Tokenizer is initialized with a float instead of an integer for the num_words parameter, no error is raised. However, when the method Tokenizer.texts_to_matrix is called, it raises a TypeError at line 413 of text.py (x = np.zeros((len(sequences), num_words))), since a numpy array shape cannot be specified with a float.

Wouldn't it be preferable to raise this error at instantiation, or to cast the float to an integer when possible?

Code snippet:

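from keras_preprocessing.text import Tokenizer

texts = ["hello", "world"]
tokenizer = Tokenizer(num_words=1e2)  # float accepted here, no error raised
tokenizer.fit_on_texts(texts=texts)
tokenizer.texts_to_matrix(texts=texts)  # TypeError: float used in the array shape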
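
For illustration, here is a minimal sketch of what the check could look like if it ran at instantiation. The helper name normalize_num_words is hypothetical and is not part of keras-preprocessing; the sketch assumes that floats with an integral value (such as 1e2) should be cast to int and that anything else should be rejected with a TypeError.

```python
import numbers


def normalize_num_words(num_words):
    """Hypothetical helper (not part of keras-preprocessing).

    Returns num_words as an int, or None if it was not set.
    Raises TypeError for values that cannot be used as a word count.
    """
    if num_words is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(num_words, bool):
        raise TypeError("num_words must be an integer or None, got a bool")
    if isinstance(num_words, numbers.Integral):
        return int(num_words)
    # Accept floats only when they hold an integral value, e.g. 1e2 -> 100.
    if isinstance(num_words, float) and num_words.is_integer():
        return int(num_words)
    raise TypeError(
        "num_words must be an integer or None, got %r" % (num_words,)
    )
```

With something like this called from the constructor, Tokenizer(num_words=1e2) would behave like Tokenizer(num_words=100), while Tokenizer(num_words=2.5) would fail immediately instead of failing later in texts_to_matrix.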