Open paw-lu opened 4 years ago
As outlined in #301, this PR makes keras.preprocessing.text.Tokenizer remove the characters in the filters argument if char_level=True.
Closes #301.
Behavior before
❯ tokenizer = keras.preprocessing.text.Tokenizer(char_level=True, filters="e")
❯ tokenizer.fit_on_texts("ae")
❯ tokenizer.word_index
{'a': 1, 'e': 2}  # "e" is tokenized

Behavior after
❯ tokenizer = keras.preprocessing.text.Tokenizer(char_level=True, filters="e")
❯ tokenizer.fit_on_texts("ae")
❯ tokenizer.word_index
{'a': 1}  # "e" is not tokenized
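
For readers who want to see the intended behavior without installing Keras, here is a minimal sketch of character-level indexing that skips filtered characters. The helper name `char_word_index` is hypothetical and is not part of Keras or of this PR's diff. It assumes the index is 1-based and ordered by descending frequency, and it leaves out lowercasing and out-of-vocabulary handling.

```python
from collections import OrderedDict


def char_word_index(texts, filters=""):
    """Hypothetical helper: build a 1-based character index, skipping filtered characters."""
    filtered = set(filters)
    counts = OrderedDict()
    for text in texts:
        for ch in text:
            # Drop any character listed in `filters` before it is counted.
            if ch in filtered:
                continue
            counts[ch] = counts.get(ch, 0) + 1
    # Most frequent characters first; the sort is stable, so ties keep first-seen order.
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {ch: i for i, (ch, _) in enumerate(ordered, start=1)}


# Iterating over the string "ae" yields the texts "a" and "e", as in the examples above.
print(char_word_index("ae", filters="e"))  # {'a': 1}
print(char_word_index("ae"))               # {'a': 1, 'e': 2}
```

With `filters="e"` the sketch matches the "after" output, and with no filters it matches the "before" output, where every character is indexed.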