Closed by rth 5 years ago
Adds a character tokenizer:
    tokenizer = CharacterTokenizer(window_size=4)
    assert tokenizer.tokenize("fox can't") == ["fox ", "ox c", "x ca", " can", "can'", "an't"]
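For readers who want to see the behaviour in isolation, here is a minimal pure-Python sketch of a sliding-window character tokenizer that reproduces the example above. It is not the implementation in this PR. The class body, the validation of `window_size`, and the handling of inputs shorter than the window (returning no tokens) are all assumptions made for illustration.

```python
class CharacterTokenizer:
    """Illustrative sliding-window character tokenizer.

    Hypothetical stand-in for the tokenizer added in this PR; only the
    constructor argument and the `tokenize` method name come from the
    example in the description.
    """

    def __init__(self, window_size=4):
        if window_size < 1:
            raise ValueError("window_size must be a positive integer")
        self.window_size = window_size

    def tokenize(self, text):
        n = self.window_size
        # Slide a window of n characters over the text, one character at a time.
        # Assumption: texts shorter than the window produce an empty list.
        return [text[i:i + n] for i in range(max(len(text) - n + 1, 0))]


tokenizer = CharacterTokenizer(window_size=4)
tokens = tokenizer.tokenize("fox can't")
print(tokens)
```

A text of length `m` yields `m - window_size + 1` overlapping tokens, so the 9-character input above gives 6 tokens. Whitespace and punctuation are kept as ordinary characters.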