Open mhabedank opened 1 month ago
If you can provide an example, I can help with the rest.
If we decide to replace the dependency, this would be about 5 lines of code: https://pytorch.org/text/stable/_modules/torchtext/data/utils.html#ngrams_iterator
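For reference, here is a minimal sketch of what a local replacement could look like. It assumes the only behavior we need is what the linked `ngrams_iterator` does: yield all single tokens first, then the space-joined n-grams up to size `ngrams`. Keeping the same function name is an assumption for illustration, and this has not been checked against how NgramTokenizer actually calls it.

```python
from typing import Iterator, List


def ngrams_iterator(token_list: List[str], ngrams: int) -> Iterator[str]:
    """Yield every token, then every space-joined n-gram for n = 2..ngrams."""
    # Unigrams first, in their original order.
    yield from token_list
    for n in range(2, ngrams + 1):
        # Zipping n shifted views of the list yields each window of
        # n consecutive tokens.
        for window in zip(*(token_list[i:] for i in range(n))):
            yield " ".join(window)


print(list(ngrams_iterator(["here", "we", "are"], 2)))
# ['here', 'we', 'are', 'here we', 'we are']
```

Since this only needs the standard library, it could probably live next to the tokenizer without pulling in anything new.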
torchtext is used here:
Can we just copy the code over?
Yeah, that would probably be the solution for this tokenizer.
The NgramTokenizer uses torchtext. We want to remove torchtext as a dependency, so this tokenizer has to be refactored to work without it.