huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
https://huggingface.co/docs/tokenizers
Apache License 2.0
8.67k stars, 743 forks

`Encoding` object stub doesn't include `__len__` #1556

Open thearchitector opened 1 week ago

thearchitector commented 1 week ago

It looks like the `Encoding` object returned by `Tokenizer.encode` has the `__len__` dunder method, and running `len(encoding)` works at runtime, but for some reason its corresponding generated `.pyi` stub does not declare it. This causes type-checking errors in Pyright/Pylance, even in non-strict mode (since this library isn't typed).
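For anyone hitting the same error before the stub is fixed, here is a minimal sketch of two possible call-site workarounds. It uses a hypothetical `StandInEncoding` class so it runs without the library installed; the only thing assumed about the real `Encoding` is that it exposes an `ids` list and that `len(encoding)` matches `len(encoding.ids)`, which I have not verified against the Rust bindings.

```python
from collections.abc import Sized
from typing import cast


class StandInEncoding:
    """Hypothetical stand-in for tokenizers.Encoding, for illustration only."""

    def __init__(self, ids: list[int]) -> None:
        self.ids = ids

    def __len__(self) -> int:
        # Assumption: the real object reports the number of token ids.
        return len(self.ids)


def token_count(encoding: object) -> int:
    # cast() does nothing at runtime; it only tells the type checker
    # that the object supports len(), which the stub fails to declare.
    return len(cast(Sized, encoding))


enc = StandInEncoding([101, 7592, 102])

# Workaround 1: cast to Sized at the call site.
count_via_cast = token_count(enc)

# Workaround 2: take the length of an attribute the stub does declare.
count_via_ids = len(enc.ids)
```

Both are stopgaps at the call site. The proper fix would presumably be a `def __len__(self) -> int: ...` declaration on the `Encoding` class in the generated stub.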

ArthurZucker commented 1 week ago

Ah I see. If you want to open a PR for a fix feel free to do so! Otherwise I'll do it when I have a bit of time!

thearchitector commented 1 week ago

I can make a PR if you can point me in the right direction. Otherwise, I have no problem waiting.