Closed: RamvigneshPasupathy closed this issue 2 months ago
Would you like to open a PR to add this feature? 🤗
Hi @ArthurZucker
I was going through the code once more with a view to contributing the method I asked for, Tokenizer.from_bytes(), but then I figured out that the feature I was expecting is already available under a different method name: Tokenizer.from_buffer().
I tried a PoC of loading the tokenizer from the file bytes of a tokenizer.json, and it works. Attaching screenshots; please close this issue if you find the PoC is good. This code should be enough of a reference for anyone using huggingface tokenizers.
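Since the screenshots may not render for everyone, here is a minimal text sketch of the same idea. It assumes the tokenizers Python package is installed. The tiny WordLevel vocabulary and the helper names build_demo_tokenizer_bytes and load_tokenizer_from_bytes are made up for illustration only and are not part of the library; the bytes stand in for the contents of a real tokenizer.json.

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace


def build_demo_tokenizer_bytes() -> bytes:
    """Serialize a toy tokenizer to bytes (stand-in for reading tokenizer.json)."""
    # Hypothetical three-word vocabulary, only for this demo.
    vocab = {"hello": 0, "world": 1, "[UNK]": 2}
    tok = Tokenizer(WordLevel(vocab=vocab, unk_token="[UNK]"))
    tok.pre_tokenizer = Whitespace()
    # to_str() returns the same JSON that save() would write to disk.
    return tok.to_str().encode("utf-8")


def load_tokenizer_from_bytes(data: bytes) -> Tokenizer:
    """Rebuild a Tokenizer from raw JSON bytes without touching the filesystem."""
    # from_buffer expects a bytes object holding the serialized tokenizer JSON.
    return Tokenizer.from_buffer(data)


if __name__ == "__main__":
    tokenizer = load_tokenizer_from_bytes(build_demo_tokenizer_bytes())
    print(tokenizer.encode("hello world").ids)
```

With a real file, the bytes would come from something like open("tokenizer.json", "rb").read(), or from any other source that yields the JSON as bytes (an object store, a database blob), and would then be passed to Tokenizer.from_buffer() in the same way.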
Yeah, maybe update the docs to make from_buffer more findable?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Looking for "Tokenizer.from_bytes()" support in Python, similar to the one in Rust - https://github.com/huggingface/tokenizers/issues/1013
Currently, it is not available in the Python bindings code - https://github.com/huggingface/tokenizers/blob/v0.19.1/bindings/python/src/tokenizer.rs
Why is this needed?