microsoft / onnxruntime-extensions

onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime
MIT License
295 stars 80 forks source link

Improve handling of missing `vocab_file` attribute in HFTokenizerConverter #677

Closed kazssym closed 3 months ago

kazssym commented 3 months ago

This commit updates HFTokenizerConverter to handle cases where the hf_tokenizer object might not have a vocab_file attribute.

Changes:

This ensures the converter works correctly even with tokenizers that don't define a vocab_file attribute.

kazssym commented 3 months ago

I'm not sure but it looks like GPT2Tokenizer/-Fast lacks the vocab_file attribute.

sayanshaw24 commented 3 months ago

/azp run onnxruntime-extensions.CI

azure-pipelines[bot] commented 3 months ago
Azure Pipelines successfully started running 1 pipeline(s).