microsoft / onnxruntime-extensions

onnxruntime-extensions: A specialized pre- and post- processing library for ONNX Runtime
MIT License

ValueError: Unsupported processor/tokenizer: Qwen2Tokenizer #724

Open Wonder1905 opened 1 month ago

Wonder1905 commented 1 month ago

Hi, I'm trying to export my tokenizer and followed this short guide: Guide. Now, using:

    tokenizer = AutoTokenizer.from_pretrained(onnx_path, use_fast=False)
    onnx_tokenizer = OrtPyFunction(gen_processing_models(tokenizer, pre_kwargs={})[0])

But I'm getting:

    onnx_tokenizer = OrtPyFunction(gen_processing_models(tokenizer, pre_kwargs={})[0])
  File "/opt/conda/envs/qwen_env/lib/python3.9/site-packages/onnxruntime_extensions/cvt.py", line 96, in gen_processing_models
    raise ValueError(f"Unsupported processor/tokenizer: {cls_name}")
ValueError: Unsupported processor/tokenizer: Qwen2Tokenizer

What are my options from this point? Thanks!

wenbingl commented 1 month ago

It's not in the list here: https://github.com/microsoft/onnxruntime-extensions/blob/ca433cbea706e7c1782df25391f877e28b887d61/onnxruntime_extensions/_hf_cvt.py#L183. So it hasn't been supported yet.

But in the Hugging Face code repo, Qwen2Tokenizer looks very similar to the existing GPT2Tokenizer, so if this is urgent you can add an item to that list yourself, or wait for our PR later.
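To illustrate the shape of that one-line fix: the table in `_hf_cvt.py` maps a Hugging Face tokenizer's class name to the converter that handles it, and `gen_processing_models` raises the `ValueError` above when the lookup misses. The sketch below reproduces that registry pattern with hypothetical stand-in names (`TOKENIZER_CONVERTERS`, `convert_bpe_tokenizer`), not the real onnxruntime-extensions internals:

```python
# Sketch of the registration pattern, with hypothetical names --
# not the actual onnxruntime-extensions code.

def convert_bpe_tokenizer(tokenizer):
    # Stand-in for the GPT-2-style BPE conversion routine.
    return f"onnx-graph-for-{type(tokenizer).__name__}"

# Supported-tokenizer table, keyed by Hugging Face class name.
TOKENIZER_CONVERTERS = {
    "GPT2Tokenizer": convert_bpe_tokenizer,
}

# Since Qwen2Tokenizer is BPE-based like GPT2Tokenizer, reusing the
# same converter entry is the kind of one-line addition suggested above.
TOKENIZER_CONVERTERS["Qwen2Tokenizer"] = convert_bpe_tokenizer

def gen_processing_model(tokenizer):
    # Mirrors the lookup that produces the reported ValueError on a miss.
    cls_name = type(tokenizer).__name__
    if cls_name not in TOKENIZER_CONVERTERS:
        raise ValueError(f"Unsupported processor/tokenizer: {cls_name}")
    return TOKENIZER_CONVERTERS[cls_name](tokenizer)
```

With the extra entry registered, a tokenizer whose class is named `Qwen2Tokenizer` resolves to the BPE converter instead of raising.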