System Info
"@huggingface/transformers": "^3.0.0-alpha.5"
Environment/Platform
Description
When I run NER inference with WebGPU, the results vary across different users' machines, and the WebGPU results differ from the WASM results. The only code change between the two runs is switching the device setting from "webgpu" to "wasm". For some models there is no difference, e.g. Xenova/bert-base-multilingual-cased-ner-hrl.
Reproduction
First, I converted the model to ONNX format: on the v3 branch of transformers.js, I ran python -m scripts.convert --quantize --model_id Isotonic/distilbert_finetuned_ai4privacy_v2. Then I loaded the model and ran inference with the following code:
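The original snippet was not captured here; below is a minimal sketch of the loading/inference code, assuming the v3 token-classification pipeline and its device option (the model id is taken from the conversion step above; the actual local path after conversion may differ):

```js
import { pipeline } from '@huggingface/transformers';

// Load the converted model as a token-classification (NER) pipeline.
// Switching device from 'webgpu' to 'wasm' is the only change between runs.
const ner = await pipeline(
  'token-classification',
  'Isotonic/distilbert_finetuned_ai4privacy_v2',
  { device: 'webgpu' }, // change to 'wasm' and entities are extracted
);

const text =
  'Anuj Joshi - Founder (May 2020) Over 22+ experience in channel space ' +
  'building various Route To Markets for global giants like Amazon, IBM & Autodesk,';

const entities = await ner(text);
console.log(entities); // empty on WebGPU; non-empty on WASM
```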
Running inference on the text above with device "webgpu" extracts no entities; changing the device to "wasm" extracts entities as expected.