huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

Trouble running transformer converted ONNX model on GPU #250

Closed by bajrachar 1 year ago

bajrachar commented 2 years ago

System Info

optimum==1.2.3
onnxruntime==1.11.1
onnxruntime-gpu==1.11.1
transformers==4.20.1
python version 3.9.0
CUDA 11.6

Who can help?

@philschmid @JingyaHuang

Reproduction

I am trying to convert a Transformers BERT NER model for disease token classification to ONNX and load it onto a GPU using the following code:

from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer, pipeline

model = ORTModelForTokenClassification.from_pretrained("models/output/NCBI-disease", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("models/output/NCBI-disease")

onnx_ner = pipeline("ner",model=model,tokenizer=tokenizer,device=0)

pred = onnx_ner("This patient showed symptoms of pulmonary edema during routine scan")

print(pred)

The code works fine on CPU (i.e. with device set to -1). With device set to 0, I get the following error:

Traceback (most recent call last):
  File "/home/ravi/Projects/clintrials-clj/resources/onnx_converter.py", line 7, in <module>
    onnx_ner = pipeline("ner",model=model,tokenizer=tokenizer,device=0)
  File "/home/ravi/.local/lib/python3.9/site-packages/transformers/pipelines/__init__.py", line 684, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "/home/ravi/.local/lib/python3.9/site-packages/transformers/pipelines/token_classification.py", line 102, in __init__
    super().__init__(*args, **kwargs)
  File "/home/ravi/.local/lib/python3.9/site-packages/transformers/pipelines/base.py", line 770, in __init__
    self.model = self.model.to(self.device)
AttributeError: 'ORTModelForTokenClassification' object has no attribute 'to'
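
The last frame of the traceback shows the pipeline constructor calling `self.model.to(self.device)`, so any model object without a `to` method fails at that point when a device is passed. A minimal sketch of that failure mode follows. `FakeOrtModel`, `place_on_device`, and `try_place` are hypothetical stand-ins for illustration, not the real optimum or transformers classes:

```python
class FakeOrtModel:
    """Hypothetical stand-in for a model wrapper that defines no `to` method."""


def place_on_device(model, device):
    # Mirrors the call shown in the traceback:
    #   self.model = self.model.to(self.device)
    return model.to(device)


def try_place(model, device):
    """Return the AttributeError message, or None if placement succeeds."""
    try:
        place_on_device(model, device)
    except AttributeError as exc:
        return str(exc)
    return None


error_message = try_place(FakeOrtModel(), "cuda:0")
print(error_message)
```

The message has the same shape as the one in the traceback, which suggests the installed optimum release simply did not define `to` on the ORT model class.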

Expected behavior

It should print the predicted disease tokens:

[{'entity': 'B-bio', 'score': 0.99951124, 'index': 6, 'word': 'pulmonary', 'start': 32, 'end': 41}, {'entity': 'I-bio', 'score': 0.99948704, 'index': 7, 'word': 'ed', 'start': 42, 'end': 44}, {'entity': 'I-bio', 'score': 0.99845636, 'index': 8, 'word': '##ema', 'start': 44, 'end': 47}]

philschmid commented 2 years ago

@bajrachar could you try installing optimum from the main branch?

bajrachar commented 1 year ago

Thank you @philschmid. Building from the main branch worked, along with moving the model to the GPU.
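
The thread does not say exactly how the model was moved to the GPU. Before retrying on GPU, it can help to confirm that the installed ONNX Runtime build exposes the CUDA execution provider at all. A minimal sketch, assuming only `onnxruntime.get_available_providers()`; the helper name `cuda_provider_available` is made up for illustration:

```python
def cuda_provider_available():
    """Return True if the installed onnxruntime build lists the CUDA provider."""
    try:
        import onnxruntime as ort
    except ImportError:
        # onnxruntime is not installed in this environment.
        return False
    return "CUDAExecutionProvider" in ort.get_available_providers()


has_cuda = cuda_provider_available()
print("CUDAExecutionProvider available:", has_cuda)
```

If this prints `False` on a machine with a working CUDA install, the installed ONNX Runtime package is probably a CPU-only build. The environment above lists both `onnxruntime` and `onnxruntime-gpu`, which may be worth checking in that case.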