huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

DeBERTa ONNX pipeline issue #968

Open · fxmarty opened 1 year ago

fxmarty commented 1 year ago

Posted by @rcshubhadeep:

Hi,

I am really lost on something related to exporting DeBERTa to ONNX. I have the following code:

from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = ORTModelForMaskedLM.from_pretrained("microsoft/deberta-base", export=True)
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
pipe("I am a [MASK] engineer. I have worked in JS, CPP, Java, J2ME, and Python. I know Oracle and MySQL")

This results in the following error:

---------------------------------------------------------------------------
InvalidArgument                           Traceback (most recent call last)
Cell In[8], line 1
----> 1 pipe("I am a [MASK] engineer. I have worked in JS, CPP, Java, J2ME, and Python. I know Oracle and MySQL")

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/transformers/pipelines/fill_mask.py:239, in FillMaskPipeline.__call__(self, inputs, *args, **kwargs)
    217 def __call__(self, inputs, *args, **kwargs):
    218     """
    219     Fill the masked token in the text(s) given as inputs.
    220 
   (...)
    237         - **token** (`str`) -- The predicted token (to replace the masked one).
    238     """
--> 239     outputs = super().__call__(inputs, **kwargs)
    240     if isinstance(inputs, list) and len(inputs) == 1:
    241         return outputs[0]

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/transformers/pipelines/base.py:1109, in Pipeline.__call__(self, inputs, num_workers, batch_size, *args, **kwargs)
   1101     return next(
   1102         iter(
   1103             self.get_iterator(
   (...)
   1106         )
   1107     )
   1108 else:
-> 1109     return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/transformers/pipelines/base.py:1116, in Pipeline.run_single(self, inputs, preprocess_params, forward_params, postprocess_params)
   1114 def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):
   1115     model_inputs = self.preprocess(inputs, **preprocess_params)
-> 1116     model_outputs = self.forward(model_inputs, **forward_params)
   1117     outputs = self.postprocess(model_outputs, **postprocess_params)
   1118     return outputs

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/transformers/pipelines/base.py:1015, in Pipeline.forward(self, model_inputs, **forward_params)
   1013     with inference_context():
   1014         model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
-> 1015         model_outputs = self._forward(model_inputs, **forward_params)
   1016         model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
   1017 else:

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/transformers/pipelines/fill_mask.py:101, in FillMaskPipeline._forward(self, model_inputs)
    100 def _forward(self, model_inputs):
--> 101     model_outputs = self.model(**model_inputs)
    102     model_outputs["input_ids"] = model_inputs["input_ids"]
    103     return model_outputs

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/optimum/modeling_base.py:85, in OptimizedModel.__call__(self, *args, **kwargs)
     84 def __call__(self, *args, **kwargs):
---> 85     return self.forward(*args, **kwargs)

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py:1363, in ORTModelForTokenClassification.forward(self, input_ids, attention_mask, token_type_ids, **kwargs)
   1360     onnx_inputs["token_type_ids"] = token_type_ids
   1362 # run inference
-> 1363 outputs = self.model.run(None, onnx_inputs)
   1364 logits = outputs[self.output_names["logits"]]
   1366 if use_torch:

File ~/skill_extraction/skill_extraction/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py:200, in Session.run(self, output_names, input_feed, run_options)
    198     output_names = [output.name for output in self._outputs_meta]
    199 try:
--> 200     return self._sess.run(output_names, input_feed, run_options)
    201 except C.EPFail as err:
    202     if self._enable_fallback:

InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid Feed Input Name:token_type_ids

Whatever I do, I can't get rid of this error. I am a noob with HF. I have posted the issue on both the forum and SO, but got no replies, so this is my last hope. What am I doing wrong? I also read the PR that adds the V2 support, and I still can't figure out what is wrong. Even when I use return_token_type_ids=False in my tokenizer call, it does not solve the problem. My transformers version is 4.27.0 and my optimum version is also the latest.

Originally posted by @rcshubhadeep in https://github.com/huggingface/optimum/issues/555#issuecomment-1501110962
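
For anyone hitting this: a quick way to see which input names the exported ONNX model actually accepts (and hence why feeding token_type_ids fails) is to inspect the underlying onnxruntime.InferenceSession, which the ORT model exposes as model.model (that is what the traceback above calls self.model.run). A minimal sketch:

from optimum.onnxruntime import ORTModelForMaskedLM

model = ORTModelForMaskedLM.from_pretrained("microsoft/deberta-base", export=True)

# model.model is the underlying onnxruntime.InferenceSession
print([inp.name for inp in model.model.get_inputs()])
# if "token_type_ids" is absent from this list, feeding it raises INVALID_ARGUMENT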

fxmarty commented 1 year ago

Hi @rcshubhadeep, this looks like a bug in the fill-mask pipeline, will fix!

Apart from that, microsoft/deberta-base looks to perform quite poorly here; compare with bert-base-uncased:

from transformers import AutoTokenizer, pipeline, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)

res = pipe("I am a [MASK] engineer.")
print(res[0])
# prints {'score': 0.002210365841165185, 'token': 44452, 'token_str': ' Patreon', 'sequence': 'I am a Patreon engineer.'}
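
For comparison, the same prompt through bert-base-uncased, with identical code and just the checkpoint swapped (a sketch; exact scores will vary):

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)

print(pipe("I am a [MASK] engineer.")[0])
# expect a plausible token (e.g. "software") with a much higher score than the deberta-base output above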

rcshubhadeep commented 1 year ago

Hello,

Thanks so much for this. I can confirm that this issue does not exist for BERT (and DistilBERT, RoBERTa, etc.).

I can also confirm that the issue is present in at least the ner pipeline as well (that was my original use case; I was just trying to debug from first principles). Another thing: I had fine-tuned a DeBERTa-V3-large for a NER task, and that is where I first found the issue. So I suspect the same issue exists in all versions of DeBERTa.

Do you want me to take a deeper look and try to see what is going on here? If that would help, please let me know. I will also need some initial guidance on how to set things up.

rcshubhadeep commented 1 year ago

Hi @fxmarty, it is not a bug in Optimum. As you mentioned, it is a bug in the pipeline classes. In fact, the pipelines' preprocess method needs to be changed to incorporate this option (for DeBERTa only, I believe), so the tokenizer call would look like inputs = self.tokenizer(sentence, ..., return_token_type_ids=False). I see that (for example, for the TokenClassification task) I can't pass any extra params when creating the pipeline, because _sanitize_parameters then raises an exception. So, what do you suggest is the best way to handle this?
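
Concretely, the change I have in mind would look roughly like this inside the pipeline's preprocess (a rough sketch of the idea, not the actual transformers source; surrounding arguments are abbreviated):

def preprocess(self, sentence, **preprocess_params):
    # hypothetical hook: callers could pass tokenizer_params={"return_token_type_ids": False}
    tokenizer_params = preprocess_params.pop("tokenizer_params", {})
    return self.tokenizer(
        sentence,
        return_tensors=self.framework,
        **tokenizer_params,  # lets DeBERTa ONNX users drop token_type_ids
    )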

rcshubhadeep commented 1 year ago

Here is an earlier mention of this issue: https://github.com/huggingface/optimum/issues/207#issuecomment-1164636129

rcshubhadeep commented 1 year ago

The (extremely) inelegant workaround that I am using at the moment looks like the following:

from typing import List, Optional, Tuple

from transformers.pipelines import AggregationStrategy, TokenClassificationPipeline


class MyTokenClassificationPipeline(TokenClassificationPipeline):
    """TokenClassificationPipeline that tells the tokenizer not to return token_type_ids."""

    def _sanitize_parameters(
        self,
        ignore_labels=None,
        grouped_entities: Optional[bool] = None,
        ignore_subwords: Optional[bool] = None,
        aggregation_strategy: Optional[AggregationStrategy] = None,
        offset_mapping: Optional[List[Tuple[int, int]]] = None,
        stride: Optional[int] = None,
    ):
        preprocess_params, other, postprocess_params = super()._sanitize_parameters(
            ignore_labels,
            grouped_entities,
            ignore_subwords,
            aggregation_strategy,
            offset_mapping,
            stride,
        )
        # Drop token_type_ids at tokenization time so they are never fed to the ONNX session
        preprocess_params["tokenizer_params"] = {"return_token_type_ids": False}
        return preprocess_params, other, postprocess_params

p = MyTokenClassificationPipeline(model=model, framework='pt', task='ner', tokenizer=tokenizer)
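
For reference, model and tokenizer above come from an ONNX export of the fine-tuned checkpoint, along these lines (the checkpoint name below is a placeholder; substitute your own fine-tuned DeBERTa NER model):

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForTokenClassification

# placeholder checkpoint name; substitute your fine-tuned DeBERTa NER model
tokenizer = AutoTokenizer.from_pretrained("my-org/deberta-v3-large-ner")
model = ORTModelForTokenClassification.from_pretrained("my-org/deberta-v3-large-ner", export=True)

With that in place, a call like p("I have worked in JS, CPP, Java, J2ME, and Python.") should run without hitting the token_type_ids error.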