fxmarty opened 1 year ago
Hi @rcshubhadeep, this looks like a bug in the fill mask pipeline, will fix!
Apart from that, microsoft/deberta-base looks to be quite bad; compare with bert-base-uncased:
from transformers import AutoTokenizer, pipeline, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForMaskedLM.from_pretrained("microsoft/deberta-base")
pipe = pipeline("fill-mask", model=model, tokenizer=tokenizer)
res = pipe("I am a [MASK] engineer.")
print(res[0])
# prints {'score': 0.002210365841165185, 'token': 44452, 'token_str': ' Patreon', 'sequence': 'I am a Patreon engineer.'}
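For context on where the 'score' value comes from: the fill-mask pipeline takes the model's logits at the masked position, applies a softmax over the vocabulary, and reports the top tokens. A minimal sketch of that scoring step with toy numbers (numpy only, illustrative, not the actual pipeline internals):

```python
import numpy as np

# Toy logits over a 5-token vocabulary at the [MASK] position
logits = np.array([1.0, 3.0, 0.5, 2.0, -1.0])

# Numerically stable softmax turns logits into the 'score' probabilities
probs = np.exp(logits - logits.max())
probs /= probs.sum()

top = int(np.argmax(probs))  # vocabulary index of the best fill
print(top, probs[top])
```

A well-trained masked-LM should put high probability mass on plausible fills; the very low score (~0.002) in the output above is one sign the predictions are poor.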
Hello,
Thanks so much for this. I can confirm that this issue does not exist for BERT (or DistilBERT, RoBERTa, etc.).
I can also confirm that the issue affects at least the ner pipeline as well (that was my original use case; I was trying to debug from first principles). I had also fine-tuned a DeBERTa-V3-large for a NER task, which is where I first found the issue, so I suspect the same issue exists in all versions of DeBERTa.
Do you want me to take a deeper look and try to see what is going on here? If that would help, please let me know. I will also need some initial guidance on how to set things up.
Hi @fxmarty, it is not a bug in Optimum. As you mentioned, it is a bug in the pipeline classes. For the pipelines, preprocess needs to be changed to incorporate this option (for DeBERTa only, I believe), so the tokenizer call would look like inputs = self.tokenizer(sentence, ..., return_token_type_ids=False). I also see that (for example, for the TokenClassification task) I can't pass any extra params while creating the pipeline, because _sanitize_parameters then raises an exception. So, what do you suggest would be the best way to handle this?
Here is a mention of this issue: https://github.com/huggingface/optimum/issues/207#issuecomment-1164636129
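For what it's worth, the underlying mismatch can also be worked around on the inference side: if the exported ONNX graph does not declare token_type_ids as an input, the encoding can be filtered down to the inputs the graph actually accepts before running the session. A hedged pure-Python sketch of that idea (filter_inputs and the hard-coded names here are illustrative, not Optimum API):

```python
# Illustrative sketch: keep only the features an exported ONNX graph declares.
# In practice `graph_input_names` would come from the session's declared
# inputs; here it is hard-coded for the example.
def filter_inputs(graph_input_names, encoding):
    return {name: value for name, value in encoding.items() if name in graph_input_names}

encoding = {
    "input_ids": [101, 2023, 102],
    "attention_mask": [1, 1, 1],
    "token_type_ids": [0, 0, 0],  # DeBERTa exports may not declare this input
}
feed = filter_inputs({"input_ids", "attention_mask"}, encoding)
print(sorted(feed))  # ['attention_mask', 'input_ids']
```

This avoids subclassing the pipeline at the cost of an extra filtering step per call.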
The (extremely) inelegant solution that I am using at the moment looks like the following:
from transformers.pipelines import TokenClassificationPipeline, AggregationStrategy
from typing import List, Optional, Tuple

class MyTokenClassificationPipeline(TokenClassificationPipeline):
    def _sanitize_parameters(
        self,
        ignore_labels=None,
        grouped_entities: Optional[bool] = None,
        ignore_subwords: Optional[bool] = None,
        aggregation_strategy: Optional[AggregationStrategy] = None,
        offset_mapping: Optional[List[Tuple[int, int]]] = None,
        stride: Optional[int] = None,
    ):
        preprocess_params, other, postprocess_params = super()._sanitize_parameters(
            ignore_labels,
            grouped_entities,
            ignore_subwords,
            aggregation_strategy,
            offset_mapping,
            stride,
        )
        # Force the tokenizer not to return token_type_ids for DeBERTa
        preprocess_params["tokenizer_params"] = {"return_token_type_ids": False}
        return preprocess_params, other, postprocess_params

p = MyTokenClassificationPipeline(model=model, framework="pt", task="ner", tokenizer=tokenizer)
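To illustrate why the tokenizer_params entry works: preprocess forwards those kwargs into the tokenizer call, which then skips producing token_type_ids. A minimal mock of that flow (MockTokenizer and preprocess below are stand-ins for the idea, not the real transformers classes):

```python
# Stand-in tokenizer: mimics only the return_token_type_ids behaviour.
class MockTokenizer:
    def __call__(self, sentence, return_token_type_ids=True, **kwargs):
        encoding = {"input_ids": [101, 102], "attention_mask": [1, 1]}
        if return_token_type_ids:
            encoding["token_type_ids"] = [0, 0]
        return encoding

# Stand-in for the pipeline's preprocess step forwarding tokenizer_params.
def preprocess(tokenizer, sentence, tokenizer_params=None):
    return tokenizer(sentence, **(tokenizer_params or {}))

with_ids = preprocess(MockTokenizer(), "I am an engineer.")
without_ids = preprocess(
    MockTokenizer(), "I am an engineer.",
    tokenizer_params={"return_token_type_ids": False},
)
print("token_type_ids" in with_ids, "token_type_ids" in without_ids)  # True False
```

With the kwarg forwarded, the encoding handed to the ONNX model no longer contains the input the exported graph does not expect.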
Posted by @rcshubhadeep
Hi,
I am really lost in something related to exporting DeBERTa to ONNX. I have the following code -
It results in this error -
Whatever I do, I can't get rid of this error. I am a noob with HF. I have posted the issue on both the forum and SO, but got no replies. This is my last hope. What am I doing wrong? I have also read the PR that adds the V2 support, and I still can't figure out what is wrong. Even when I use
return_token_type_ids=False
in my tokenizer call, it does not solve the problem. My transformers version is 4.27.0 and my optimum version is also the latest.

Originally posted by @rcshubhadeep in https://github.com/huggingface/optimum/issues/555#issuecomment-1501110962