Hi @golsun! Thanks a lot for opening an issue and offering to contribute it!
Indeed, there is no `GPT2ForSequenceClassification` model in the library (yet!). I'm adding it right now with the goal of supporting DialogRPT. I'll get back to you in a bit.
Hi @golsun! `GPT2ForSequenceClassification` has been implemented in #7501, and I verified that I obtain the same results as you do on your README using your examples.
You should only need to upload your models to the model hub now! Some helpers regarding the configuration:
- Your models are based on the `gpt2-medium` configuration that you can find here.
- You should add the `num_labels=1` field to these configurations.
- In the `architectures` field, you should put `GPT2ForSequenceClassification`.
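For illustration, the two added fields might look like this in `config.json` (a minimal sketch showing only the additions; all other fields stay as in the stock `gpt2-medium` configuration):

```json
{
  "architectures": ["GPT2ForSequenceClassification"],
  "num_labels": 1
}
```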
wow, super fast!!! Thank you @LysandreJik, I'll update my repo to reflect this once the pull request is merged.
The pull request is now merged @golsun!
Thank you so much @LysandreJik !
I just tried `GPT2ForSequenceClassification` and it works! 👍
Then I created this model card, but `model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown")` gives me the following error, which can be reproduced with this Notebook:
```
/content/transformers/src/transformers/modeling_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1203                 config.__class__,
   1204                 cls.__name__,
-> 1205                 ", ".join(c.__name__ for c in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING.keys()),
   1206             )
   1207         )

ValueError: Unrecognized configuration class <class 'transformers.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, SqueezeBertConfig, BertConfig, XLNetConfig, MobileBertConfig, FlaubertConfig, XLMConfig, ElectraConfig, FunnelConfig, DebertaConfig.
```
Indeed, this should be solved by #7630.
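For context, the fix boils down to registering the new head in the auto-model mapping, roughly like this (a sketch of the pattern used in `modeling_auto.py`, not the exact diff from #7630):

```python
from collections import OrderedDict

from transformers import GPT2Config, GPT2ForSequenceClassification

# Sketch: the auto-mapping ties each config class to a model class, so
# AutoModelForSequenceClassification can resolve GPT2Config to the new head.
MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
    [
        # ... existing (config class, model class) entries ...
        (GPT2Config, GPT2ForSequenceClassification),
    ]
)
```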
Thank you @LysandreJik, `AutoModelForSequenceClassification` works now.
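For reference, here's roughly how I'm loading and scoring with it now (a minimal sketch; the example input is made up, and I'm assuming the `<|endoftext|>` separator from our README and a recent `transformers` version):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialogRPT-updown")
model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown")

# DialogRPT scores a (context, response) pair joined by the <|endoftext|> token.
inputs = tokenizer("I love NLP!<|endoftext|>Me too!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 1) since num_labels == 1
print(torch.sigmoid(logits).item())  # higher = more likely to be upvoted
```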
The inference webpage still gives the `Unrecognized configuration class` error, but I guess it will sync with the latest code soon.
I'm going to introduce the model card in the original repo.
Thanks again for the help!
We just updated the API inference so that it uses the latest code. I've taken the liberty of adding a padding token to your models, in your configuration (`pad_token_id: 50256`) and in the `special_tokens_map.json` (`pad_token: "<|endoftext|>"`), as the models need a padding token to run in the API inference.
I've taken these values from your code here and here.
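Concretely, the added entries look like this (sketches showing only the new fields, everything else omitted). In `config.json`:

```json
{
  "pad_token_id": 50256
}
```

and in `special_tokens_map.json`:

```json
{
  "pad_token": "<|endoftext|>"
}
```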
Models should now work correctly in the inference webpage :)
Great! Thank you for updating the config and special_tokens_map for us! :)
The inference webpage will output a score of 1 no matter what the input is. I guess it's because it outputs `softmax(logits)`, which is always 1 if `num_labels == 1` (softmax over a single logit is identically 1). Maybe the following if-else will fix it?
```python
if num_labels == 1:
    return torch.sigmoid(logits)
else:
    # torch.softmax needs an explicit dim
    return torch.softmax(logits, dim=-1)
```
The `num_labels == 1` case follows the DialogRPT code here.
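A quick sanity check of the claim above (a throwaway sketch; the logit value is arbitrary):

```python
import torch

logits = torch.tensor([[2.5]])        # a single logit, as with num_labels == 1
print(torch.softmax(logits, dim=-1))  # tensor([[1.]]) -- softmax over one class is always 1
print(torch.sigmoid(logits))          # tensor([[0.9241]]) -- a usable score
```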
You're correct! Solving that in #7726.
Also @golsun, on the inference API you can have custom label names (instead of just `LABEL_0` here) if you set your label names in your `config.json`. See https://huggingface.co/roberta-large-mnli's `config.json` file for an example.
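For instance, a label mapping in `config.json` could look like this (a sketch modeled on the roberta-large-mnli example; the label name `updown` is just an illustration for a single-label model):

```json
{
  "id2label": { "0": "updown" },
  "label2id": { "updown": 0 }
}
```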
Awesome! thank you @LysandreJik @julien-c
🌟 New model addition
Model description
Thanks for the awesome work!
DialogRPT (Dialog Ranking Pretrained Transformers) is a set of GPT-2 based dialogue ranking models recently released with an EMNLP paper by Microsoft Research. It's a follow-up work to DialoGPT (thanks for hosting it!). The architecture is pretty simple: a `GPT2Model` followed by a `torch.nn.Linear(n_embd, 1, bias=False)`, implemented based on a previous HuggingFace commit.
At first I tried to create a model card for it, but then realized that no existing model architecture in HuggingFace seems to be compatible with DialogRPT. I noticed a lot of BERT-based sequence classification models, but ours is GPT-2 based. If there's a simple fix (or I missed something), please let me know! If an implementation in `modeling_gpt2.py` is necessary, I'm also glad to help!
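To make the description above concrete, a minimal sketch of the architecture (class and variable names are mine, not from the DialogRPT repo; scoring from the last token's hidden state is an assumption about where the head is applied):

```python
import torch
from transformers import GPT2Model

class GPT2Ranker(torch.nn.Module):
    """Sketch of the architecture described above: a GPT2Model followed by a
    single-output linear head with no bias. Illustrative, not the DialogRPT code."""

    def __init__(self, pretrained="gpt2-medium"):
        super().__init__()
        self.transformer = GPT2Model.from_pretrained(pretrained)
        n_embd = self.transformer.config.n_embd  # 1024 for gpt2-medium
        self.head = torch.nn.Linear(n_embd, 1, bias=False)

    def forward(self, input_ids):
        hidden = self.transformer(input_ids)[0]  # (batch, seq, n_embd)
        logits = self.head(hidden[:, -1, :])     # score from the last token (assumed)
        return torch.sigmoid(logits)             # probability-like ranking score
```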
Open source status