huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

https://huggingface.co/transformers

Apache License 2.0


Sharing Microsoft's DialogRPT (new dialog ranking model) #7493

Closed golsun closed 4 years ago

golsun commented 4 years ago

🌟 New model addition

Model description

Thanks for the awesome work!

DialogRPT (Dialog Ranking Pretrained Transformers) is a set of GPT-2 based dialogue ranking models recently released with an EMNLP paper by Microsoft Research. It's a follow-up to DialoGPT (thanks for hosting it!). The architecture is pretty simple: a GPT2Model followed by a torch.nn.Linear(n_embd, 1, bias=False), implemented on top of a previous HuggingFace commit. At first I tried to create a model card for it, but then realized that no existing model architecture in HuggingFace seems to be compatible with DialogRPT. I noticed a lot of BERT-based sequence classification models, but ours is GPT-2 based.
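For readers unfamiliar with this kind of head, here is a minimal pure-Python sketch of what a bias-free linear layer with a single output computes on one hidden vector. It is only an illustration: the toy width, the example numbers, and the helper name linear_score are assumptions, and which token's hidden state gets scored is not stated in this thread.

```python
def linear_score(hidden_state, weight):
    """Project one hidden vector to a scalar, with no bias term.

    This mirrors the arithmetic of Linear(n_embd, 1, bias=False):
    a plain dot product between the hidden vector and one weight row.
    """
    if len(hidden_state) != len(weight):
        raise ValueError("hidden_state and weight must have the same length")
    return sum(h * w for h, w in zip(hidden_state, weight))


# Toy values (assumed): a real GPT-2 hidden state has n_embd entries.
hidden = [0.5, -1.0, 2.0, 0.0]
weight = [1.0, 0.5, 0.25, -3.0]

print(linear_score(hidden, weight))
```

Because the output has size 1, the result is a single raw score per sequence rather than a distribution over classes.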

If there's a simple fix (or I missed something) please let me know! If implementation in modeling_gpt2.py is necessary, I'm also glad to help!

Open source status

[x] the model implementation is available: (https://github.com/golsun/DialogRPT)
[x] the model weights are available: (https://github.com/golsun/DialogRPT)
[x] who are the authors: @golsun @dreasysnail

LysandreJik commented 4 years ago

Hi @golsun! Thanks a lot for opening an issue and offering to contribute it!

Indeed, there is no GPT2ForSequenceClassification model in the library (yet!). I'm adding it right now with the goal of supporting DialogRPT. I'll get back to you in a bit.

LysandreJik commented 4 years ago

Hi @golsun! GPT2ForSequenceClassification has been implemented in #7501, and I verified that I obtain the same results as you do in your README using your examples.

You should only need to upload your models on the model hub now! Some helpers regarding the configuration:

You should upload a model configuration on the hub, for every model.
You can simply copy-paste the gpt2-medium configuration that you can find here.
You will need to add a num_labels=1 field to these configurations.
In the architectures field, you should put GPT2ForSequenceClassification
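The configuration steps above can be sketched as follows. This is a rough illustration, not the actual uploaded file: base_config is a hypothetical, abbreviated stand-in for the gpt2-medium config.json, and make_ranker_config is a made-up helper name.

```python
import json

# Hypothetical, abbreviated stand-in for the gpt2-medium config.json.
# The real file has more fields; copy it in full rather than retyping it.
base_config = {
    "model_type": "gpt2",
    "n_embd": 1024,
    "n_layer": 24,
    "n_head": 16,
    "vocab_size": 50257,
}


def make_ranker_config(base):
    """Return a copy of the base config with the two fields described above."""
    config = dict(base)  # shallow copy so the base config is left untouched
    config["num_labels"] = 1
    config["architectures"] = ["GPT2ForSequenceClassification"]
    return config


config = make_ranker_config(base_config)
print(json.dumps(config, indent=2))
```

The printed JSON is what would be saved as config.json next to the weights of each model.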

golsun commented 4 years ago

wow, super fast!!! thank you @LysandreJik , I'll update my repo to reflect this once the pull is merged.

LysandreJik commented 4 years ago

The pull request is now merged @golsun!

golsun commented 4 years ago

Thank you so much @LysandreJik ! I just tried GPT2ForSequenceClassification and it works! 👍 Then I created this model card, but model = AutoModelForSequenceClassification.from_pretrained("microsoft/DialogRPT-updown") gives me the following error, which can be reproduced with this Notebook:

/content/transformers/src/transformers/modeling_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs)
   1203                 config.__class__,
   1204                 cls.__name__,
-> 1205                 ", ".join(c.__name__ for c in MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING.keys()),
   1206             )
   1207         )

ValueError: Unrecognized configuration class <class 'transformers.configuration_gpt2.GPT2Config'> for this kind of AutoModel: AutoModelForSequenceClassification.
Model type should be one of DistilBertConfig, AlbertConfig, CamembertConfig, XLMRobertaConfig, BartConfig, LongformerConfig, RobertaConfig, SqueezeBertConfig, BertConfig, XLNetConfig, MobileBertConfig, FlaubertConfig, XLMConfig, ElectraConfig, FunnelConfig, DebertaConfig.

LysandreJik commented 4 years ago

Indeed, this should be solved by #7630.

golsun commented 4 years ago

Thank you @LysandreJik, AutoModelForSequenceClassification works now. The inference webpage still gives the Unrecognized configuration class error, but I guess it will sync with the latest code soon. I'm going to introduce the model card in the original repo. Thanks again for the help!

LysandreJik commented 4 years ago

We just updated the inference API so that it uses the latest code. I've taken the liberty of adding a padding token to your models, both in your configuration (pad_token_id: 50256) and in special_tokens_map.json (pad_token: "<|endoftext|>"), as the models need a padding token to run in the inference API.

I've taken these values from your code here and here.

Models should now work correctly in the inference webpage :)
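As a generic illustration of what a padding token is for, the sketch below right-pads token id sequences of different lengths with the pad id quoted above and builds a matching attention mask. The example token ids and the helper name pad_batch are made up, and this is not a description of how the inference API itself batches inputs.

```python
PAD_TOKEN = "<|endoftext|>"  # value quoted above for special_tokens_map.json
PAD_TOKEN_ID = 50256         # value quoted above for config.json


def pad_batch(sequences, pad_id=PAD_TOKEN_ID):
    """Right-pad every sequence to the longest one and mark real tokens with 1."""
    width = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (width - len(seq)) for seq in sequences]
    attention_mask = [[1] * len(seq) + [0] * (width - len(seq)) for seq in sequences]
    return input_ids, attention_mask


# Made-up token ids of unequal length.
ids, mask = pad_batch([[10, 11, 12], [20]])
print(ids)
print(mask)
```

Without a pad id there is no value to fill the shorter rows with, so sequences of different lengths cannot be stacked into one rectangular batch.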

golsun commented 4 years ago

Great! Thank you for updating the config and special_tokens_map for us! :) The inference webpage outputs a score of 1 no matter what the input is. I guess it's because it outputs softmax(logits), which is always 1 if num_labels==1. Maybe the following if-else will fix it?

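if num_labels == 1:
    # a single logit is a score: squash it into (0, 1)
    return torch.sigmoid(logits)
else:
    # torch.softmax needs the dimension to normalize over
    return torch.softmax(logits, dim=-1)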

The num_labels==1 case follows the DialogRPT code here.
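The observation that softmax over a single logit is always 1 can be checked with a short pure-Python sketch. The helper names softmax, sigmoid, and to_scores are illustrative only and are not the inference API's code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def to_scores(logits):
    """Apply sigmoid for a single label, softmax otherwise."""
    if len(logits) == 1:
        return [sigmoid(logits[0])]
    return softmax(logits)


for logit in (-3.0, 0.0, 3.0):
    # softmax of one value is exp(0) / exp(0), so it is 1 whatever the logit is
    print(logit, softmax([logit]), to_scores([logit]))
```

With one label the softmax column stays at 1.0 for every logit, while the sigmoid column varies with the input, which is the behaviour the proposed if-else restores.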

LysandreJik commented 4 years ago

You're correct! Solving that in #7726.

julien-c commented 4 years ago

Also @golsun on the inference API, you can have custom label names (instead of just LABEL_0 here) if you set your label names in your config.json

See the config.json file of https://huggingface.co/roberta-large-mnli for an example.
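A minimal sketch of what setting label names could look like, assuming the id2label and label2id fields used in config.json files such as the one linked above. The label name "updown" and the helper with_label_names are placeholders chosen for illustration.

```python
import json


def with_label_names(config, names):
    """Return a copy of config with id2label / label2id mappings for the given names."""
    updated = dict(config)
    # JSON object keys are strings, so the ids are written as "0", "1", ...
    updated["id2label"] = {str(i): name for i, name in enumerate(names)}
    updated["label2id"] = {name: i for i, name in enumerate(names)}
    return updated


# Hypothetical single-label config; "updown" is a placeholder label name.
labeled = with_label_names({"model_type": "gpt2", "num_labels": 1}, ["updown"])
print(json.dumps(labeled, indent=2))
```

With such a mapping in place, the widget would be expected to show the chosen name instead of the default LABEL_0.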

golsun commented 4 years ago

Awesome! thank you @LysandreJik @julien-c