Sure, one way you could go about it would be to create a new class similar to BertForSequenceClassification and implement your own custom final classifier. The lib is pretty modular so you can usually subclass/extend what you need.
You can also replace self.classifier with your own model:
model = BertForSequenceClassification.from_pretrained("bert-base-multilingual-cased")
model.classifier = new_classifier
where new_classifier is any pytorch model that you want.
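For illustration, here is a minimal sketch of what new_classifier could look like (768 is the bert-base hidden size; num_labels is a hypothetical value, not something from this thread; any torch.nn.Module with matching input/output sizes works):
import torch.nn as nn
from transformers import BertForSequenceClassification  # or pytorch_pretrained_bert in older versions

num_labels = 3  # hypothetical number of classes

# maps the 768-dim pooled output to num_labels logits
new_classifier = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(256, num_labels),
)

model = BertForSequenceClassification.from_pretrained("bert-base-multilingual-cased")
model.classifier = new_classifier
model.num_labels = num_labels  # keep the built-in loss computation consistent with the new head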
ok... Thanks a lot. I will try it.
@dhpollack Maybe it's a little unrelated to this issue, but I'll still state the situation. I am using the BERT model to classify sentences on two different datasets. It is working fine on the first dataset but not on the second. Is it possible that BERT has saved its weights according to the first dataset and is loading them for the second one as well, and is thus not performing well? For example, the model configuration looks like this for BOTH datasets; I am not sure whether it should have the same vocabulary size for both.
INFO:pytorch_pretrained_bert.modeling:Model config {
"attention_probs_dropout_prob": 0.1,
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"type_vocab_size": 2,
"vocab_size": 28996
}
It shows the same message for both datasets:
INFO:pytorch_pretrained_bert.tokenization:loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-vocab.txt from cache at /home/pytorch/.pytorch_pretrained_bert/5e8a2b4893d13790ed4150ca1906be5f7a03d6c4ddf62296c383f6db42814db2.e13dbb970cb325137104fb2e5f36fe865f27746c6b526f6352861b1980eb80b1
INFO:pytorch_pretrained_bert.modeling:loading archive file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz from cache at cache/a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c
INFO:pytorch_pretrained_bert.modeling:extracting archive file cache/a803ce83ca27fecf74c355673c434e51c265fb8a3e0e57ac62a80e38ba98d384.681017f415dfb33ec8d0e04fe51a619f3f01532ecea04edbfd48c5d160550d9c to temp dir /tmp/tmpgummmons
How can I effectively use BERT for two different datasets?
@shivin9 this is definitely not related to the classifier layer. Also, it's a little unclear what you want to do. Are you training on one dataset and then doing inference on another? If that's the case, then you would do something like
# training
model = BertForSequenceClassification.from_pretrained("bert-base-cased")
...
model.save_pretrained("/tmp/trained_model_dir")
# inference
model = BertForSequenceClassification.from_pretrained("/tmp/trained_model_dir")
But as I said, it's unclear. If you are training on both datasets and getting good results on one but not the other, then it probably has to do with your preprocessing. Good luck solving your problem.
Hi, I have a related question. I am experimenting with BERT for a classification task. When I use BertForSequenceClassification.from_pretrained, I can get 100% accuracy on a small data set. But if I use a customized classification head as shown below, which is almost identical to BertForSequenceClassification, I get bad accuracy.
Here is my customized classification head:
import torch.nn as nn
from torch.nn import CrossEntropyLoss, MSELoss
from transformers import BertModel  # or pytorch_pretrained_bert, depending on the version in use

class Bertclfhead(nn.Module):
    def __init__(self, config, adapt_args, bertmodel):
        super().__init__()
        self.num_labels = adapt_args.num_classes
        self.config = config
        self.bert = bertmodel
        self.dropout = nn.Dropout(config['hidden_dropout_prob'])
        self.classifier = nn.Linear(config['hidden_size'], adapt_args.num_classes)

    def forward(self, input_ids, token_type_ids=None, attention_mask=None, labels=None,
                position_ids=None, head_mask=None):
        outputs = self.bert(input_ids, position_ids=position_ids, token_type_ids=token_type_ids,
                            attention_mask=attention_mask, head_mask=head_mask)
        pooled_output = outputs[1]  # pooled [CLS] output from BertModel
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        outputs = (logits,) + outputs[2:]  # add hidden states and attentions if they are there
        if labels is not None:
            if self.num_labels == 1:
                # We are doing regression
                loss_fct = MSELoss()
                loss = loss_fct(logits.view(-1), labels.view(-1))
            else:
                loss_fct = CrossEntropyLoss()
                loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
            outputs = (loss,) + outputs
        return outputs  # (loss), logits, (hidden_states), (attentions)
and I initialize my model like this:
model = Bertclfhead(bertconfig, adapt_args, BertModel.from_pretrained('bert-base-uncased'))
Am I missing something?
@dhpollack I am first training on x and then inferring on x. Then I'm training on y and inferring on y.
I am also trying to put a BiLSTM on top of BERT, but it seems that BERT doesn't output the vectors in the required format, i.e. (#batches, seq_len, input_dim). Do you have any idea how that can be solved? Right now BERT is just outputting a (BATCH_SIZE, 768) sized vector, 768 being the size of the hidden layer.
@shivin9 you should read the docs. You want the output of the hidden layers, but I think an LSTM on top of BERT is overkill. What you are getting now is the output of the pooling layer.
Also you should close this issue since it's clear this is not an issue with the library.
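For reference, a minimal sketch of the pooled output vs. the per-token hidden states mentioned above (written against the current transformers API rather than the old pytorch_pretrained_bert one; the model name is the one already used in this thread, and the LSTM sizes are just illustrative):
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-cased")
bert = BertModel.from_pretrained("bert-base-cased")

inputs = tokenizer("An example sentence.", return_tensors="pt")
outputs = bert(**inputs)

sequence_output = outputs[0]  # (batch_size, seq_len, 768): per-token hidden states
pooled_output = outputs[1]    # (batch_size, 768): pooled [CLS] vector, what the classification head sees

# the per-token hidden states are what a BiLSTM head would consume
lstm = torch.nn.LSTM(input_size=768, hidden_size=256, batch_first=True, bidirectional=True)
lstm_out, _ = lstm(sequence_output)  # (batch_size, seq_len, 512)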
Yeah sure. Thanks for the help.
@mehdimashayekhi Did you solve it? I have the same question! Using BertForSequenceClassification directly and using a custom classifier similar to BertForSequenceClassification give totally different results.
Were you able to resolve the issue of getting BERT to output (batch, seq_len, hidden_size) for the BiLSTM?
Re dhpollack's August 12 comment: maybe something got changed between then and now, but I found you also have to set the model's number of labels to get that to work.
model.classifier = torch.nn.Linear(768, 8)
model.num_labels = 8
Hi, I'm using your suggestion to customise the classifier head:
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification
# StableDropout comes from the DeBERTa modeling code; the exact import path may vary by transformers version
from transformers.models.deberta_v2.modeling_deberta_v2 import StableDropout

model = AutoModelForSequenceClassification.from_pretrained(PATH_TO_REPO)

# custom head
class CustomClassifier(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.config = config
        self.classifier0 = nn.Linear(config.hidden_size, config.hidden_size)
        self.classifier1 = nn.Linear(config.hidden_size, config.hidden_size)
        self.classifier2 = nn.Linear(config.hidden_size, config.num_labels)
        drop_out = getattr(config, "cls_dropout", None)
        drop_out = self.config.hidden_dropout_prob if drop_out is None else drop_out
        self.dropout = StableDropout(drop_out)

    def forward(self, x):
        x = F.relu(self.classifier0(x))
        x = self.dropout(x)
        x = F.relu(self.classifier1(x))
        x = self.classifier2(x)
        return x

model.classifier = CustomClassifier(model.config)
model.push_to_hub(PATH_TO_REPO)
But when I want to load this model using from_pretrained, I get the following warning, which means that those additional layers are not loaded and a new head is added on top of the trained model.
How can I resolve this issue, or do you have any idea how I can achieve this while keeping the other functionalities of huggingface?
model = AutoModelForSequenceClassification.from_pretrained(PATH_TO_REPO)
Some weights of the model checkpoint at {} were not used when initializing DebertaV2ForSequenceClassification: ['classifier.classifier0.bias', 'classifier.classifier0.weight', 'classifier.classifier1.bias', 'classifier.classifier1.weight', 'classifier.classifier2.bias', 'classifier.classifier2.weight']
- This IS expected if you are initializing DebertaV2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DebertaV2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DebertaV2ForSequenceClassification were not initialized from the model checkpoint at {} and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
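One possible workaround (a sketch only, assuming the checkpoint in PATH_TO_REPO was saved with the CustomClassifier head above; the subclass name here is made up) is to make the model class match the checkpoint before calling from_pretrained, so that the saved classifier.classifier0/1/2 weights have parameters to load into:
from transformers import DebertaV2ForSequenceClassification

class DebertaV2WithCustomHead(DebertaV2ForSequenceClassification):
    def __init__(self, config):
        super().__init__(config)
        # swap in the custom head so parameter names (classifier.classifier0.*, ...) match the checkpoint
        self.classifier = CustomClassifier(config)

# from_pretrained now finds matching parameter names for the custom head and loads them
model = DebertaV2WithCustomHead.from_pretrained(PATH_TO_REPO)
The custom class does have to be importable wherever the model is loaded, so this works outside the plain AutoModelForSequenceClassification.from_pretrained flow.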
Hi,
Thanks for providing an efficient and easy-to-use implementation of BERT and other models.
I am working on a project that requires me to do binary classification of sentences. I am using BertForSequenceClassification for that, but I am not getting good results, i.e. my loss function doesn't converge. I noticed that by default there is only a single Linear classifier on top of the BERT model. Is it possible to change that?
Thanks, Shivin