The same error happens to me with distilbert-base-multilingual-cased.
Hello!
I got the same error. After investigating a bit, I found that it occurs because the output_hidden_states field in the configuration file of the distilbert-base-multilingual-cased model is set to true instead of false. As a workaround you can do:
from transformers import DistilBertConfig, TFDistilBertForSequenceClassification

config = DistilBertConfig.from_pretrained("distilbert-base-multilingual-cased", output_hidden_states=False)
model = TFDistilBertForSequenceClassification.from_pretrained("distilbert-base-multilingual-cased", config=config)
And it will work.
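For context, the mismatch can be confirmed by inspecting the hosted configuration; a minimal sketch (the flag's value reflects the state of the S3 config described in this thread):

from transformers import DistilBertConfig

# inspect the flag that controls whether hidden states are returned;
# per this thread, the hosted config had it set to true at the time
config = DistilBertConfig.from_pretrained("distilbert-base-multilingual-cased")
print(config.output_hidden_states)

With the flag enabled, the classification model returns the hidden states alongside the logits, which is presumably what trips up the downstream training loop.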
@julien-c or @LysandreJik, maybe it would be better to update the config file in the S3 repo so it is aligned with the other models. What do you think?
Hi, thank you all for raising this issue and looking into it. As @jplu mentioned, this was an issue with the output_hidden_states field in the configuration files. It was the case for two different checkpoints: distilbert-base-multilingual-cased and distilbert-base-german-cased.
I've updated the files on S3 and could successfully run your script, @amaiya.
Thanks @jplu and @LysandreJik
Works great now:
# construct toy text classification dataset
categories = ['alt.atheism', 'comp.graphics']
from sklearn.datasets import fetch_20newsgroups
train_b = fetch_20newsgroups(subset='train',
                             categories=categories, shuffle=True, random_state=42)
test_b = fetch_20newsgroups(subset='test',
                            categories=categories, shuffle=True, random_state=42)
x_train = train_b.data
y_train = train_b.target
x_test = test_b.data
y_test = test_b.target
# train with ktrain interface to transformers
import ktrain
from ktrain import text
t = text.Transformer('distilbert-base-multilingual-cased', maxlen=500, classes=train_b.target_names)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(3e-5, 1)
begin training using onecycle policy with max lr of 3e-05...
Train for 178 steps, validate for 118 steps
178/178 [==============================] - 51s 286ms/step - loss: 0.2541 - accuracy: 0.8816 - val_loss: 0.0862 - val_accuracy: 0.9746
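As a quick sanity check, the trained learner can also be wrapped for inference; a short sketch, assuming ktrain's standard get_predictor API:

# wrap the trained model and its preprocessor for inference
predictor = ktrain.get_predictor(learner.model, preproc=t)
# hypothetical input; returns one of the two category names
predictor.predict('OpenGL renders 3D graphics on the GPU.')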
🐛 Bug
I'm finding that several of the TensorFlow 2.0 Sequence Classification models don't seem to work. Case in point: distilbert-base-uncased works, but distilbert-base-multilingual-cased does not. My environment is:
Note that I am using v2.3.0 of transformers with patch 1efc208 applied to work around this issue. However, problems with distilbert-base-multilingual-cased occur in v2.2.0 as well. Here is code to reproduce the problem.
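A minimal sketch of such a script, assuming the same ktrain flow shown earlier in this thread with the checkpoint name pulled out into a MODEL_NAME variable:

MODEL_NAME = 'distilbert-base-multilingual-cased'

import ktrain
from ktrain import text
from sklearn.datasets import fetch_20newsgroups

# toy two-class text classification dataset
categories = ['alt.atheism', 'comp.graphics']
train_b = fetch_20newsgroups(subset='train', categories=categories,
                             shuffle=True, random_state=42)
test_b = fetch_20newsgroups(subset='test', categories=categories,
                            shuffle=True, random_state=42)

t = text.Transformer(MODEL_NAME, maxlen=500, classes=train_b.target_names)
trn = t.preprocess_train(train_b.data, train_b.target)
val = t.preprocess_test(test_b.data, test_b.target)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(3e-5, 1)  # the affected checkpoints raise here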
The code above produces the following error:
However, if you set MODEL_NAME to distilbert-base-uncased, everything works. Other models I've found that do not work in TF2 include xlnet-base-cased. To reproduce, set MODEL_NAME to xlnet-base-cased in the code above. The xlnet-base-cased model also throws an exception during the call to model.fit.
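For any affected checkpoint, the config-override workaround from earlier in this thread can be generalized with the Auto classes; a sketch, assuming output_hidden_states is likewise the culprit for these models (if xlnet-base-cased fails for a different reason, this override alone may not help):

from transformers import AutoConfig, TFAutoModelForSequenceClassification

MODEL_NAME = 'xlnet-base-cased'  # or any other affected checkpoint
# force the flag off regardless of what the hosted config file says
config = AutoConfig.from_pretrained(MODEL_NAME, output_hidden_states=False)
model = TFAutoModelForSequenceClassification.from_pretrained(MODEL_NAME, config=config)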