Open controldev opened 3 years ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I have the same issue.
This should be re-opened.
Hello, unfortunately, none of the maintainers have the bandwidth to assist in the resolution of this issue. I'm putting a Good second issue label and pinging the original author @joeddav. We're happy to review a PR!
@MLMarins @erip could you provide more details?
@sadakmed create a text file with a few lines and run the zero-shot text classification example with arbitrary labels. There is some breakage in the datasets API that causes the labels from the teacher to not be propagated.
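For anyone trying to reproduce this from scratch, here is a minimal sketch of the input files (the file names and contents below are placeholders, not part of the original report):

```python
# Create a tiny plain-text dataset: one example per line.
with open("texts.txt", "w") as f:
    f.write("The home team clinched the title in overtime.\n")
    f.write("The new laptop ships with a much faster processor.\n")
    f.write("The central bank raised interest rates again.\n")

# Arbitrary candidate labels for zero-shot classification, one per line.
with open("class_names.txt", "w") as f:
    f.write("sports\ntechnology\nbusiness\n")
```

Pointing distill_classifier.py at these two files, following the usage shown in the research project's README, should be enough to hit the error described in this issue.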
I was also encountering this error, and noticed that the call to datasets.Dataset.map() on line 310 is the culprit. It drops the labels column from the dataset. Try replacing it with the following:
```python
ds_tokenized = dataset.map(tokenizer, input_columns="text")

# Rebuild the dataset explicitly so the teacher's soft labels are kept
# alongside the tokenized columns.
dataset = Dataset.from_dict(
    {
        "text": ds_tokenized[:]["text"],
        "labels": teacher_soft_preds,  # output of get_teacher_predictions()
        "input_ids": ds_tokenized[:]["input_ids"],
        "attention_mask": ds_tokenized[:]["attention_mask"],
    }
)
dataset.set_format("torch")
```
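Separately from the fix above, a quick way to check whether the installed datasets version actually drops columns in this situation (a self-contained sketch, not taken from the example script):

```python
from datasets import Dataset

# Toy stand-ins for the example's text column and the teacher's soft labels.
ds = Dataset.from_dict(
    {
        "text": ["first example", "second example"],
        "labels": [[0.1, 0.9], [0.7, 0.3]],
    }
)

# Mimic the tokenization step: map over the "text" column only.
mapped = ds.map(lambda text: {"length": len(text)}, input_columns="text")

# If "labels" is missing from the output, the installed datasets version
# exhibits the column-dropping behavior described in this issue.
print(mapped.column_names)
```

On versions where "labels" survives the map call, the explicit rebuild above is harmless; on affected versions it restores the column that the student trainer's compute_loss needs.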
@LysandreJik I've created a PR for this issue; please take a look when you get the chance.
EDIT: I confirmed that this happens with the example script as it is, so no other changes are required to reproduce this.
Environment info
transformers version: 4.6.1
Who can help
Tagging @VictorSanh @sgugger, @patil-suraj (please correct me if I'm wrong)
Information
Model I am using (Bert, XLNet ...):
Student: distilbert-base-uncased
Teacher: roberta-large-mnli
The problem arises when using: the official example scripts (the zero-shot distillation research project).
The tasks I am working on is: zero-shot classification distillation.
I'm simply running the official colab script Distilling Zero Shot Classification.ipynb, but get a key error when performing the first epoch of the student training.
To reproduce
Steps to reproduce the behavior:
Run transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py (as the notebook does); it raises
KeyError: 'labels'
on the first epoch of the student model training.
Full logs:
`2021-06-16 15:33:19.328924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 06/16/2021 15:33:20 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False 06/16/2021 15:33:20 - INFO - main - Training/evaluation parameters DistillTrainingArguments(output_dir='./distilbert-base-uncased-agnews-student', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=<IntervalStrategy.NO: 'no'>, prediction_loss_only=False, per_device_train_batch_size=32, per_device_eval_batch_size=128, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.0, warmup_steps=0, logging_dir='runs/Jun16_15-33-20_9d2a3f891a99', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=False, logging_steps=500, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=500, save_total_limit=0, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', fp16_backend='auto', fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='./distilbert-base-uncased-agnews-student', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, mp_parameters='') 06/16/2021 15:33:20 - INFO - main - Generating predictions from zero-shot teacher model [INFO|configuration_utils.py:517] 2021-06-16 15:33:21,219 >> loading configuration file https://huggingface.co/roberta-large-mnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fab42bdbd5cb5e6ff7cabeb9bcc12728f56022f50b9644a3079904564f2bc704.ddc5961cccf081d6ca7f4f58ee119c21895aa9b19f0044f01954cd2ff42fefcb [INFO|configuration_utils.py:553] 2021-06-16 15:33:21,220 >> Model config RobertaConfig { "_num_labels": 3, "architectures": [ "RobertaForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "eos_token_id": 2, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "id2label": { "0": "CONTRADICTION", "1": "NEUTRAL", "2": "ENTAILMENT" }, "initializer_range": 0.02, "intermediate_size": 4096, "label2id": { "CONTRADICTION": 0, "ENTAILMENT": 2, "NEUTRAL": 1 }, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 16, "num_hidden_layers": 24, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.6.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 }
[INFO|modeling_utils.py:1155] 2021-06-16 15:33:21,507 >> loading weights file https://huggingface.co/roberta-large-mnli/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/63cbd98723b89863bcd86a8002e823de3004a139513559246690c65521cdc9b9.38ef55c51c84ab2e78e5a0e2ea9c25830fd074df70d2f10071eb9a1bc1586ca0 [WARNING|modeling_utils.py:1331] 2021-06-16 15:33:44,205 >> Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
[INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/vocab.json from cache at /root/.cache/huggingface/transformers/64a1d72b2bd05b0aff1a4dd9e7a90a6eea0312b4f914e80b0a923aa8f72219bd.d67d6b367eb24ab43b08ad55e014cf254076934f71d832bbab9ad35644a375ab [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/merges.txt from cache at /root/.cache/huggingface/transformers/425529714b758f50b6d3f93f8093d859856fd41cf1cec7c8edf2ab44aee632b6.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/d077eac6b48c43618a441cba6eab600a5cc6383b98e7eada6d1ad4d3f3cc457e.fc9576039592f026ad76a1c231b89aee8668488c671dfbe6616bab2ed298d730 [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer_config.json from cache at None 100% 15000/15000 [1:15:16<00:00, 3.32it/s] 06/16/2021 16:49:06 - INFO - main - Initializing student model [INFO|file_utils.py:1532] 2021-06-16 16:49:07,106 >> https://huggingface.co/distilbert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpy7f4tyyh Downloading: 100% 442/442 [00:00<00:00, 348kB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:07,540 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|file_utils.py:1544] 2021-06-16 16:49:07,540 >> creating metadata file for /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|configuration_utils.py:517] 2021-06-16 16:49:07,540 >> loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|configuration_utils.py:553] 2021-06-16 16:49:07,541 >> Model config DistilBertConfig { "activation": "gelu", "architectures": [ "DistilBertForMaskedLM" ], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3" }, "initializer_range": 0.02, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3 }, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tieweights": true, "transformers_version": "4.6.1", "vocab_size": 30522 }
[INFO|file_utils.py:1532] 2021-06-16 16:49:07,820 >> https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmptuo3f4g2 Downloading: 100% 268M/268M [00:04<00:00, 62.4MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:12,343 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [INFO|file_utils.py:1544] 2021-06-16 16:49:12,343 >> creating metadata file for /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [INFO|modeling_utils.py:1155] 2021-06-16 16:49:12,343 >> loading weights file https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [WARNING|modeling_utils.py:1331] 2021-06-16 16:49:12,787 >> Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_projector.weight']
[INFO|file_utils.py:1532] 2021-06-16 16:49:13,357 >> https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmps3o1_gw9 Downloading: 100% 232k/232k [00:00<00:00, 1.83MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:13,766 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|file_utils.py:1544] 2021-06-16 16:49:13,766 >> creating metadata file for /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|file_utils.py:1532] 2021-06-16 16:49:14,049 >> https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp1n0mi2iy Downloading: 100% 466k/466k [00:00<00:00, 3.48MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:14,616 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|file_utils.py:1544] 2021-06-16 16:49:14,616 >> creating metadata file for /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|file_utils.py:1532] 2021-06-16 16:49:15,461 >> https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmperm21jrj Downloading: 100% 28.0/28.0 [00:00<00:00, 22.2kB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:15,745 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 [INFO|file_utils.py:1544] 2021-06-16 16:49:15,745 >> creating metadata file for /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file 
https://huggingface.co/distilbert-base-uncased/resolve/main/special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 100% 120000/120000 [00:32<00:00, 3647.18ex/s] 06/16/2021 16:49:49 - INFO - main - Training student model on teacher predictions [INFO|trainer.py:516] 2021-06-16 16:49:49,272 >> The following columns in the training set don't have a corresponding argument in
DistilBertForSequenceClassification.forward and have been ignored: text. [INFO|trainer.py:1156] 2021-06-16 16:49:49,285 >> Running training [INFO|trainer.py:1157] 2021-06-16 16:49:49,285 >> Num examples = 120000 [INFO|trainer.py:1158] 2021-06-16 16:49:49,285 >> Num Epochs = 1 [INFO|trainer.py:1159] 2021-06-16 16:49:49,285 >> Instantaneous batch size per device = 32 [INFO|trainer.py:1160] 2021-06-16 16:49:49,285 >> Total train batch size (w. parallel, distributed & accumulation) = 32 [INFO|trainer.py:1161] 2021-06-16 16:49:49,285 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1162] 2021-06-16 16:49:49,286 >> Total optimization steps = 3750
0% 0/3750 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 338, in <module>
    main()
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 328, in main
    trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1272, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1734, in training_step
    loss = self.compute_loss(model, inputs)
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 119, in compute_loss
    target_p = inputs["labels"]
KeyError: 'labels'
0% 0/3750 [00:00<?, ?it/s]
`
Expected behavior
Not throw a KeyError.