huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

KeyError: 'labels' during Distilling Zero Shot Classification #12182

controldev opened this issue 3 years ago (status: Open)

controldev commented 3 years ago

EDIT: I confirmed that this happens with the example script as it is, so no other changes are required to reproduce this.

Environment info

From the logs below: transformers 4.6.1, Python 3.7, single GPU (CUDA), fp16 disabled.

Who can help

Tagging @VictorSanh @sgugger, @patil-suraj (please correct me if I'm wrong)

Information

Model I am using (Bert, XLNet ...):

Student: distilbert-base-uncased
Teacher: roberta-large-mnli

The problem arises when using: the official example script examples/research_projects/zero-shot-distillation/distill_classifier.py, unmodified.

The task I am working on is: distilling a zero-shot classifier into a smaller student (the notebook's AG News example).

I'm simply running the official Colab notebook Distilling Zero Shot Classification.ipynb, but I get a KeyError: 'labels' during the first epoch of student training.

To reproduce

Steps to reproduce the behavior:

  1. Open the official notebook https://t.co/JAJ6Eb78vM?amp=1 (the same link is shared in this tweet: https://twitter.com/joeddav/status/1363543296166002688?lang=en)
  2. Run all the required cells before training
  3. Run the cell that runs transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py
  4. Witness the KeyError: 'labels' on the first epoch of the student model training
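
For context, the error in step 4 comes from the script's custom compute_loss (see the traceback at the end of the logs below), which reads the teacher's soft label distribution from inputs["labels"]. The sketch below only illustrates that kind of soft-label objective; the function and variable names are hypothetical and the actual script may differ (e.g. temperature scaling):

import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_probs):
    # Cross-entropy between the teacher's soft class distribution and the
    # student's predicted distribution; this is why a missing "labels"
    # column makes training fail immediately.
    log_q = F.log_softmax(student_logits, dim=-1)
    return -(teacher_probs * log_q).sum(dim=-1).mean()

# Hypothetical shapes: a batch of 4 examples with 4 candidate classes.
student_logits = torch.randn(4, 4)
teacher_probs = torch.softmax(torch.randn(4, 4), dim=-1)
print(soft_label_loss(student_logits, teacher_probs))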

Full logs:

`2021-06-16 15:33:19.328924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 06/16/2021 15:33:20 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False 06/16/2021 15:33:20 - INFO - main - Training/evaluation parameters DistillTrainingArguments(output_dir='./distilbert-base-uncased-agnews-student', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=<IntervalStrategy.NO: 'no'>, prediction_loss_only=False, per_device_train_batch_size=32, per_device_eval_batch_size=128, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.0, warmup_steps=0, logging_dir='runs/Jun16_15-33-20_9d2a3f891a99', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=False, logging_steps=500, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=500, save_total_limit=0, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', fp16_backend='auto', fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='./distilbert-base-uncased-agnews-student', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, mp_parameters='') 06/16/2021 15:33:20 - INFO - main - Generating predictions from zero-shot teacher model [INFO|configuration_utils.py:517] 2021-06-16 15:33:21,219 >> loading configuration file https://huggingface.co/roberta-large-mnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fab42bdbd5cb5e6ff7cabeb9bcc12728f56022f50b9644a3079904564f2bc704.ddc5961cccf081d6ca7f4f58ee119c21895aa9b19f0044f01954cd2ff42fefcb [INFO|configuration_utils.py:553] 2021-06-16 15:33:21,220 >> Model config RobertaConfig { "_num_labels": 3, "architectures": [ "RobertaForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "eos_token_id": 2, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "id2label": { "0": "CONTRADICTION", "1": "NEUTRAL", "2": "ENTAILMENT" }, "initializer_range": 0.02, "intermediate_size": 4096, "label2id": { "CONTRADICTION": 0, "ENTAILMENT": 2, "NEUTRAL": 1 }, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 16, "num_hidden_layers": 24, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.6.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 }

[INFO|modeling_utils.py:1155] 2021-06-16 15:33:21,507 >> loading weights file https://huggingface.co/roberta-large-mnli/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/63cbd98723b89863bcd86a8002e823de3004a139513559246690c65521cdc9b9.38ef55c51c84ab2e78e5a0e2ea9c25830fd074df70d2f10071eb9a1bc1586ca0 [WARNING|modeling_utils.py:1331] 2021-06-16 15:33:44,205 >> Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']

[INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/vocab.json from cache at /root/.cache/huggingface/transformers/64a1d72b2bd05b0aff1a4dd9e7a90a6eea0312b4f914e80b0a923aa8f72219bd.d67d6b367eb24ab43b08ad55e014cf254076934f71d832bbab9ad35644a375ab [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/merges.txt from cache at /root/.cache/huggingface/transformers/425529714b758f50b6d3f93f8093d859856fd41cf1cec7c8edf2ab44aee632b6.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/d077eac6b48c43618a441cba6eab600a5cc6383b98e7eada6d1ad4d3f3cc457e.fc9576039592f026ad76a1c231b89aee8668488c671dfbe6616bab2ed298d730 [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer_config.json from cache at None 100% 15000/15000 [1:15:16<00:00, 3.32it/s] 06/16/2021 16:49:06 - INFO - main - Initializing student model [INFO|file_utils.py:1532] 2021-06-16 16:49:07,106 >> https://huggingface.co/distilbert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpy7f4tyyh Downloading: 100% 442/442 [00:00<00:00, 348kB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:07,540 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|file_utils.py:1544] 2021-06-16 16:49:07,540 >> creating metadata file for /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|configuration_utils.py:517] 2021-06-16 16:49:07,540 >> loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|configuration_utils.py:553] 2021-06-16 16:49:07,541 >> Model config DistilBertConfig { "activation": "gelu", "architectures": [ "DistilBertForMaskedLM" ], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3" }, "initializer_range": 0.02, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3 }, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tieweights": true, "transformers_version": "4.6.1", "vocab_size": 30522 }

[INFO|file_utils.py:1532] 2021-06-16 16:49:07,820 >> https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmptuo3f4g2 Downloading: 100% 268M/268M [00:04<00:00, 62.4MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:12,343 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [INFO|file_utils.py:1544] 2021-06-16 16:49:12,343 >> creating metadata file for /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [INFO|modeling_utils.py:1155] 2021-06-16 16:49:12,343 >> loading weights file https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [WARNING|modeling_utils.py:1331] 2021-06-16 16:49:12,787 >> Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_projector.weight']

[INFO|file_utils.py:1532] 2021-06-16 16:49:13,357 >> https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmps3o1_gw9 Downloading: 100% 232k/232k [00:00<00:00, 1.83MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:13,766 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|file_utils.py:1544] 2021-06-16 16:49:13,766 >> creating metadata file for /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|file_utils.py:1532] 2021-06-16 16:49:14,049 >> https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp1n0mi2iy Downloading: 100% 466k/466k [00:00<00:00, 3.48MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:14,616 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|file_utils.py:1544] 2021-06-16 16:49:14,616 >> creating metadata file for /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|file_utils.py:1532] 2021-06-16 16:49:15,461 >> https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmperm21jrj Downloading: 100% 28.0/28.0 [00:00<00:00, 22.2kB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:15,745 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 [INFO|file_utils.py:1544] 2021-06-16 16:49:15,745 >> creating metadata file for /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file 
https://huggingface.co/distilbert-base-uncased/resolve/main/special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 100% 120000/120000 [00:32<00:00, 3647.18ex/s] 06/16/2021 16:49:49 - INFO - main - Training student model on teacher predictions [INFO|trainer.py:516] 2021-06-16 16:49:49,272 >> The following columns in the training set don't have a corresponding argument in DistilBertForSequenceClassification.forward and have been ignored: text. [INFO|trainer.py:1156] 2021-06-16 16:49:49,285 >> Running training [INFO|trainer.py:1157] 2021-06-16 16:49:49,285 >> Num examples = 120000 [INFO|trainer.py:1158] 2021-06-16 16:49:49,285 >> Num Epochs = 1 [INFO|trainer.py:1159] 2021-06-16 16:49:49,285 >> Instantaneous batch size per device = 32 [INFO|trainer.py:1160] 2021-06-16 16:49:49,285 >> Total train batch size (w. parallel, distributed & accumulation) = 32 [INFO|trainer.py:1161] 2021-06-16 16:49:49,285 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1162] 2021-06-16 16:49:49,286 >> Total optimization steps = 3750 0% 0/3750 [00:00<?, ?it/s]Traceback (most recent call last): File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 338, in main() File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 328, in main trainer.train() File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1272, in train tr_loss += self.training_step(model, inputs) File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1734, in training_step loss = self.compute_loss(model, inputs) File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 119, in compute_loss target_p = inputs["labels"] KeyError: 'labels' 0% 0/3750 [00:00<?, ?it/s] `

Expected behavior

Training should proceed without throwing KeyError: 'labels'.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

MLMarins commented 3 years ago

I have the same issue.

erip commented 3 years ago

This should be re-opened.

LysandreJik commented 3 years ago

Hello, unfortunately, none of the maintainers have the bandwidth to assist in the resolution of this issue. I'm putting a Good second issue label and pinging the original author @joeddav. We're happy to review a PR!

sadakmed commented 3 years ago

@MLMarins @erip could you provide more details?

erip commented 3 years ago

@sadakmed create a text file with a few lines and run the zero-shot text classification example with arbitrary labels. There is some breakage in the datasets API that causes the labels from the teacher to not be propagated.
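
A quick way to confirm that diagnosis outside the notebook is to check which columns survive the tokenization map() call. This is a standalone sketch with made-up text and a stand-in for the teacher predictions, not the script's actual code:

from datasets import Dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

ds = Dataset.from_dict({
    "text": ["first unlabeled example", "second unlabeled example"],
    "labels": [[0.25, 0.75], [0.9, 0.1]],  # stand-in for the teacher's soft predictions
})
ds = ds.map(tokenizer, input_columns="text", fn_kwargs={"truncation": True})

# If "labels" is missing from this list, the script's compute_loss will
# later fail with KeyError: 'labels'.
print(ds.column_names)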

galbraic commented 2 years ago

I was also encountering this error and noticed that the call to datasets.Dataset.map() at line 310 is the culprit: it drops the labels column from the dataset. Try replacing it with the following:

from datasets import Dataset  # make sure this import is present in distill_classifier.py

# Tokenize the raw text, then rebuild the Dataset explicitly so the teacher's
# soft predictions are guaranteed to end up in a "labels" column.
ds_tokenized = dataset.map(tokenizer, input_columns="text")
dataset = Dataset.from_dict(
    {
        "text": ds_tokenized[:]["text"],
        "labels": teacher_soft_preds,  # output of get_teacher_predictions()
        "input_ids": ds_tokenized[:]["input_ids"],
        "attention_mask": ds_tokenized[:]["attention_mask"],
    }
)
dataset.set_format("torch")
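
The reason this works is that rebuilding the Dataset from a plain dict guarantees a "labels" column is present when the Trainer assembles batches, no matter what map() keeps. An alternative, untested sketch in the same spirit (assuming teacher_soft_preds is a list or array of per-example probability vectors) is to attach the labels after tokenization with add_column:

ds_tokenized = dataset.map(tokenizer, input_columns="text")
# add_column appends the teacher's soft predictions as a new "labels" column,
# so no later map() call has a chance to drop it.
ds_tokenized = ds_tokenized.add_column("labels", [list(p) for p in teacher_soft_preds])
ds_tokenized.set_format("torch")
dataset = ds_tokenized
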
pramodith commented 2 years ago

@LysandreJik I've created a PR for this issue, please take a look when you get the chance.