Open controldev opened 3 years ago
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I have the same issue.
This should be re-opened.
Hello, unfortunately, none of the maintainers have the bandwidth to assist in the resolution of this issue. I'm putting a Good second issue label and pinging the original author @joeddav. We're happy to review a PR!
@MLMarins @erip could you provide more details?
@sadakmed create a text file with a few lines and run the zero-shot text classification example with arbitrary labels. There is some breakage in the datasets API that causes the labels from the teacher to not be propagated.
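For anyone trying to reproduce this from scratch, here is a minimal sketch of the input files (the file names and contents below are placeholders, not part of the original report):

```python
# Create a tiny plain-text dataset: one example per line.
with open("texts.txt", "w") as f:
    f.write("The home team clinched the title in overtime.\n")
    f.write("The new laptop ships with a much faster processor.\n")
    f.write("The central bank raised interest rates again.\n")

# Arbitrary candidate labels for zero-shot classification, one per line.
with open("class_names.txt", "w") as f:
    f.write("sports\ntechnology\nbusiness\n")
```

Pointing distill_classifier.py at these two files, following the usage shown in the research project's README, should be enough to hit the error described in this issue.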
I was also encountering this error, and noticed that the call to datasets.Dataset.map() on line 310 is the culprit. It drops the labels column from the dataset. Try replacing it with the following:
```python
ds_tokenized = dataset.map(tokenizer, input_columns="text")

# Rebuild the dataset explicitly so the teacher's soft labels are kept
# alongside the tokenized columns.
dataset = Dataset.from_dict(
    {
        "text": ds_tokenized[:]["text"],
        "labels": teacher_soft_preds,  # output of get_teacher_predictions()
        "input_ids": ds_tokenized[:]["input_ids"],
        "attention_mask": ds_tokenized[:]["attention_mask"],
    }
)
dataset.set_format("torch")
```
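Separately from the fix above, a quick way to check whether the installed datasets version actually drops columns in this situation (a self-contained sketch, not taken from the example script):

```python
from datasets import Dataset

# Toy stand-ins for the example's text column and the teacher's soft labels.
ds = Dataset.from_dict(
    {
        "text": ["first example", "second example"],
        "labels": [[0.1, 0.9], [0.7, 0.3]],
    }
)

# Mimic the tokenization step: map over the "text" column only.
mapped = ds.map(lambda text: {"length": len(text)}, input_columns="text")

# If "labels" is missing from the output, the installed datasets version
# exhibits the column-dropping behavior described in this issue.
print(mapped.column_names)
```

On versions where "labels" survives the map call, the explicit rebuild above is harmless; on affected versions it restores the column that the student trainer's compute_loss needs.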
@LysandreJik I've created a PR for this issue; please take a look when you get the chance.
EDIT: I confirmed that this happens with the example script as it is, so no other changes are required to reproduce this.
Environment info
transformers version: 4.6.1
Who can help
Tagging @VictorSanh @sgugger, @patil-suraj (please correct me if I'm wrong)
Information
Model I am using (Bert, XLNet ...):
Student: distilbert-base-uncased
Teacher: roberta-large-mnli
The problem arises when using: the official example scripts (the zero-shot distillation research project).
The tasks I am working on is: zero-shot classification distillation.
I'm simply running the official colab script Distilling Zero Shot Classification.ipynb, but get a key error when performing the first epoch of the student training.
To reproduce
Steps to reproduce the behavior:
Run transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py (as the notebook does); it raises
KeyError: 'labels'
on the first epoch of the student model training.
Full logs:
`2021-06-16 15:33:19.328924: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0 06/16/2021 15:33:20 - WARNING - main - Process rank: -1, device: cuda:0, n_gpu: 1distributed training: False, 16-bits training: False 06/16/2021 15:33:20 - INFO - main - Training/evaluation parameters DistillTrainingArguments(output_dir='./distilbert-base-uncased-agnews-student', overwrite_output_dir=False, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=<IntervalStrategy.NO: 'no'>, prediction_loss_only=False, per_device_train_batch_size=32, per_device_eval_batch_size=128, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_ratio=0.0, warmup_steps=0, logging_dir='runs/Jun16_15-33-20_9d2a3f891a99', logging_strategy=<IntervalStrategy.STEPS: 'steps'>, logging_first_step=False, logging_steps=500, save_strategy=<IntervalStrategy.STEPS: 'steps'>, save_steps=500, save_total_limit=0, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', fp16_backend='auto', fp16_full_eval=False, local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=[], dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='./distilbert-base-uncased-agnews-student', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, length_column_name='length', report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, use_legacy_prediction_loop=False, push_to_hub=False, resume_from_checkpoint=None, mp_parameters='') 06/16/2021 15:33:20 - INFO - main - Generating predictions from zero-shot teacher model [INFO|configuration_utils.py:517] 2021-06-16 15:33:21,219 >> loading configuration file https://huggingface.co/roberta-large-mnli/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fab42bdbd5cb5e6ff7cabeb9bcc12728f56022f50b9644a3079904564f2bc704.ddc5961cccf081d6ca7f4f58ee119c21895aa9b19f0044f01954cd2ff42fefcb [INFO|configuration_utils.py:553] 2021-06-16 15:33:21,220 >> Model config RobertaConfig { "_num_labels": 3, "architectures": [ "RobertaForSequenceClassification" ], "attention_probs_dropout_prob": 0.1, "bos_token_id": 0, "eos_token_id": 2, "gradient_checkpointing": false, "hidden_act": "gelu", "hidden_dropout_prob": 0.1, "hidden_size": 1024, "id2label": { "0": "CONTRADICTION", "1": "NEUTRAL", "2": "ENTAILMENT" }, "initializer_range": 0.02, "intermediate_size": 4096, "label2id": { "CONTRADICTION": 0, "ENTAILMENT": 2, "NEUTRAL": 1 }, "layer_norm_eps": 1e-05, "max_position_embeddings": 514, "model_type": "roberta", "num_attention_heads": 16, "num_hidden_layers": 24, "pad_token_id": 1, "position_embedding_type": "absolute", "transformers_version": "4.6.1", "type_vocab_size": 1, "use_cache": true, "vocab_size": 50265 }
[INFO|modeling_utils.py:1155] 2021-06-16 15:33:21,507 >> loading weights file https://huggingface.co/roberta-large-mnli/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/63cbd98723b89863bcd86a8002e823de3004a139513559246690c65521cdc9b9.38ef55c51c84ab2e78e5a0e2ea9c25830fd074df70d2f10071eb9a1bc1586ca0 [WARNING|modeling_utils.py:1331] 2021-06-16 15:33:44,205 >> Some weights of the model checkpoint at roberta-large-mnli were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
[INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/vocab.json from cache at /root/.cache/huggingface/transformers/64a1d72b2bd05b0aff1a4dd9e7a90a6eea0312b4f914e80b0a923aa8f72219bd.d67d6b367eb24ab43b08ad55e014cf254076934f71d832bbab9ad35644a375ab [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/merges.txt from cache at /root/.cache/huggingface/transformers/425529714b758f50b6d3f93f8093d859856fd41cf1cec7c8edf2ab44aee632b6.5d12962c5ee615a4c803841266e9c3be9a691a924f72d395d3a6c6c81157788b [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/d077eac6b48c43618a441cba6eab600a5cc6383b98e7eada6d1ad4d3f3cc457e.fc9576039592f026ad76a1c231b89aee8668488c671dfbe6616bab2ed298d730 [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 15:33:49,522 >> loading file https://huggingface.co/roberta-large-mnli/resolve/main/tokenizer_config.json from cache at None 100% 15000/15000 [1:15:16<00:00, 3.32it/s] 06/16/2021 16:49:06 - INFO - main - Initializing student model [INFO|file_utils.py:1532] 2021-06-16 16:49:07,106 >> https://huggingface.co/distilbert-base-uncased/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpy7f4tyyh Downloading: 100% 442/442 [00:00<00:00, 348kB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:07,540 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|file_utils.py:1544] 2021-06-16 16:49:07,540 >> creating metadata file for /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|configuration_utils.py:517] 2021-06-16 16:49:07,540 >> loading configuration file https://huggingface.co/distilbert-base-uncased/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.d423bdf2f58dc8b77d5f5d18028d7ae4a72dcfd8f468e81fe979ada957a8c361 [INFO|configuration_utils.py:553] 2021-06-16 16:49:07,541 >> Model config DistilBertConfig { "activation": "gelu", "architectures": [ "DistilBertForMaskedLM" ], "attention_dropout": 0.1, "dim": 768, "dropout": 0.1, "hidden_dim": 3072, "id2label": { "0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2", "3": "LABEL_3" }, "initializer_range": 0.02, "label2id": { "LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2, "LABEL_3": 3 }, "max_position_embeddings": 512, "model_type": "distilbert", "n_heads": 12, "n_layers": 6, "pad_token_id": 0, "qa_dropout": 0.1, "seq_classif_dropout": 0.2, "sinusoidal_pos_embds": false, "tieweights": true, "transformers_version": "4.6.1", "vocab_size": 30522 }
[INFO|file_utils.py:1532] 2021-06-16 16:49:07,820 >> https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmptuo3f4g2 Downloading: 100% 268M/268M [00:04<00:00, 62.4MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:12,343 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [INFO|file_utils.py:1544] 2021-06-16 16:49:12,343 >> creating metadata file for /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [INFO|modeling_utils.py:1155] 2021-06-16 16:49:12,343 >> loading weights file https://huggingface.co/distilbert-base-uncased/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a [WARNING|modeling_utils.py:1331] 2021-06-16 16:49:12,787 >> Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_projector.weight']
[INFO|file_utils.py:1532] 2021-06-16 16:49:13,357 >> https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmps3o1_gw9 Downloading: 100% 232k/232k [00:00<00:00, 1.83MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:13,766 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt in cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|file_utils.py:1544] 2021-06-16 16:49:13,766 >> creating metadata file for /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|file_utils.py:1532] 2021-06-16 16:49:14,049 >> https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmp1n0mi2iy Downloading: 100% 466k/466k [00:00<00:00, 3.48MB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:14,616 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json in cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|file_utils.py:1544] 2021-06-16 16:49:14,616 >> creating metadata file for /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|file_utils.py:1532] 2021-06-16 16:49:15,461 >> https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmperm21jrj Downloading: 100% 28.0/28.0 [00:00<00:00, 22.2kB/s] [INFO|file_utils.py:1536] 2021-06-16 16:49:15,745 >> storing https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json in cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 [INFO|file_utils.py:1544] 2021-06-16 16:49:15,745 >> creating metadata file for /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/vocab.txt from cache at /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer.json from cache at /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4 [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/added_tokens.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file 
https://huggingface.co/distilbert-base-uncased/resolve/main/special_tokens_map.json from cache at None [INFO|tokenization_utils_base.py:1717] 2021-06-16 16:49:15,746 >> loading file https://huggingface.co/distilbert-base-uncased/resolve/main/tokenizer_config.json from cache at /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79 100% 120000/120000 [00:32<00:00, 3647.18ex/s] 06/16/2021 16:49:49 - INFO - main - Training student model on teacher predictions [INFO|trainer.py:516] 2021-06-16 16:49:49,272 >> The following columns in the training set don't have a corresponding argument in
DistilBertForSequenceClassification.forward and have been ignored: text. [INFO|trainer.py:1156] 2021-06-16 16:49:49,285 >> Running training [INFO|trainer.py:1157] 2021-06-16 16:49:49,285 >> Num examples = 120000 [INFO|trainer.py:1158] 2021-06-16 16:49:49,285 >> Num Epochs = 1 [INFO|trainer.py:1159] 2021-06-16 16:49:49,285 >> Instantaneous batch size per device = 32 [INFO|trainer.py:1160] 2021-06-16 16:49:49,285 >> Total train batch size (w. parallel, distributed & accumulation) = 32 [INFO|trainer.py:1161] 2021-06-16 16:49:49,285 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1162] 2021-06-16 16:49:49,286 >> Total optimization steps = 3750
0% 0/3750 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 338, in <module>
    main()
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 328, in main
    trainer.train()
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1272, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1734, in training_step
    loss = self.compute_loss(model, inputs)
  File "transformers/examples/research_projects/zero-shot-distillation/distill_classifier.py", line 119, in compute_loss
    target_p = inputs["labels"]
KeyError: 'labels'
0% 0/3750 [00:00<?, ?it/s]
`
Expected behavior
Not throw a KeyError.