huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Setting compute_metrics in Trainer with Idefics2ForConditionalGeneration leads to AttributeError: 'DynamicCache' object has no attribute 'detach' #30631

Closed EloiEynard closed 3 months ago

EloiEynard commented 4 months ago

System Info

Who can help?

Not sure if this is an issue with the Trainer or the model.

Information

Tasks

Reproduction

The following code is from the Idefics2 fine-tuning example Colab, with the addition of compute_metrics in the Trainer.

!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q accelerate datasets peft bitsandbytes

import torch
from peft import LoraConfig
from transformers import AutoProcessor, BitsAndBytesConfig, Idefics2ForConditionalGeneration

DEVICE = "cuda:0"
USE_LORA = False
USE_QLORA = True

processor = AutoProcessor.from_pretrained(
    "HuggingFaceM4/idefics2-8b",
    do_image_splitting=False
)

# Three training options, from lowest to highest precision:
# - QLora
# - Standard Lora
# - Full fine-tuning
if USE_QLORA or USE_LORA:
    lora_config = LoraConfig(
        r=8,
        lora_alpha=8,
        lora_dropout=0.1,
        target_modules='.*(text_model|modality_projection|perceiver_resampler).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$',
        use_dora=False if USE_QLORA else True,
        init_lora_weights="gaussian"
    )
    if USE_QLORA:
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16
        )
    model = Idefics2ForConditionalGeneration.from_pretrained(
        "HuggingFaceM4/idefics2-8b",
        torch_dtype=torch.float16,
        quantization_config=bnb_config if USE_QLORA else None,
    )
    model.add_adapter(lora_config)
    model.enable_adapters()
else:
    model = Idefics2ForConditionalGeneration.from_pretrained(
        "HuggingFaceM4/idefics2-8b",
        torch_dtype=torch.float16,
        _attn_implementation="flash_attention_2", # Only available on A100 or H100
    ).to(DEVICE)

from datasets import load_dataset

train_dataset = load_dataset("nielsr/docvqa_1200_examples", split="train")
train_dataset = train_dataset.remove_columns(['id', 'words', 'bounding_boxes', 'answer'])

eval_dataset = load_dataset("nielsr/docvqa_1200_examples", split="test")
eval_dataset = eval_dataset.remove_columns(['id', 'words', 'bounding_boxes', 'answer'])

import random

class MyDataCollator:
    def __init__(self, processor):
        self.processor = processor
        self.image_token_id = processor.tokenizer.additional_special_tokens_ids[
            processor.tokenizer.additional_special_tokens.index("<image>")
        ]

    def __call__(self, examples):
        texts = []
        images = []
        for example in examples:
            image = example["image"]
            question = example["query"]["en"]
            answer = random.choice(example["answers"])
            messages = [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": "Answer briefly."},
                        {"type": "image"},
                        {"type": "text", "text": question}
                    ]
                },
                {
                    "role": "assistant",
                    "content": [
                        {"type": "text", "text": answer}
                    ]
                }
            ]
            text = self.processor.apply_chat_template(messages, add_generation_prompt=False)
            texts.append(text.strip())
            images.append([image])

        batch = self.processor(text=texts, images=images, return_tensors="pt", padding=True)

        labels = batch["input_ids"].clone()
        # Replace pad token positions in the labels with the image token id (as in the original Colab example)
        labels[labels == self.processor.tokenizer.pad_token_id] = self.image_token_id
        batch["labels"] = labels

        return batch

data_collator = MyDataCollator(processor)

from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    num_train_epochs=2,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,
    warmup_steps=50,
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=25,
    output_dir="/content/drive/My Drive/docvqa_ft_tutorial",
    save_strategy="steps",
    save_steps=250,
    save_total_limit=1,
    # evaluation_strategy="epoch",
    fp16=True,
    push_to_hub_model_id="idefics2-8b-docvqa-finetuned-tutorial",
    remove_unused_columns=False,
    report_to="none",
)

def custom_metrics(eval_preds):
    # Placeholder: only here to check whether compute_metrics is ever reached.
    exit(0)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=custom_metrics,
)

trainer.evaluate()

Here is the exception:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/template.ipynb Cell 36 line 1
----> 1 trainer.evaluate()

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3510 start_time = time.time()
   3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
   3514     eval_dataloader,
   3515     description="Evaluation",
   3516     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3517     # self.args.prediction_loss_only
   3518     prediction_loss_only=True if self.compute_metrics is None else None,
   3519     ignore_keys=ignore_keys,
   3520     metric_key_prefix=metric_key_prefix,
   3521 )
   3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3696, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3693         batch_size = observed_batch_size
   3695 # Prediction step
-> 3696 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
   3697 main_input_name = getattr(self.model, "main_input_name", "input_ids")
   3698 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3904, in Trainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys)
   3902     return (loss, None, None)
   3903 print(logits) #Eloi Remove
-> 3904 logits = nested_detach(logits)
   3905 if len(logits) == 1:
   3906     logits = logits[0]

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190, in nested_detach(tensors)
    188 "Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."
    189 if isinstance(tensors, (list, tuple)):
--> 190     return type(tensors)(nested_detach(t) for t in tensors)
    191 elif isinstance(tensors, Mapping):
    192     return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:190, in <genexpr>(.0)
    188 "Detach `tensors` (even if it's a nested list/tuple/dict of tensors)."
    189 if isinstance(tensors, (list, tuple)):
--> 190     return type(tensors)(nested_detach(t) for t in tensors)
    191 elif isinstance(tensors, Mapping):
    192     return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:193, in nested_detach(tensors)
    191 elif isinstance(tensors, Mapping):
    192     return type(tensors)({k: nested_detach(t) for k, t in tensors.items()})
--> 193 return tensors.detach()

AttributeError: 'DynamicCache' object has no attribute 'detach'

This seems to happen when the model output's past_key_values is an empty DynamicCache.

Expected behavior

Evaluation should reach custom_metrics and terminate cleanly.

NielsRogge commented 4 months ago

I had the same error and fixed it by using model.config.use_cache=False during training. But @VictorSanh might know a better option
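For reference, a minimal sketch of this workaround applied to the model from the reproduction snippet above, before building the Trainer:

# Disable the KV cache so forward() stops returning past_key_values as a DynamicCache
model.config.use_cache = False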

EloiEynard commented 4 months ago

I had the same error and fixed it by using model.config.use_cache=False during training

That fixes this issue, as the past_key_values are now full tensors, but it leads to a new error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/eyel/pm-ia-traitement-documents/src/python/notebooks/Idefics2_Fine_tuning_example.ipynb Cell 9 line 1
----> 1 trainer.evaluate()

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3513, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3510 start_time = time.time()
   3512 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3513 output = eval_loop(
   3514     eval_dataloader,
   3515     description="Evaluation",
   3516     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3517     # self.args.prediction_loss_only
   3518     prediction_loss_only=True if self.compute_metrics is None else None,
   3519     ignore_keys=ignore_keys,
   3520     metric_key_prefix=metric_key_prefix,
   3521 )
   3523 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3524 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer.py:3716, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3714         logits = self.preprocess_logits_for_metrics(logits, labels)
   3715     logits = self.gather_function((logits))
-> 3716     all_preds.add(logits)
   3717 if labels is not None:
   3718     labels = self.accelerator.pad_across_processes(labels, dim=1, pad_index=-100)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:326, in EvalLoopContainer.add(self, tensors)
    324     self.tensors = tensors if self.do_nested_concat else [tensors]
    325 elif self.do_nested_concat:
--> 326     self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
    327 else:
    328     self.tensors.append(tensors)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in nested_concat(tensors, new_tensors, padding_index)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:138, in <genexpr>(.0)
    134 assert type(tensors) == type(
    135     new_tensors
    136 ), f"Expected `tensors` and `new_tensors` to have the same type but found {type(tensors)} and {type(new_tensors)}."
    137 if isinstance(tensors, (list, tuple)):
--> 138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
    140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:140, in nested_concat(tensors, new_tensors, padding_index)
    138     return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
    139 elif isinstance(tensors, torch.Tensor):
--> 140     return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
    141 elif isinstance(tensors, Mapping):
    142     return type(tensors)(
    143         {k: nested_concat(t, new_tensors[k], padding_index=padding_index) for k, t in tensors.items()}
    144     )

File ~/miniconda3/lib/python3.11/site-packages/transformers/trainer_pt_utils.py:99, in torch_pad_and_concatenate(tensor1, tensor2, padding_index)
     96 tensor2 = atleast_1d(tensor2)
     98 if len(tensor1.shape) == 1 or tensor1.shape[1] == tensor2.shape[1]:
---> 99     return torch.cat((tensor1, tensor2), dim=0)
    101 # Let's figure out the new shape
    102 new_shape = (tensor1.shape[0] + tensor2.shape[0], max(tensor1.shape[1], tensor2.shape[1])) + tensor1.shape[2:]

RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 119 but got size 99 for tensor number 1 in the list.

NielsRogge commented 4 months ago

Yes, this is due to batches having different lengths of input_ids (in the code snippet of your first message you set padding=True, which means dynamic padding: each batch may have a different length). If your eval batch size is smaller than or equal to your training batch size, then it's fine.

It can be fixed either by padding all examples to the same length (e.g. using padding="max_length", max_length=200, truncation=True), or by passing the flag eval_do_concat_batches=False to the TrainingArguments. In the latter case, you'll get a list of predictions/labels in the compute_metrics function rather than stacked tensors, so you would need to adapt your compute_metrics function accordingly.
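For illustration, rough sketches of both options against the reproduction code above (the max_length value is just an example, and eval_do_concat_batches requires a recent transformers version):

# Option 1: pad/truncate every batch to a fixed length in the collator's __call__
batch = self.processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding="max_length",
    max_length=200,     # example value; pick one that covers your prompts and answers
    truncation=True,
)

# Option 2: keep per-batch predictions/labels as lists instead of concatenating them,
# then adapt compute_metrics to iterate over those lists
training_args = TrainingArguments(
    output_dir="/content/drive/My Drive/docvqa_ft_tutorial",
    per_device_eval_batch_size=8,
    eval_do_concat_batches=False,
    remove_unused_columns=False,
    report_to="none",
)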

VictorSanh commented 4 months ago

I had the same error and fixed it by using model.config.use_cache=False during training. But @VictorSanh might know a better option

I don't have a better fix!

zucchini-nlp commented 4 months ago

I think the cache problem should be fixed by converting DynamicCache back to legacy_cache in Idefics2's backbone language model, like it's already done in llama.

These changes are partially related to the issue of making language models "compile"-compatible, and should be available soon 🤗
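For context, a small self-contained sketch of the conversion being referred to (using the public DynamicCache API from transformers.cache_utils; this is not the actual Idefics2 fix):

import torch
from transformers.cache_utils import DynamicCache

# Build a dummy single-layer cache and convert it back to the legacy
# tuple-of-tuples format, which nested_detach in the Trainer can handle.
cache = DynamicCache()
key = value = torch.zeros(1, 2, 3, 4)   # (batch, num_heads, seq_len, head_dim)
cache.update(key, value, layer_idx=0)
legacy = cache.to_legacy_cache()        # tuple of (key, value) pairs, one per layer
print(type(legacy), len(legacy))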

amyeroberts commented 4 months ago

Thanks for the explanation @zucchini-nlp! Does this mean that this fix won't be needed soon, or that it enables something which isn't available yet but will be soon?

zucchini-nlp commented 4 months ago

We discussed the cache input-output format with @gante yesterday. Maybe a llama-format cache is not what we need, but anyway @gante will take care of it 😄

amyeroberts commented 4 months ago

@zucchini-nlp OK. The main thing to know is what, if anything, should be updated in idefics2. Is what @gante is doing addressing this?

zucchini-nlp commented 4 months ago

@amyeroberts I am not sure what the correct format is for the cache objects we return from language models, since right now we do not have consistency, so I wanted @gante to look at it.

There are two options for this:

  1. The language model always returns a tuple-type cache (as Llama currently does), in which case we would only have to update Mistral to follow the same logic.
  2. The language model returns the same type of cache it received in forward. In that case Idefics2 has to call cache.to_legacy_cache() at the end to ensure it returns a tuple type, which is consistent with how caching works for most current language models.

Also, I believe we are going to get rid of the tuple-type cache sometime in the future, so cache + Trainer is something to keep in mind for then.

amyeroberts commented 4 months ago

@zucchini-nlp OK, great, thanks for explaining. Let's leave it as-is, and once the cache format is standardized we can propagate this to idefics2 + other models.

NielsRogge commented 4 months ago

Hi @EloiEynard I just uploaded an example notebook for fine-tuning Idefics2 on an image -> JSON dataset here: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Idefics2/Fine_tune_Idefics2_for_JSON_extraction_use_cases_(PyTorch_Lightning).ipynb

EloiEynard commented 4 months ago

Thanks @NielsRogge, I got it all figured out with the Trainer and am currently fine-tuning with my custom eval. Wish I had known about Lightning earlier though, it seems more explicit.

By the way, if you don't mind me asking, I've noticed that in your notebooks you use model.add_adapter(lora_config) and model.enable_adapters(), whereas I mostly used to see model = get_peft_model(model, lora_config). Is there any difference between the two? Thanks

NielsRogge commented 4 months ago

I had the same question; it turns out both are equivalent. The get_peft_model API is recommended as it returns a PeftModel, which has additional utility methods such as save_adapter() with support for saving resized embedding layers. I tried leveraging it, but for some reason it gave me out-of-memory errors which I did not encounter with add_adapter. This could be due to PyTorch Lightning, the fact that I was using a notebook, or something else.

I'm currently looking into creating a similar notebook that leverages the Trainer API with get_peft_model. The reason I used PyTorch Lightning is because it allowed me to get up and running very quickly, especially regarding computing metrics during evaluation.
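
For anyone comparing the two, a minimal sketch of the get_peft_model route, reusing the lora_config from the reproduction snippet above:

from peft import get_peft_model

peft_model = get_peft_model(model, lora_config)   # wraps the base model in a PeftModel
peft_model.print_trainable_parameters()           # sanity check: only the LoRA weights should be trainable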

EloiEynard commented 4 months ago

I see, thanks for the details!