huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

DETR model crashes when changing the num_queries parameter in the config #28865

Open erickrf opened 9 months ago

erickrf commented 9 months ago

System Info

Who can help?

@amyeroberts

Information

Tasks

Reproduction

  1. Load the model with a custom num_queries hyperparameter.
    from transformers import AutoImageProcessor, DetrForObjectDetection

    id2label = {0: 'Test'}
    label2id = {'Test': 0}
    model_name = "facebook/detr-resnet-50"
    image_processor = AutoImageProcessor.from_pretrained(model_name)
    detr = DetrForObjectDetection.from_pretrained(
        model_name,
        id2label=id2label,
        label2id=label2id,
        ignore_mismatched_sizes=True,
        num_queries=5
    )
  2. Train (or just run the forward pass with an input containing labels); a minimal sketch is shown below.
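
For step 2, a minimal forward pass like the sketch below is enough to hit the loss computation (it reuses image_processor and detr from step 1; the dummy image size and box values are arbitrary):

import numpy as np
import torch
from PIL import Image

# dummy black image plus a single labeled box, just to exercise the loss
image = Image.fromarray(np.zeros((480, 640, 3), dtype=np.uint8))
inputs = image_processor(images=image, return_tensors="pt")
labels = [{
    "class_labels": torch.tensor([0]),              # one object of class "Test"
    "boxes": torch.tensor([[0.5, 0.5, 0.2, 0.3]]),  # normalized (cx, cy, w, h)
}]
outputs = detr(**inputs, labels=labels)
print(outputs.loss)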

I got the following error:

Traceback (most recent call last):
  File "<module>", line 1, in <module>
    trainer.train()
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 1854, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2735, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py", line 2758, in compute_loss
    outputs = model(**inputs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 1603, in forward
    loss_dict = criterion(outputs_loss, labels)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2202, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2323, in forward
    giou_cost = -generalized_box_iou(center_to_corners_format(out_bbox), center_to_corners_format(target_bbox))
  File "/home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling_detr.py", line 2388, in generalized_box_iou
    raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan],
        [nan, nan, nan, nan]], device='cuda:0')

The same code works fine without changing the default num_queries.

Expected behavior

I would expect the model to run as normal.

I am fine-tuning the model on a custom dataset which should not have more than a couple of objects per image, and I expected the number of queries to have no impact other than limiting the maximum number of objects found.

amyeroberts commented 9 months ago

Hi @erickrf, thanks for raising this issue!

Could you provide some more information about the crashing behaviour? Specifically, are you seeing any error messages, or is the process just killed?

Could you provide a minimal code snippet we can run to reproduce the error, e.g. with a sample of data from a public dataset being passed to the model?

erickrf commented 9 months ago

Sure! I basically get the error mentioned above.

This snippet can replicate the problem (it's rather long, but it comes from the tutorial on object detection):

from transformers import DetrImageProcessor, DetrForObjectDetection, TrainingArguments, Trainer
from datasets import load_dataset
import numpy as np

cppe5 = load_dataset("cppe-5")
categories = cppe5['train'].features['objects'].feature['category'].names

id2label = {index: x for index, x in enumerate(categories, start=0)}
label2id = {v: k for k, v in id2label.items()}

model_name = "facebook/detr-resnet-50"
image_processor = DetrImageProcessor.from_pretrained(model_name)
detr = DetrForObjectDetection.from_pretrained(
    model_name,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
    num_queries=5
)

def formatted_anns(image_id, category, area, bbox):
    annotations = []

    for i in range(0, len(category)):
        new_ann = {
            "image_id": image_id,
            "category_id": category[i],
            "isCrowd": 0,
            "area": area[i],
            "bbox": list(bbox[i]),
        }
        annotations.append(new_ann)

    return annotations

def transform_aug_ann(examples):
    image_ids = examples["image_id"]
    images, bboxes, area, categories = [], [], [], []

    for image, objects in zip(examples["image"], examples["objects"]):
        image = np.array(image.convert("RGB"))[:, :, ::-1]

        area.append(objects["area"])
        images.append(image)
        bboxes.append(objects["bbox"])
        categories.append(objects["category"])

    targets = [
        {"image_id": id_, "annotations": formatted_anns(id_, cat_, ar_, box_)}
        for id_, cat_, ar_, box_ in zip(image_ids, categories, area, bboxes)
    ]

    return image_processor(images=images, annotations=targets, return_tensors="pt")

def collate_fn(batch):
    pixel_values = [item["pixel_values"] for item in batch]
    encoding = image_processor.pad(pixel_values, return_tensors="pt")
    labels = [item["labels"] for item in batch]
    batch = {}
    batch["pixel_values"] = encoding["pixel_values"]
    batch["pixel_mask"] = encoding["pixel_mask"]
    batch["labels"] = labels
    return batch

cppe5["train"] = cppe5["train"].with_transform(transform_aug_ann)

training_args = TrainingArguments(
    output_dir="model/tests",
    per_device_train_batch_size=4,
    num_train_epochs=10,
    fp16=False,
    save_steps=200,
    logging_steps=200,
    learning_rate=1e-5,
    weight_decay=1e-4,
    save_total_limit=1,
    remove_unused_columns=False,
)
trainer = Trainer(
    model=detr,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=cppe5["train"],
    tokenizer=image_processor,
)
trainer.train()

Isalia20 commented 9 months ago

I have encountered this problem as well. When trying to change the num_queries parameter, it sometimes gives NaNs, and even when it does run, it is unable to train. To try it out and test everything before running on the whole dataset, I tried to overfit on a single image (just giving it the same image and targets on each step), but it couldn't do it in 5000 steps. num_queries=100 worked like a charm, both when starting from pretrained weights and without (again overfitting on a single image).

Isalia20 commented 9 months ago

Also, I found out that using a smaller learning rate fixed the NaN issue.

erickrf commented 9 months ago

I have looked a bit more attentively into the original DETR paper, and it says (Section 3.1):

DETR infers a fixed-size set of N predictions, in a single pass through the decoder, where N is set to be significantly larger than the typical number of objects in an image.

I couldn't find any analysis of the impact of this number N, but now I see that lowering it so much is expected to hurt the model.

Still, I would rather expect bad performance than outright NaN values.

Isalia20 commented 8 months ago

I've looked into this quite deeply, training with different num_queries parameters from scratch, from the finetuned version, etc., and found that copying the pretrained query weights is useful when num_queries is initialized with <100 queries. So, for example, if it is initialized with num_queries=50, copying the first 50 queries helps with training and doesn't produce NaNs.

@amyeroberts I can submit a PR for this change if possible (when num_queries is initialized with <100 queries, copy the first n query weights). It greatly speeds up training from what I have tried.
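
For anyone who wants to try this by hand, a rough sketch of the idea (it assumes the query embeddings live at model.query_position_embeddings, as in modeling_detr.py):

import torch
from transformers import DetrForObjectDetection

num_queries = 50

# reference model with the full 100 pretrained queries
pretrained = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# model with fewer queries; the mismatched query embedding is freshly initialized
detr = DetrForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50",
    num_queries=num_queries,
    ignore_mismatched_sizes=True,
)

# copy the first num_queries pretrained query embeddings into the smaller table
with torch.no_grad():
    detr.model.query_position_embeddings.weight.copy_(
        pretrained.model.query_position_embeddings.weight[:num_queries]
    )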

amyeroberts commented 8 months ago

Hi @Isalia20, thanks for digging into the behaviour of num_queries and training!

I don't think this is something we want to add on the transformers side. The reason is that it breaks with the convention of how weights are normally loaded with our models: a change in a config value which causes a change in shape results in a new weight being initialized. Changing this would change assumptions about the model loading behaviour in the library.

It sounds very useful, however, so please feel free to share the code or a link to an example here for the community.

Isalia20 commented 8 months ago

I'm currently fine-tuning on the SKU110K dataset with 400 num_queries. Once training is finished, I'll upload the model/code to HF/GitHub. Should I share the instructions here, or is there somewhere better to share them?

amyeroberts commented 8 months ago

@Isalia20 Wherever you think is best. I'd suggest sharing here, or linking to a relevant blog / repo with example code. Another great place would be the forums.

Isalia20 commented 8 months ago

I've released the model here: https://huggingface.co/isalia99/detr-resnet-50-sku110k and the code is here: https://github.com/Isalia20/DETR-finetune
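
Loading it should work like any other DETR checkpoint, e.g. (a quick sketch; it assumes the repo ships a preprocessor config, otherwise the facebook/detr-resnet-50 image processor can be used, and the 0.5 threshold is just an example value):

import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

processor = DetrImageProcessor.from_pretrained("isalia99/detr-resnet-50-sku110k")
model = DetrForObjectDetection.from_pretrained("isalia99/detr-resnet-50-sku110k")

image = Image.open("shelf.jpg")  # any retail-shelf style image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# convert predictions to absolute (x0, y0, x1, y1) boxes above a score threshold
target_sizes = torch.tensor([image.size[::-1]])
results = processor.post_process_object_detection(outputs, threshold=0.5, target_sizes=target_sizes)[0]
print(results["scores"], results["labels"], results["boxes"])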

rhajou commented 7 months ago

Hello @Isalia20 , @amyeroberts

I am facing a similar issue, but the main difference is that the output of the model is not NaN; it just does not respect the x1, y1, x2, y2 format.

Let me add this link to a similar issue found by another user on the Hugging Face discussions here.

Would the same solution be suitable to resolve the issue?

I am trying to increase the learning rate to accelerate training. I have the following specifics:

In your opinion:

Isalia20 commented 7 months ago

AFAIK, the model requires x_center, y_center, width, height (in coordinates relative to the image size) to train.

rhajou commented 7 months ago

@Isalia20 But the error mentioned in this issue is mainly due to boxes1 (the output of the model) and not to boxes2 (the target bboxes).

[Not related to this issue] In this case, is the notebook of Niels found here missing a step to convert the input from x1, y1, w, h to x_center, y_center, width, height?

Isalia20 commented 7 months ago

Never mind, it's actually x1, y1, w, h in relative coords, and that notebook does have it correctly. My best advice would be to train with the already pretrained num_queries=100 and a small learning rate (1e-5 for the head, with the backbone frozen). In that case NaN issues didn't occur for me. If they still occur, maybe sharing your code will help us debug it (if possible).
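
A rough sketch of that setup (the "backbone" substring filter on parameter names is an assumption about how the weights are named in modeling_detr.py):

from torch.optim import AdamW
from transformers import DetrForObjectDetection

# keep the pretrained num_queries=100 instead of shrinking it
detr = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50")

# freeze the convolutional backbone and train the rest with a small learning rate
trainable_params = []
for name, param in detr.named_parameters():
    if "backbone" in name:
        param.requires_grad = False
    else:
        trainable_params.append(param)

optimizer = AdamW(trainable_params, lr=1e-5)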

rhajou commented 7 months ago

@Isalia20

I am facing the exact same issue as mentioned here. You can find below the error I have been getting. After further investigation, a high learning rate can reveal this type of error. I will stick for the time being with a learning rate of 1e-4, without a lot of warmup.

On the other hand, regarding the box input/output format, it is worth noting that the input of the model is also in cx, cy, w, h format. In the HF notebook, the conversion is done when transforming, in this line of code:

transform = albumentations.Compose(
    [
        albumentations.Resize(480, 480),
        albumentations.HorizontalFlip(p=1.0),
        albumentations.RandomBrightnessContrast(p=1.0),
    ],
    bbox_params=albumentations.BboxParams(format="coco", label_fields=["category"]), # here the bboxes are converted from x, y, w, h to cx, cy, w, h
)
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[0.0067, 0.1018, 0.1296, 0.8076],
        [0.3481, 0.0247, 0.7026, 0.2710],
        [0.0161, 0.2329, 0.3252, 0.9087],
        ...,
        [0.2112, 0.0206, 0.9541, 0.1913],
        [0.3584, 0.0234, 0.9580, **1.0029**],
        [0.3655, 0.0252, 0.8555, 0.2568]], device='cuda:0',
       dtype=torch.float16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File <command--1>:8
      5 del sys
      7 with open(filename, "rb") as f:
----> 8   exec(compile(f.read(), filename, 'exec'))

    417 if (
    418     active_session_failed
    419     or autologging_is_disabled(autologging_integration)
   (...)
    426     # warning behavior during original function execution, since autologging is being
    427     # skipped
    428     with set_non_mlflow_warnings_behavior_for_current_thread(
    429         disable_warnings=False,
    430         reroute_warnings=False,
    431     ):
--> 432         return original(*args, **kwargs)
    434 # Whether or not the original / underlying function has been called during the
    435 # execution of patched code
    436 original_has_been_called = False

File /databricks/python/lib/python3.9/site-packages/transformers/trainer.py:1555, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1553         hf_hub_utils.enable_progress_bars()
   1554 else:
-> 1555     return inner_training_loop(
   1556         args=args,
   1557         resume_from_checkpoint=resume_from_checkpoint,
   1558         trial=trial,
   1559         ignore_keys_for_eval=ignore_keys_for_eval,
   1560     )

File /databricks/python/lib/python3.9/site-packages/transformers/trainer.py:1860, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1857     self.control = self.callback_handler.on_step_begin(args, self.state, self.control)
   1859 with self.accelerator.accumulate(model):
-> 1860     tr_loss_step = self.training_step(model, inputs)
   1862 if (
   1863     args.logging_nan_inf_filter
   1864     and not is_torch_tpu_available()
   1865     and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))
   1866 ):
   1867     # if loss is nan or inf simply add the average of previous logged losses
   1868     tr_loss += tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)

File /databricks/python/lib/python3.9/site-packages/transformers/trainer.py:2725, in Trainer.training_step(self, model, inputs)
   2722     return loss_mb.reduce_mean().detach().to(self.args.device)
   2724 with self.compute_loss_context_manager():
-> 2725     loss = self.compute_loss(model, inputs)
   2727 if self.args.n_gpu > 1:
   2728     loss = loss.mean()  # mean() to average on multi-gpu parallel training

File /databricks/python/lib/python3.9/site-packages/transformers/trainer.py:2748, in Trainer.compute_loss(self, model, inputs, return_outputs)
   2746 else:
   2747     labels = None
-> 2748 outputs = model(**inputs)
   2749 # Save past state if it exists
   2750 # TODO: this needs to be fixed and made cleaner later.
   2751 if self.args.past_index >= 0:

File /databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /databricks/python/lib/python3.9/site-packages/accelerate/utils/operations.py:687, in convert_outputs_to_fp32.<locals>.forward(*args, **kwargs)
    686 def forward(*args, **kwargs):
--> 687     return model_forward(*args, **kwargs)

File /databricks/python/lib/python3.9/site-packages/accelerate/utils/operations.py:675, in ConvertOutputsToFp32.__call__(self, *args, **kwargs)
    674 def __call__(self, *args, **kwargs):
--> 675     return convert_to_fp32(self.model_forward(*args, **kwargs))

File /databricks/python/lib/python3.9/site-packages/torch/amp/autocast_mode.py:14, in autocast_decorator.<locals>.decorate_autocast(*args, **kwargs)
     11 @functools.wraps(func)
     12 def decorate_autocast(*args, **kwargs):
     13     with autocast_instance:
---> 14         return func(*args, **kwargs)

....
     47 def forward(
     48     self,
     49     pixel_values: torch.FloatTensor,
   (...)
     59     format_labels_val=None,
     60 ):
---> 62     output = super().forward(
     63         pixel_values=pixel_values,
     64         pixel_mask=pixel_mask,
     65         decoder_attention_mask=decoder_attention_mask,
     66         encoder_outputs=encoder_outputs,
     67         inputs_embeds=inputs_embeds,
     68         decoder_inputs_embeds=decoder_inputs_embeds,
     69         labels=labels,
     70         output_attentions=output_attentions,
     71         output_hidden_states=output_hidden_states,
     72         return_dict=return_dict,
     73     )
     75     return CustomDetrObjectDetectionOutput(
     76         **output.__dict__, format_labels_val=format_labels_val
     77     )

File /databricks/python/lib/python3.9/site-packages/transformers/models/detr/modeling_detr.py:1603, in DetrForObjectDetection.forward(self, pixel_values, pixel_mask, decoder_attention_mask, encoder_outputs, inputs_embeds, decoder_inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1600     auxiliary_outputs = self._set_aux_loss(outputs_class, outputs_coord)
   1601     outputs_loss["auxiliary_outputs"] = auxiliary_outputs
-> 1603 loss_dict = criterion(outputs_loss, labels)
   1604 # Fourth: compute total loss, as a weighted sum of the various losses
   1605 weight_dict = {"loss_ce": 1, "loss_bbox": self.config.bbox_loss_coefficient}

File /databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /databricks/python/lib/python3.9/site-packages/transformers/models/detr/modeling_detr.py:2202, in DetrLoss.forward(self, outputs, targets)
   2199 outputs_without_aux = {k: v for k, v in outputs.items() if k != "auxiliary_outputs"}
   2201 # Retrieve the matching between the outputs of the last layer and the targets
-> 2202 indices = self.matcher(outputs_without_aux, targets)
   2204 # Compute the average number of target boxes across all nodes, for normalization purposes
   2205 num_boxes = sum(len(t["class_labels"]) for t in targets)

File /databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /databricks/python/lib/python3.9/site-packages/torch/autograd/grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File /databricks/python/lib/python3.9/site-packages/transformers/models/detr/modeling_detr.py:2323, in DetrHungarianMatcher.forward(self, outputs, targets)
   2320 bbox_cost = torch.cdist(out_bbox, target_bbox, p=1)
   2322 # Compute the giou cost between boxes
-> 2323 giou_cost = -generalized_box_iou(center_to_corners_format(out_bbox), center_to_corners_format(target_bbox))
   2325 # Final cost matrix
   2326 cost_matrix = self.bbox_cost * bbox_cost + self.class_cost * class_cost + self.giou_cost * giou_cost

File /databricks/python/lib/python3.9/site-packages/transformers/models/detr/modeling_detr.py:2388, in generalized_box_iou(boxes1, boxes2)
   2385 # degenerate boxes gives inf / nan results
   2386 # so do an early check
   2387 if not (boxes1[:, 2:] >= boxes1[:, :2]).all():
-> 2388     raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got {boxes1}")
   2389 if not (boxes2[:, 2:] >= boxes2[:, :2]).all():
   2390     raise ValueError(f"boxes2 must be in [x0, y0, x1, y1] (corner) format, but got {boxes2}")

ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[0.0067, 0.1018, 0.1296, 0.8076],
        [0.3481, 0.0247, 0.7026, 0.2710],
        [0.0161, 0.2329, 0.3252, 0.9087],
        ...,
        [0.2112, 0.0206, 0.9541, 0.1913],
        [0.3584, 0.0234, 0.9580, **1.0029**],
        [0.3655, 0.0252, 0.8555, 0.2568]], device='cuda:0',
       dtype=torch.float16)