huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

GroundingDino - Loss calculation exceptions #31434

Open Nitaym opened 1 month ago

Nitaym commented 1 month ago

System Info

transformers==4.40.2, Python 3.10.14, Ubuntu (WSL) under Windows 10

Who can help?

@amyeroberts

Reproduction

I've been trying to fine-tune GroundingDino with transformers' GroundingDinoForObjectDetection. To keep things simple, I've been using batch_size = 1 (I haven't tried any other batch sizes).

When running the model, I got this exception:

Exception has occurred: RuntimeError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
split_with_sizes expects split_sizes to sum exactly to 2700 (input tensor's size at dimension -1), but got split_sizes=[3]
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/_tensor.py", line 921, in split
    return torch._VF.split_with_sizes(self, split_size, dim)
  File "/home/nitay/.local/lib/python3.10/site-packages/transformers/models/grounding_dino/modeling_grounding_dino.py", line 2723, in forward
    indices = [linear_sum_assignment(c[i]) for i, c in enumerate(cost_matrix.split(sizes, -1))]
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nitay/.local/lib/python3.10/site-packages/transformers/models/grounding_dino/modeling_grounding_dino.py", line 2866, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/nitay/.local/lib/python3.10/site-packages/transformers/models/grounding_dino/modeling_grounding_dino.py", line 3091, in forward
    loss_dict = criterion(outputs_loss, labels)
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/nitay/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/mnt/folder/main.py", line 84, in train
    outputs = model(input_ids=input_ids, pixel_values=pixel_values, pixel_mask=pixel_mask, labels=labels)
  File "/mnt/folder/main.py", line 98, in <module>
    train()
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
RuntimeError: split_with_sizes expects split_sizes to sum exactly to 2700 (input tensor's size at dimension -1), but got split_sizes=[3]

(There were indeed 3 bounding boxes in the label data)
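
To make the failure mode concrete, here is a minimal sketch of the contract that `Tensor.split` enforces when it is given a list of sizes: the sizes must sum to the length of the dimension being split. The tensor shapes below are hypothetical and chosen only to mirror the numbers in the error message (a last dimension of 2700 versus `sizes = [3]`). The traceback does not show the real shapes inside the matcher, so this illustrates the error rather than diagnosing its cause.

```python
import torch

# One image with 3 target boxes, as in the report above.
sizes = [3]

# Case 1: the last dimension equals sum(sizes), so the split succeeds.
# The leading dimensions (1, 4) are arbitrary stand-ins, not the model's real shapes.
ok = torch.zeros(1, 4, 3)
chunks = ok.split(sizes, -1)
print([tuple(c.shape) for c in chunks])

# Case 2: the last dimension is 2700 while sum(sizes) is 3, so the split raises.
bad = torch.zeros(1, 4, 2700)
try:
    bad.split(sizes, -1)
    message = ""
except RuntimeError as err:
    message = str(err)
print(message)
```

In other words, the list of per-image target counts and the last dimension of the cost matrix disagree by the time `cost_matrix.split(sizes, -1)` runs.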

Expected behavior

The loss should be calculated without errors.

Nitaym commented 1 month ago

Hey @amyeroberts, are you the relevant person for this bug?

I have further questions, if possible:

  1. Regarding labels - What should the "class_labels" tensor be filled with? Where should I get the right class indices from? Since this is an open-set detection model, I assume there's no simple class index dictionary.

  2. Is there example code somewhere for fine-tuning this GroundingDino model with huggingface / custom datasets?

Thanks! Nitay

NielsRogge commented 1 month ago

cc @EduardoPach

EduardoPach commented 1 month ago

TL;DR

I will work on fixing this during the week :)

Hey, thanks for opening the issue! The implementation of GroundingDinoLoss is not actually correct. When adding the model, I didn't focus much on getting it right, as the original repo doesn't include training code or the loss calculation.

That being said, I found an issue in the original repo where the authors point to other repos that implement training for Grounding DINO, so I will use those and check against the paper to fix this :)

Nitaym commented 1 month ago

Thanks @EduardoPach!

I'll be happy to assist as needed. Could you point me to the reference implementations you mentioned?

zappy586 commented 3 weeks ago

Any update @EduardoPach?

EduardoPach commented 3 weeks ago

I have added the corrections (I haven't created the PR yet); I just need to test them now. I will probably do that over the weekend.