Open Nitaym opened 5 months ago
Hey @amyeroberts, are you the relevant person for this bug?
I have further questions, if possible:
Regarding labels - What should the "class_labels" tensor be filled with? Where should I get the right class indices from? Since this is an open-set detection model, I assume there's no simple class index dictionary.
Is there example code somewhere for fine-tuning this GroundingDino model with huggingface / custom datasets?
Thanks! Nitay
cc @EduardoPach
TL;DR
I will work to fix this during this week :)
Hey, thanks for opening the issue! The implementation of GroundingDinoLoss is not actually correct. When adding the model I didn't focus much on getting it right, as the original repo doesn't have training code or the loss calculation.
That being said, I found an issue in the original repo where the authors point to other repos that implement training for Grounding DINO, so I will use those and check against the paper to fix this :)
Thanks @EduardoPach!
I'll be happy to assist as needed. Could you point me to the reference implementations you've mentioned?
Any update @EduardoPach?
I have added the corrections (haven't created the PR yet); I just need to test them now. I will probably do that over the weekend.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Waiting for final approval: https://github.com/huggingface/transformers/pull/31828
still ongoing...
Any progress on a minimal training example? Would love to be able to fine-tune Grounding DINO on HF.
Should https://github.com/huggingface/transformers/issues/31434 be merged somehow?
@stevenwudi #31828 is very close to being merged in. You'll see on the PR there were just a few small outstanding comments to address so expect to see this available soon!
System Info
transformers==4.40.2, Python 3.10.14, Ubuntu on WSL under Windows 10
Who can help?
@amyeroberts
Reproduction
I've been trying to fine-tune Grounding DINO with transformers' GroundingDinoForObjectDetection. To keep things simple I've been using batch_size = 1 (I haven't tried any other batch sizes).
When running the model, I got this exception:
(There were indeed 3 bounding boxes in the label data)
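For context, here is a minimal sketch of what label data with 3 bounding boxes might look like. It assumes the DETR-style label format used by other transformers detection models (a dict with "class_labels" and "boxes", boxes normalized as center_x, center_y, width, height); whether Grounding DINO expects exactly this is an assumption, not something confirmed in this thread. The helper names, pixel values, and class indices are hypothetical, and plain lists stand in for tensors.

```python
def to_normalized_cxcywh(box_xyxy, image_width, image_height):
    """Convert a pixel (x_min, y_min, x_max, y_max) box to normalized
    (center_x, center_y, width, height)."""
    x_min, y_min, x_max, y_max = box_xyxy
    return [
        (x_min + x_max) / 2 / image_width,
        (y_min + y_max) / 2 / image_height,
        (x_max - x_min) / image_width,
        (y_max - y_min) / image_height,
    ]


def build_label(boxes_xyxy, class_indices, image_width, image_height):
    """Build one per-image label dict (hypothetical helper).

    Assumption: one class index per box, in the same order as the boxes.
    """
    if len(boxes_xyxy) != len(class_indices):
        raise ValueError("need exactly one class index per bounding box")
    return {
        "class_labels": list(class_indices),
        "boxes": [
            to_normalized_cxcywh(box, image_width, image_height)
            for box in boxes_xyxy
        ],
    }


# Hypothetical 640x480 image with 3 annotated boxes, as in this report.
label = build_label(
    boxes_xyxy=[(10, 20, 110, 220), (200, 50, 300, 150), (0, 0, 64, 48)],
    class_indices=[0, 1, 0],
    image_width=640,
    image_height=480,
)
# With batch_size = 1, the labels passed to a model would be [label],
# after converting the lists to tensors.
```

How the class indices should map to phrases in the text prompt is exactly the open question raised in this issue, so the indices above are placeholders.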
Expected behavior
The loss should be calculated without errors.