gpt4vision / OvSGTR

[ECCV 2024 Best Paper Candidate] Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention"

Query about the pre-training process #14

Open qwerhk839 opened 3 weeks ago

qwerhk839 commented 3 weeks ago

Thank you for your outstanding work. I ran into several problems while trying to reproduce the pre-training results.

I use the following command to pre-train the groundingdino_swint:

bash scripts/DINO_train_dist.sh coco ./config/GroundingDINO_SwinT_OGC_pretrain.py .Datasets/ovsgtr_data ./log/ovsgtr_vg_swint_pretrain ./GroundingDINO/weights/groundingdino_swint_ogc.pth

There may be some errors in your project:

  1. Config file GroundingDINO_SwinT_OGC_pretrain.py: The has_bbox_supervision should be False. If it is True, it will cause an error in Line 116 in matcher.py, due to missing the "boxes" key. Besides, it will compute the boxes loss in Line 962 in groundingdino.py
  2. Dataset file coco.py: Line 491 should be "while len(target['relations']) == 0:" and Line 494 should be removed, due to the missing key "boxes" in your provided coco_train2017_triple.json
  3. Loss file losses.py: Line 734 should be the following code, also due to the missing "boxes" key:

         if not self.rln_pretraining:
             num_boxes = sum(len(t["boxes"]) for t in targets)
         else:
             num_boxes = sum(len(t["gtnames"]) for t in targets)
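For readers following point 3, here is a minimal standalone sketch of that fallback. It is not the repository's code: the helper name `count_targets` is hypothetical, and it assumes each target is a dict holding either a "boxes" list (with box supervision) or a "gtnames" list (caption-only pre-training).

```python
from typing import Dict, Sequence


def count_targets(targets: Sequence[Dict[str, list]], rln_pretraining: bool) -> int:
    """Return the count used to normalize the loss over all targets."""
    # Assumption: caption-only targets have no "boxes" key, so the number of
    # ground-truth names stands in for the number of boxes.
    key = "gtnames" if rln_pretraining else "boxes"
    return sum(len(t[key]) for t in targets)


# Hypothetical targets for illustration only.
caption_targets = [{"gtnames": ["man", "horse"]}, {"gtnames": ["dog"]}]
box_targets = [{"boxes": [[0.1, 0.2, 0.3, 0.4]]}]

print(count_targets(caption_targets, rln_pretraining=True))
print(count_targets(box_targets, rln_pretraining=False))
```

Selecting the key once keeps the two cases symmetric, and a target that lacks the expected key raises a KeyError immediately instead of being skipped silently.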

After fixing the errors above, I can run the pre-training command (given above) successfully. However, I cannot reproduce your results in Table 5 without bounding box supervision. Could you share the implementation details for Table 5 (the pre-training command, the test command, and the required computing resources, e.g., how many 3090 GPUs), and confirm whether my code fixes are correct?

Looking forward to your reply. THX :)

JosephChenHub commented 2 weeks ago

has_bbox_supervision should be True during pre-training, and the grounding process can be found at https://github.com/gpt4vision/OvSGTR/blob/master/tools/language_sgg_grounding.py . As for computing resources, the original experiments were conducted on 4x / 8x A100 GPUs.

myukzzz commented 2 weeks ago

I have the same question: because the "boxes" key is missing from the provided JSON files (captions_train2017_triple.json and captions_val2017_triple.json), has_bbox_supervision can only be set to False.

On the other hand, in https://github.com/gpt4vision/OvSGTR/blob/master/tools/language_sgg_grounding.py the boxes are predicted rather than read from the tag file, which is inconsistent with has_bbox_supervision=True.
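A quick way to confirm which entries lack boxes is sketched below. The layout of the provided JSON files is not described in this thread, so the sketch assumes, hypothetically, that the annotations can be read as a list of per-image dicts; the helper name `missing_key_indices` and the sample data are made up for illustration.

```python
from typing import Any, Dict, List


def missing_key_indices(entries: List[Dict[str, Any]], key: str = "boxes") -> List[int]:
    """Return the positions of entries that do not contain `key`."""
    return [i for i, entry in enumerate(entries) if key not in entry]


# Hypothetical entries: the first has no "boxes" key, the second does.
sample = [
    {"gtnames": ["man", "horse"], "relations": [[0, 1, "riding"]]},
    {"gtnames": ["dog"], "relations": [], "boxes": [[0.0, 0.0, 10.0, 10.0]]},
]

print(missing_key_indices(sample))
```

For a real file, the entries would first be loaded with `json.load` and adapted to whatever structure the file actually uses.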

JosephChenHub commented 2 weeks ago

As mentioned above, has_bbox_supervision=True applies after grounding.

myukzzz commented 2 weeks ago

In this case, the provided JSON file is not the result generated by grounding (it comes without bounding box supervision). Can you provide the correct JSON file for pre-training?

JosephChenHub commented 3 days ago

Please use https://github.com/gpt4vision/OvSGTR/blob/master/tools/language_sgg_grounding.py to generate the boxes.