longzw1997 / Open-GroundingDino

This is a third-party implementation of the paper Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection.

Predicted labels #46

Open · Hasanmog opened this issue 8 months ago

Hasanmog commented 8 months ago

Hello,

@aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine. I modified the `evaluate` function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples shows impressive results. I printed out the labels predicted during evaluation: they never match the correct class, while the bounding boxes are quite good. I placed a `label_list` containing the categories in `cfg_odvg.py`. Any ideas/tips on where the source of the problem could be?

BIGBALLON commented 8 months ago

I'm not sure, but if everything is configured correctly there may be something wrong with the code. My two suggestions:

1. Check whether all parameters of the configuration file are correct.
2. Take a closer look at the evaluation code; there may be problems there, but I am not sure.

If you have any new findings or logs, can you provide them so we can analyze the specific problem?

Qia98 commented 8 months ago

> Hello,
>
> @aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine. I modified the `evaluate` function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples shows impressive results. I printed out the labels predicted during evaluation: they never match the correct class, while the bounding boxes are quite good. I placed a `label_list` containing the categories in `cfg_odvg.py`. Any ideas/tips on where the source of the problem could be?

I encountered the same problem. The mAP I got is low but the bounding boxes are quite good. Have you solved the problem?

Hasanmog commented 8 months ago

@Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least logical? I printed out the labels; they never match the GT labels.

Are you encountering the same issue too?
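
A minimal sketch of what I mean by printing the labels, assuming the evaluation loop yields DETR-style dicts (each target with a `labels` tensor of category ids, each post-processed result with `labels` and `scores` tensors); the function and variable names are illustrative, not the exact ones in this repo:

```python
# Sketch of the label check described above. Assumes DETR-style targets/results
# dicts; names are illustrative, not the exact variables in the engine code.
def dump_label_mismatch(targets, results, score_thr=0.3):
    for tgt, res in zip(targets, results):
        keep = res["scores"] > score_thr              # only confident predictions
        pred = res["labels"][keep]
        print("GT labels:  ", sorted(set(tgt["labels"].tolist())))
        print("Pred labels:", sorted(set(pred.tolist())))
```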

BIGBALLON commented 8 months ago

@longzw1997 Any suggestions? It looks like there may be a small problem, but I don't know where it is.

longzw1997 commented 8 months ago

It looks like the code did not import the correct class names during evaluation. Have `label_list` and `use_coco_eval = False` in `cfg_odvg.py` been modified?
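
For reference, the relevant entries in a custom config would look roughly like this (a sketch with placeholder class names; the order must match the category ids in your annotations):

```python
# Sketch of the relevant part of cfg_odvg.py for a 20-class custom dataset.
# The class names are placeholders; list your own categories in the same
# order as the category ids used in your COCO-format annotations.
use_coco_eval = False
label_list = [
    "class_00", "class_01", "class_02", "class_03", "class_04",
    "class_05", "class_06", "class_07", "class_08", "class_09",
    "class_10", "class_11", "class_12", "class_13", "class_14",
    "class_15", "class_16", "class_17", "class_18", "class_19",
]
```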

Hasanmog commented 8 months ago

> It looks like the code did not import the correct class names during evaluation. Have `label_list` and `use_coco_eval = False` in `cfg_odvg.py` been modified?

Yes.

BIGBALLON commented 8 months ago

> It looks like the code did not import the correct class names during evaluation. Have `label_list` and `use_coco_eval = False` in `cfg_odvg.py` been modified?
>
> Yes.

So the evaluation results are normal now?

Hasanmog commented 8 months ago

No, I modified them from the beginning, but I still have the same issue. I don't know if the problem is on my side or if there is actually a bug in the code. Training is working normally; I compared the visualizations with the vanilla model. But during evaluation the scores are really low, and when I printed out the `gt_label` and the `_res_label` from the evaluation function, they never matched. The bounding boxes are good, so the label mismatch might be one of the reasons the scores are this low. That's why I asked @Qia98 if he is also getting unmatched labels.

SamXiaosheng commented 8 months ago

I find that the evaluation result on COCO is the same whether using groundingdino_swint_ogc.pth or groundingdino_swinb_cogcoor.pth:

```
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.709
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.610
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.407
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.706
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.784
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.638
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.826
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920
```

Hasanmog commented 8 months ago

@SamXiaosheng, I think it depends on the dataset you're using. If your dataset contains referring expressions, you will find that groundingdino_swinb_cogcoor.pth performs better, because it was trained on RefCOCO while the Swin-T variant wasn't.

Check this table.

Qia98 commented 8 months ago

> No, I modified them from the beginning, but I still have the same issue. I don't know if the problem is on my side or if there is actually a bug in the code. Training is working normally; I compared the visualizations with the vanilla model. But during evaluation the scores are really low, and when I printed out the `gt_label` and the `_res_label` from the evaluation function, they never matched. The bounding boxes are good, so the label mismatch might be one of the reasons the scores are this low. That's why I asked @Qia98 if he is also getting unmatched labels.

@Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote a COCO-format evaluation script and ran it against the official code base (I trained with this repo to obtain the weights, then evaluated them with my own `evaluate` function based on the official code). The mAP was very high (about 0.90 mAP@.5 and 0.70 mAP@.5:.95), while in the training log mAP@.5 never exceeded 0.1.
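
For anyone trying the same workaround, a standalone COCO evaluation with pycocotools looks roughly like this (a sketch; the file paths are placeholders, and it assumes the predictions have already been dumped in the standard COCO detection results format):

```python
# Standalone COCO evaluation, independent of the training loop.
# Assumes results.json holds predictions in the standard COCO detection format:
# [{"image_id": ..., "category_id": ..., "bbox": [x, y, w, h], "score": ...}, ...]
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_test.json")  # ground truth (placeholder path)
coco_dt = coco_gt.loadRes("results.json")          # predictions  (placeholder path)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints the standard AP/AR table
```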

Qia98 commented 8 months ago

And I suspect that in this code base the output of the model is correct, but the `_res_labels` used to calculate the mAP are incorrect, so the problem may arise while converting the model output into the `_res_labels` JSON format. For example, the xywh processing may be incorrect.
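
To make the xywh suspicion concrete: COCO expects boxes as absolute `[x_min, y_min, width, height]`, while DETR-style heads output normalized `[cx, cy, w, h]`. A conversion sketch (the function name is illustrative, not the actual code in this repo):

```python
import torch


def cxcywh_norm_to_coco_xywh(boxes, img_w, img_h):
    """Convert normalized [cx, cy, w, h] boxes to absolute COCO [x, y, w, h].

    boxes: (N, 4) tensor of normalized center-format boxes.
    Skipping the de-normalization or the cx/cy -> x/y shift gives near-zero
    mAP even though the visualized boxes look correct.
    """
    cx, cy, w, h = boxes.unbind(-1)
    x = (cx - 0.5 * w) * img_w
    y = (cy - 0.5 * h) * img_h
    return torch.stack([x, y, w * img_w, h * img_h], dim=-1)
```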

BIGBALLON commented 8 months ago

> No, I modified them from the beginning, but I still have the same issue. I don't know if the problem is on my side or if there is actually a bug in the code. Training is working normally; I compared the visualizations with the vanilla model. But during evaluation the scores are really low, and when I printed out the `gt_label` and the `_res_label` from the evaluation function, they never matched. The bounding boxes are good, so the label mismatch might be one of the reasons the scores are this low. That's why I asked @Qia98 if he is also getting unmatched labels.

> @Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote a COCO-format evaluation script and ran it against the official code base (I trained with this repo to obtain the weights, then evaluated them with my own `evaluate` function based on the official code). The mAP was very high (about 0.90 mAP@.5 and 0.70 mAP@.5:.95), while in the training log mAP@.5 never exceeded 0.1.

Hi @Qia98, I agree with your viewpoint and, if you find the time, please feel free to create a pull request to address this issue. 😄

EddieEduardo commented 8 months ago

> @Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least logical? I printed out the labels; they never match the GT labels.
>
> Are you encountering the same issue too?

Hi, I also encountered the same problem. When I visualize the detection results, the locations of the bounding boxes are correct, but the categories are usually incorrect. Is there something wrong with BERT, or is it due to other reasons?

junfengcao commented 6 months ago

> @Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least logical? I printed out the labels; they never match the GT labels.
>
> Are you encountering the same issue too?

I debugged the evaluation function, and I found that the issue may be in the post-processing; see `models/GroundingDINO/groundingdino.py` (class `PostProcess`).
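
To make that concrete: in a grounding model the classification logits are over text tokens rather than category ids, so the post-processing has to map each query's best-matching token span back to a category index before COCO evaluation. A rough sketch of that mapping (illustrative only, not the actual `PostProcess` code):

```python
import torch


def tokens_to_category_ids(logits, token_span_per_category):
    """Map per-token grounding logits to per-category scores and labels.

    logits: (num_queries, num_text_tokens) sigmoid scores against the prompt tokens.
    token_span_per_category: list where entry k holds the token indices that
    spell out category k in the concatenated text prompt.
    If this span bookkeeping is off, boxes stay correct but every predicted
    label is wrong, which matches the symptom reported in this issue.
    """
    per_cat_scores = torch.stack(
        [logits[:, span].max(dim=-1).values for span in token_span_per_category],
        dim=-1,
    )  # (num_queries, num_categories)
    scores, labels = per_cat_scores.max(dim=-1)
    return scores, labels
```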

BIGBALLON commented 6 months ago

> @Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least logical? I printed out the labels; they never match the GT labels. Are you encountering the same issue too?

> I debugged the evaluation function, and I found that the issue may be in the post-processing; see `models/GroundingDINO/groundingdino.py` (class `PostProcess`).

@junfengcao feel free to create a pull request 😄

jaychempan commented 5 months ago

> @Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least logical? I printed out the labels; they never match the GT labels. Are you encountering the same issue too?

> I debugged the evaluation function, and I found that the issue may be in the post-processing; see `models/GroundingDINO/groundingdino.py` (class `PostProcess`).

> @junfengcao feel free to create a pull request 😄

May I ask whether anyone has solved this problem? I've encountered the same issue and have ruled out a number of possible causes, but have not solved it.

caicaisy commented 5 months ago

> @Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least logical? I printed out the labels; they never match the GT labels. Are you encountering the same issue too?

> I debugged the evaluation function, and I found that the issue may be in the post-processing; see `models/GroundingDINO/groundingdino.py` (class `PostProcess`).

> @junfengcao feel free to create a pull request 😄

This happened to me too, with an mAP of only 0.2%. When I changed the dataset, the accuracy was more than 40%, but why did the mAP decrease with training?