Hasanmog opened this issue 10 months ago · Status: Open
I'm not sure, but if everything is configured correctly there may be something wrong with the code. My two suggestions are: 1. Check whether all the parameters in the configuration file are correct. 2. Take a closer look at the evaluation code; there may be a problem there, but I am not sure.
If you have any new findings or logs, could you share them so we can analyze the specific problem?
Hello,
@aghand0ur and I used your code to train on a custom dataset (20 classes), and everything went fine. I modified the evaluate function to suit this specific task. When testing on my test dataset (converted to COCO format), the COCO results are really low, although visualizing samples showed impressive results. I printed out the labels being predicted during evaluation: it never returns the correct label, while the bounding boxes are quite good. I placed the label_list containing the categories in cfg_odvg.py. Any idea/tips where the source of the problem could be?
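For reference, the label check I did looks roughly like this (a minimal sketch, not my exact code; the file names are placeholders): IoU-match each prediction to a GT box, then compare category ids.

```python
# Sketch: load GT and predictions in COCO format, IoU-match boxes per image,
# and report predictions whose box matches a GT box but whose label differs.
from pycocotools.coco import COCO

def iou(a, b):
    # a, b: [x, y, w, h] in COCO format
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

coco_gt = COCO("test_coco.json")           # placeholder path: GT annotations
coco_dt = coco_gt.loadRes("results.json")  # placeholder path: dumped predictions

for img_id in coco_gt.getImgIds():
    gts = coco_gt.loadAnns(coco_gt.getAnnIds(imgIds=[img_id]))
    dts = coco_dt.loadAnns(coco_dt.getAnnIds(imgIds=[img_id]))
    for dt in dts:
        # match each prediction to the GT box with the highest IoU
        best = max(gts, key=lambda g: iou(dt["bbox"], g["bbox"]), default=None)
        if best is not None and iou(dt["bbox"], best["bbox"]) > 0.5:
            if dt["category_id"] != best["category_id"]:
                print(img_id, "pred:", dt["category_id"], "gt:", best["category_id"])
```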
I encountered the same problem. The mAP I got is low but the bounding boxes are quite good. Have you solved the problem?
@Qia98, when evaluating, are the predicted labels correct compared to the GT, or at least plausible? I printed out the labels, and they never match the GT labels.
Are you encountering the same issue too?
@longzw1997 Any suggestions? It looks like there may be a small problem but I don't know where it is.
It looks like during the evaluation, the code did not import the correct class name. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified?
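For example, something like this in cfg_odvg.py (a sketch; the class names are placeholders and must match your dataset's categories, in category-id order):

```python
# cfg_odvg.py (relevant part only)
use_coco_eval = False
label_list = ["class_0", "class_1", "class_2"]  # ... one entry per category of your dataset
```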
It looks like during the evaluation, the code did not import the correct class name. Have 'label_list' and 'use_coco_eval = False' in cfg_odvg.py been modified?
yes.
So are the evaluation results normal now?
No, I modified them from the beginning, but I still have the same issue.
I don't know if the problem is on my side or there is actually a bug in the code.
Training is working normally; I compared the visualizations with the vanilla model. But during evaluation the scores are really low, and I tried printing out the gt_label and the _res_label from the evaluation function: they never match. The bounding boxes, on the other hand, are good.
So this might be one of the reasons why the scores are this low, and that's why I asked @Qia98 if he is also getting mismatched labels.
I find that the evaluation result on COCO is the same whether I use groundingdino_swint_ogc.pth or groundingdino_swinb_cogcoor.pth:
Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.552
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.709
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.610
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.401
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.591
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.692
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.407
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.706
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.784
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.638
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.826
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.920
@SamXiaosheng, I think it depends on the dataset you're using. If your dataset contains referring expressions, you will find that groundingdino_swinb performs better, because it was trained on RefCOCO while the swint variant wasn't.
Check this table.
@Hasanmog @longzw1997 @BIGBALLON I suspected there was something wrong with the evaluation code used during training, so I rewrote an evaluation script in COCO format and ran it against the official code base (I trained with this repo to obtain the weights, then evaluated them with my own evaluate function based on the official code). The mAP was very high (about 0.90 mAP@.5 and 0.70 mAP@.5:.95), while in the training log mAP@.5 never exceeded 0.1.
So I suspect that in this code base the model output itself is correct, but the _res_labels used to calculate the mAP are not; the problem may arise when the model output is converted into the _res_labels results JSON, for example incorrect xywh processing.
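For reference, my standalone check is roughly the following (a sketch based on pycocotools, with placeholder paths). The box-format point is that a COCO results file expects absolute [x, y, w, h] measured from the top-left corner, while the model typically outputs normalized cxcywh, so the conversion step is a natural place for such a bug:

```python
# Sketch: convert boxes to the COCO results format, then run the standard
# pycocotools evaluation on the dumped predictions.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

def cxcywh_to_coco_xywh(box, img_w, img_h):
    # normalized [cx, cy, w, h] -> absolute [x_min, y_min, w, h]
    cx, cy, w, h = box
    return [(cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h]

# each entry in results.json looks like:
# {"image_id": 1, "category_id": 3, "bbox": [x, y, w, h], "score": 0.9}
coco_gt = COCO("test_coco.json")           # placeholder path: GT annotations
coco_dt = coco_gt.loadRes("results.json")  # placeholder path: converted predictions
coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()
```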
Hi, @Qia98 I agree with your viewpoint and, if you find the time, please feel free to create a pull request to address this issue. 😄
Hi, I also encountered the same problem. When I visualize the detection results, the locations of the bounding boxes are correct, but the categories are usually incorrect. Is there something wrong with BERT, or is it because of some other reason?
I debugged the evaluation function, and I found the issue may be due to post-processing; see models/GroundingDINO/groundingdino.py (class PostProcess).
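To illustrate what I mean: in Grounding-DINO-style models the classification logits are over text tokens, not over categories, so the post-processing has to map token scores back to the entries of label_list. Below is a rough sketch of that kind of token-to-category mapping (not the actual PostProcess code; the tokenizer, caption layout, and pooling are my assumptions). If this step is off by even one token, the boxes stay correct but the labels come out wrong:

```python
# Sketch: build a caption from label_list, build a category-to-token "positive map"
# from the tokenizer's offset mapping, and pool token logits into per-category scores.
import torch
from transformers import AutoTokenizer

label_list = ["cat", "dog", "person"]       # placeholder categories
caption = " . ".join(label_list) + " ."     # "cat . dog . person ."

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer(caption, return_offsets_mapping=True, return_tensors="pt")
offsets = enc["offset_mapping"][0]          # (num_tokens, 2) character spans

# character span of each category name inside the caption
char_spans, start = [], 0
for name in label_list:
    s = caption.index(name, start)
    char_spans.append((s, s + len(name)))
    start = s + len(name)

# positive map: category x token, 1 where the token lies inside that category's span
num_tokens = offsets.shape[0]
pos_map = torch.zeros(len(label_list), num_tokens)
for ci, (cs, ce) in enumerate(char_spans):
    for ti, (ts, te) in enumerate(offsets.tolist()):
        if ts < ce and te > cs and te > ts:  # overlapping, non-special token
            pos_map[ci, ti] = 1.0

# token_logits: (num_queries, num_tokens) from the model; random here as a stand-in
token_logits = torch.randn(900, num_tokens).sigmoid()
class_scores = token_logits @ pos_map.t() / pos_map.sum(dim=1)  # mean over each span
labels = class_scores.argmax(dim=1)          # predicted category index per query
```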
@junfengcao feel free to create a pull request 😄
May I ask if anyone has solved this problem? I've encountered the same issue and have ruled out a number of possibilities, but I still haven't solved it.
This happened to me too, with an mAP of only 0.2%. When I changed the dataset, the accuracy was more than 40%, but why did the mAP decrease as training went on?