microsoft / RegionCLIP

[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
Apache License 2.0

Some of the accuracy in the reproduced paper is basically 0 #98

Open yujiao12 opened 1 month ago

yujiao12 commented 1 month ago

First of all, thank you to the authors for this great work. While following the README to reproduce the paper, I wanted to evaluate the trained detectors in the Transfer Learning task, so I ran the provided sample script as follows:

python3 ./tools/train_net.py \
    --eval-only \
    --num-gpus 1 \
    --config-file ./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/regionclip/regionclip_finetuned-coco_rn50.pth \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./configs/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_48_base_cls_emb.pth \
    MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.ROI_HEADS.SOFT_NMS_ENABLED True \

However, the Average Precision for RN50 on COCO (Generalized: Novel + Base) is very low. The test result is basically 0, as shown below.

[05/07 22:13:34 d2.evaluation.evaluator]: Inference done 4832/4836. Dataloading: 0.0011 s / iter. Inference: 4.8330 s / iter. Eval: 0.0004 s / iter. Total: 4.8347 s / iter. ETA=0:00:19
[05/07 22:13:44 d2.evaluation.evaluator]: Inference done 4833/4836. Dataloading: 0.0011 s / iter. Inference: 4.8341 s / iter. Eval: 0.0004 s / iter. Total: 4.8358 s / iter. ETA=0:00:14
[05/07 22:13:50 d2.evaluation.evaluator]: Inference done 4834/4836. Dataloading: 0.0011 s / iter. Inference: 4.8344 s / iter. Eval: 0.0004 s / iter. Total: 4.8361 s / iter. ETA=0:00:09
[05/07 22:13:55 d2.evaluation.evaluator]: Total inference time: 6:29:18.203654 (4.835066 s / iter per device, on 1 devices)
[05/07 22:13:55 d2.evaluation.evaluator]: Total inference pure compute time: 6:29:10 (4.833392 s / iter per device, on 1 devices)
[05/07 22:13:58 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[05/07 22:13:58 d2.evaluation.coco_evaluation]: Saving results to ./output/inference/coco_instances_results.json
[05/07 22:14:00 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=1.64s)
creating index...
index created!
[05/07 22:14:02 d2.evaluation.fast_eval_api]: Evaluate annotation type bbox
[05/07 22:14:08 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 5.88 seconds.
[05/07 22:14:08 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[05/07 22:14:10 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 1.59 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
[05/07 22:14:10 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
AP     AP50   AP75   APs    APm    APl
0.001  0.005  0.001  0.001  0.002  0.002

[05/07 22:14:10 d2.evaluation.coco_evaluation]: AP50_split_target AP: 0.0

[05/07 22:14:10 d2.evaluation.coco_evaluation]: AP50_split_base AP: 6.675366123124714e-05

[05/07 22:14:10 d2.evaluation.coco_evaluation]: AP50_split_all AP: 4.9295011370767116e-05

[05/07 22:14:10 d2.evaluation.coco_evaluation]: Per-category bbox AP:
category AP category AP category AP
person 0.074 bicycle 0.000 car 0.000
motorcycle 0.000 airplane 0.000 bus 0.000
train 0.000 truck 0.000 boat 0.000
bench 0.000 bird 0.000 cat 0.000
dog 0.000 horse 0.000 sheep 0.000
cow 0.000 elephant 0.000 bear 0.000
zebra 0.000 giraffe 0.000 backpack 0.000
umbrella 0.000 handbag 0.000 tie 0.000
suitcase 0.000 frisbee 0.000 skis 0.000
snowboard 0.000 kite 0.000 skateboard 0.000
surfboard 0.000 bottle 0.000 cup 0.000
fork 0.000 knife 0.000 spoon 0.000
bowl 0.000 banana 0.000 apple 0.000
sandwich 0.000 orange 0.000 broccoli 0.000
carrot 0.000 pizza 0.000 donut 0.000
cake 0.000 chair 0.000 couch 0.000
bed 0.000 toilet 0.000 tv 0.000
laptop 0.000 mouse 0.000 remote 0.000
keyboard 0.000 microwave 0.000 oven 0.000
toaster 0.000 sink 0.000 refrigerator 0.000
book 0.000 clock 0.000 vase 0.000
scissors 0.000 toothbrush 0.000

[05/07 22:14:10 d2.engine.defaults]: Evaluation results for coco_2017_ovd_all_test in csv format:
[05/07 22:14:10 d2.evaluation.testing]: copypaste: Task: bbox
[05/07 22:14:10 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[05/07 22:14:10 d2.evaluation.testing]: copypaste: 0.0011,0.0049,0.0006,0.0006,0.0017,0.0019
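When comparing several evaluation runs, the `copypaste:` lines in the log can be collected into a dictionary instead of being read by eye. The following is only a minimal sketch: the helper name `parse_copypaste` is hypothetical (it is not part of detectron2 or RegionCLIP), and it assumes the log file has one entry per line, with a `Task:` line followed by a metric-name line and a value line, as in the output above.

```python
import re


def parse_copypaste(log_text):
    """Collect metrics from 'copypaste:' log lines into {task: {metric: value}}.

    Hypothetical helper; assumes each task is logged as three consecutive
    'copypaste:' lines: 'Task: <name>', comma-separated metric names,
    and comma-separated values.
    """
    # Keep only the text after 'copypaste:' on each matching line.
    payloads = [m.group(1).strip()
                for m in re.finditer(r"copypaste:\s*(.+)", log_text)]
    results = {}
    for i, payload in enumerate(payloads):
        if payload.startswith("Task:") and i + 2 < len(payloads):
            task = payload.split(":", 1)[1].strip()
            names = payloads[i + 1].split(",")
            values = [float(v) for v in payloads[i + 2].split(",")]
            results[task] = dict(zip(names, values))
    return results


if __name__ == "__main__":
    sample = (
        "[05/07 22:14:10 d2.evaluation.testing]: copypaste: Task: bbox\n"
        "[05/07 22:14:10 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl\n"
        "[05/07 22:14:10 d2.evaluation.testing]: copypaste: 0.0011,0.0049,0.0006,0.0006,0.0017,0.0019\n"
    )
    print(parse_copypaste(sample))
```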

What should I do? Do the authors have any advice? I didn't modify any parameters; I just followed the steps. I look forward to your reply. Thanks again.

yujiao12 commented 1 month ago

The same problem arises in the RN50, RPN, COCO case in the Evaluation for Zero-shot Inference task. The sample script in test_zeroshot_inference.sh was executed, again following the readme tutorial.

# RN50, RPN, COCO
python3 ./tools/train_net.py \
    --eval-only \
    --num-gpus 1 \
    --config-file ./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd_zsinf.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/regionclip/regionclip_pretrained-cc_rn50.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.CLIP.CROP_REGION_TYPE RPN \
    MODEL.CLIP.MULTIPLY_RPN_SCORE True \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./configs/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_lvis_866.pth \
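Since a full evaluation run in this thread took over six hours, it may be worth confirming that every config and checkpoint path on the command line exists before launching. The sketch below is one way to do that with the standard library only. The helper name `missing_paths` is hypothetical, it treats any argument starting with `./` as a file path (an assumption that matches the commands above), and a missing file is only one possible cause of near-zero AP, not a confirmed diagnosis of this issue.

```python
from pathlib import Path


def missing_paths(argv, root="."):
    """Return arguments that look like relative paths but do not exist under root.

    Hypothetical helper; assumes file arguments start with './' as in the
    commands quoted in this thread.
    """
    missing = []
    for arg in argv:
        if arg.startswith("./") and not (Path(root) / arg).exists():
            missing.append(arg)
    return missing


if __name__ == "__main__":
    # Arguments copied from the zero-shot inference command above.
    args = [
        "--config-file",
        "./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd_zsinf.yaml",
        "MODEL.WEIGHTS",
        "./pretrained_ckpt/regionclip/regionclip_pretrained-cc_rn50.pth",
        "MODEL.CLIP.TEXT_EMB_PATH",
        "./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth",
        "MODEL.CLIP.BB_RPN_WEIGHTS",
        "./pretrained_ckpt/rpn/rpn_lvis_866.pth",
    ]
    for path in missing_paths(args):
        print("missing:", path)
```

Run from the repository root, this prints one line per path that cannot be found and nothing if all of them exist.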

YiwuZhong commented 1 month ago

@yujiao12 This post may help you: #81.