Some of the accuracy in the reproduced paper is basically 0

yujiao12 commented 6 months ago

First of all, thank you to the authors for this great work. Following the README to reproduce the paper, I tried to evaluate the trained detectors on the transfer learning task by running the provided sample script:

python3 ./tools/train_net.py \
    --eval-only \
    --num-gpus 1 \
    --config-file ./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/regionclip/regionclip_finetuned-coco_rn50.pth \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./configs/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_48_base_cls_emb.pth \
    MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.ROI_HEADS.SOFT_NMS_ENABLED True \

However, the average precision for RN50 on COCO (Generalized: Novel + Base) is essentially 0. The evaluation output is as follows.

[05/07 22:13:34 d2.evaluation.evaluator]: Inference done 4832/4836. Dataloading: 0.0011 s / iter. Inference: 4.8330 s / iter. Eval: 0.0004 s / iter. Total: 4.8347 s / iter. ETA=0:00:19
[05/07 22:13:44 d2.evaluation.evaluator]: Inference done 4833/4836. Dataloading: 0.0011 s / iter. Inference: 4.8341 s / iter. Eval: 0.0004 s / iter. Total: 4.8358 s / iter. ETA=0:00:14
[05/07 22:13:50 d2.evaluation.evaluator]: Inference done 4834/4836. Dataloading: 0.0011 s / iter. Inference: 4.8344 s / iter. Eval: 0.0004 s / iter. Total: 4.8361 s / iter. ETA=0:00:09
[05/07 22:13:55 d2.evaluation.evaluator]: Total inference time: 6:29:18.203654 (4.835066 s / iter per device, on 1 devices)
[05/07 22:13:55 d2.evaluation.evaluator]: Total inference pure compute time: 6:29:10 (4.833392 s / iter per device, on 1 devices)
[05/07 22:13:58 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[05/07 22:13:58 d2.evaluation.coco_evaluation]: Saving results to ./output/inference/coco_instances_results.json
[05/07 22:14:00 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=1.64s)
creating index...
index created!
[05/07 22:14:02 d2.evaluation.fast_eval_api]: Evaluate annotation type bbox
[05/07 22:14:08 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 5.88 seconds.
[05/07 22:14:08 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[05/07 22:14:10 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 1.59 seconds.
Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.000
Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.001
Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
[05/07 22:14:10 d2.evaluation.coco_evaluation]: Evaluation results for bbox:

| AP | AP50 | AP75 | APs | APm | APl |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 0.001 | 0.005 | 0.001 | 0.001 | 0.002 | 0.002 |

[05/07 22:14:10 d2.evaluation.coco_evaluation]: AP50_split_target AP: 0.0

[05/07 22:14:10 d2.evaluation.coco_evaluation]: AP50_split_base AP: 6.675366123124714e-05

[05/07 22:14:10 d2.evaluation.coco_evaluation]: AP50_split_all AP: 4.9295011370767116e-05

[05/07 22:14:10 d2.evaluation.coco_evaluation]: Per-category bbox AP:

| category | AP | category | AP | category | AP |
|:---|:---|:---|:---|:---|:---|
| person | 0.074 | bicycle | | car | 0.000 |
| motorcycle | 0.000 | airplane | | bus | 0.000 |
| train | 0.000 | truck | | boat | 0.000 |
| bench | 0.000 | bird | | cat | 0.000 |
| dog | 0.000 | horse | | sheep | 0.000 |
| cow | 0.000 | elephant | | bear | 0.000 |
| zebra | 0.000 | giraffe | | backpack | 0.000 |
| umbrella | 0.000 | handbag | | tie | 0.000 |
| suitcase | 0.000 | frisbee | | skis | 0.000 |
| snowboard | 0.000 | kite | | skateboard | 0.000 |
| surfboard | 0.000 | bottle | | cup | 0.000 |
| fork | 0.000 | knife | | spoon | 0.000 |
| bowl | 0.000 | banana | | apple | 0.000 |
| sandwich | 0.000 | orange | | broccoli | 0.000 |
| carrot | 0.000 | pizza | | donut | 0.000 |
| cake | 0.000 | chair | | couch | 0.000 |
| bed | 0.000 | toilet | | tv | 0.000 |
| laptop | 0.000 | mouse | | remote | 0.000 |
| keyboard | 0.000 | microwave | | oven | 0.000 |
| toaster | 0.000 | sink | | refrigerator | 0.000 |
| book | 0.000 | clock | | vase | 0.000 |
| scissors | 0.000 | toothbrush | | | |

[05/07 22:14:10 d2.engine.defaults]: Evaluation results for coco_2017_ovd_all_test in csv format:
[05/07 22:14:10 d2.evaluation.testing]: copypaste: Task: bbox
[05/07 22:14:10 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[05/07 22:14:10 d2.evaluation.testing]: copypaste: 0.0011,0.0049,0.0006,0.0006,0.0017,0.0019
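For anyone comparing runs, the `copypaste:` lines above can be pulled out of a saved log programmatically. The following is only a minimal sketch: the function names `parse_copypaste` and `looks_degenerate` and the 0.1 threshold are illustrative assumptions, not part of RegionCLIP or detectron2. It assumes the three-line layout shown in this log (task, metric names, values) and works whether the log is one entry per line or pasted as a single line.

```python
import re

# Sample taken from the log in this issue (pasted here as a single line).
SAMPLE_LOG = (
    "[05/07 22:14:10 d2.evaluation.testing]: copypaste: Task: bbox "
    "[05/07 22:14:10 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl "
    "[05/07 22:14:10 d2.evaluation.testing]: copypaste: 0.0011,0.0049,0.0006,0.0006,0.0017,0.0019"
)


def parse_copypaste(log_text):
    """Return {task: {metric: value}} from 'copypaste:' log entries.

    Each payload runs from 'copypaste:' up to the next '[' (start of the
    following timestamp) or the end of the line.
    """
    payloads = [p.strip() for p in re.findall(r"copypaste:\s*([^\[\n]+)", log_text)]
    results = {}
    i = 0
    while i < len(payloads):
        # A block is: "Task: <name>", then metric names, then metric values.
        if payloads[i].startswith("Task:") and i + 2 < len(payloads):
            task = payloads[i].split(":", 1)[1].strip()
            names = payloads[i + 1].split(",")
            values = [float(v) for v in payloads[i + 2].split(",")]
            results[task] = dict(zip(names, values))
            i += 3
        else:
            i += 1
    return results


def looks_degenerate(metrics, threshold=0.1):
    """True if every metric is below `threshold` (an arbitrary, hypothetical cutoff)."""
    return all(value < threshold for value in metrics.values())


if __name__ == "__main__":
    parsed = parse_copypaste(SAMPLE_LOG)
    print(parsed)
    print("degenerate:", looks_degenerate(parsed["bbox"]))
```

A check like this only flags that a run produced near-zero numbers; it says nothing about the cause.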

What should I do? Do the authors have any advice? I didn't modify any parameters; I just followed the steps. I look forward to your reply. Thanks again.

yujiao12 commented 6 months ago

The same problem occurs for the RN50, RPN, COCO setting in the evaluation for the zero-shot inference task. I ran the sample script from test_zeroshot_inference.sh, again following the README:

# RN50, RPN, COCO
python3 ./tools/train_net.py \
    --eval-only \
    --num-gpus 1 \
    --config-file ./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd_zsinf.yaml \
    MODEL.WEIGHTS ./pretrained_ckpt/regionclip/regionclip_pretrained-cc_rn50.pth \
    MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
    MODEL.CLIP.CROP_REGION_TYPE RPN \
    MODEL.CLIP.MULTIPLY_RPN_SCORE True \
    MODEL.CLIP.OFFLINE_RPN_CONFIG ./configs/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
    MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_lvis_866.pth \
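Since a full evaluation took about six and a half hours in the run above, it can be worth confirming that every file named in the command exists before launching. The sketch below is an assumption-laden pre-flight check, not a diagnosis of this issue: `missing_paths` is a hypothetical helper, and it simply treats any override value starting with `./`, `../` or `/` as a path relative to the repository root.

```python
import os


def missing_paths(overrides, root="."):
    """Given [KEY, VALUE, KEY, VALUE, ...], return {KEY: VALUE} for every
    path-like VALUE that does not exist on disk (resolved against `root`)."""
    missing = {}
    for key, value in zip(overrides[0::2], overrides[1::2]):
        looks_like_path = value.startswith(("./", "../", "/"))
        if looks_like_path and not os.path.exists(os.path.join(root, value)):
            missing[key] = value
    return missing


if __name__ == "__main__":
    # KEY VALUE overrides copied from the zero-shot command in this comment.
    overrides = [
        "MODEL.WEIGHTS", "./pretrained_ckpt/regionclip/regionclip_pretrained-cc_rn50.pth",
        "MODEL.CLIP.TEXT_EMB_PATH", "./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth",
        "MODEL.CLIP.CROP_REGION_TYPE", "RPN",
        "MODEL.CLIP.MULTIPLY_RPN_SCORE", "True",
        "MODEL.CLIP.OFFLINE_RPN_CONFIG", "./configs/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml",
        "MODEL.CLIP.BB_RPN_WEIGHTS", "./pretrained_ckpt/rpn/rpn_lvis_866.pth",
    ]
    for key, path in missing_paths(overrides).items():
        print(f"missing: {key} -> {path}")
```

Run from the repository root, it prints one line per missing file and nothing if all paths resolve.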

YiwuZhong commented 5 months ago

@yujiao12 This post may help you: #81.

microsoft / RegionCLIP
