microsoft / GLIP

Grounded Language-Image Pre-training

Getting much lower AP on LVIS after directly following the instructions #63

Open · backseason opened 1 year ago

backseason commented 1 year ago

Dear authors,

By following the commands in https://github.com/microsoft/GLIP#lvis-evaluation with the provided config file and pretrained weights, the evaluation results I get on LVIS minival are much lower than the ones reported in the README.

The command I used:

```bash
CUDA_VISIBLE_DEVICES=3,4,5,6 python -m torch.distributed.launch --nproc_per_node=4 \
    tools/test_grounding_net.py \
    --config-file configs/pretrain/glip_Swin_T_O365_GoldG.yaml \
    --task_config configs/lvis/minival.yaml \
    --weight PRETRAINED/glip_tiny_model_o365_goldg.pth \
    TEST.EVAL_TASK detection \
    OUTPUT_DIR evals/lvis \
    TEST.CHUNKED_EVALUATION 40 \
    TEST.IMS_PER_BATCH 16 \
    SOLVER.IMS_PER_BATCH 16 \
    TEST.MDETR_STYLE_AGGREGATE_CLASS_NUM 3000 \
    MODEL.RETINANET.DETECTIONS_PER_IMG 300 \
    MODEL.FCOS.DETECTIONS_PER_IMG 300 \
    MODEL.ATSS.DETECTIONS_PER_IMG 300 \
    MODEL.ROI_HEADS.DETECTIONS_PER_IMG 300
```
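Before comparing numbers, a quick sanity check is to confirm the weight file itself loads cleanly and contains a full set of parameters. A minimal sketch, assuming the .pth is a standard torch checkpoint; the nesting of the weights under a "model" key is an assumption, not confirmed from the repo:

```python
# Sanity-check sketch (not part of the GLIP tooling): load the
# checkpoint on CPU and spot-check what it contains.
import torch

ckpt = torch.load("PRETRAINED/glip_tiny_model_o365_goldg.pth", map_location="cpu")
state = ckpt.get("model", ckpt)  # assumption: weights may be nested under "model"
print(f"{len(state)} entries")
for name in sorted(state)[:5]:   # eyeball a few parameter names
    print(name)
```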

The corresponding evaluation results using the config and weights of GLIP-T(C), whose APr should be either ~14.3 or ~17.7:

[screenshot: LVIS minival evaluation results]

I have also tried GLIP-T(A), but the results are also much lower. Do you have any suggestions about what I might have done incorrectly?

felixfuu commented 1 year ago

@backseason Have you solved the problem?

JiuqingDong commented 1 year ago

> By following the commands in https://github.com/microsoft/GLIP#lvis-evaluation with the provided config file and pretrained weights, the evaluation results I get on LVIS minival are much lower than the ones reported in the README. […] Do you have any suggestions about what I might have done incorrectly?

Did you use the model from the model zoo, or did you train the model yourself?
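One low-effort way to answer that is to hash the local file and compare it byte-for-byte against a freshly downloaded copy of the model-zoo checkpoint. A minimal sketch, standard library only (the path is taken from the command above; no reference hash is given here, so compare against your own re-download):

```python
# Hash the local weights so they can be diffed against a fresh
# download of the model-zoo file.
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

print(sha256_of("PRETRAINED/glip_tiny_model_o365_goldg.pth"))
```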

SikaStar commented 1 year ago

> By following the commands in https://github.com/microsoft/GLIP#lvis-evaluation with the provided config file and pretrained weights, the evaluation results I get on LVIS minival are much lower than the ones reported in the README. […] Do you have any suggestions about what I might have done incorrectly?

Have you solved the problem?

Mukil07 commented 5 months ago

@backseason Have you solved the issue?