hustvl / SparseInst

[CVPR 2022] SparseInst: Sparse Instance Activation for Real-Time Instance Segmentation

train_net.py eval function and demo.py give different results #42

Open wangshuailpp opened 2 years ago

wangshuailpp commented 2 years ago

Hi, I have trained on the COCO dataset (extracting only 100 images for a quick test) and evaluated on those same 100 images, getting a high AP (about 90). But when I run demo.py to view the predicted masks, the results are very bad. Also, I have already run demo.py with the model you provide and got good masks. Thanks!
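For reference, the two commands being compared look roughly like this (a sketch only; the flags follow detectron2 conventions and the config/weight paths are placeholders):

python train_net.py --config-file configs/sparse_inst_r50_giam.yaml --eval-only MODEL.WEIGHTS output/model_final.pth
python demo.py --config-file configs/sparse_inst_r50_giam.yaml --input input.jpg --output results --opts MODEL.WEIGHTS output/model_final.pth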

wondervictor commented 2 years ago

Hi @wangshuailpp, thanks for your interest in SparseInst. Have you loaded the correct model and config when using demo.py for inference?

wangshuailpp commented 2 years ago

I set the same model and config for test_net and demo. test_net gets a good AP (about 90), but demo produces very bad masks. Thanks!

wondervictor commented 2 years ago

Hi @wangshuailpp, it seems that you trained the model from scratch with only 100 images and evaluated it on the training set, so the model might overfit those 100 images. What about the images used for visualization? Are they the same as the images used during testing?

wangshuailpp commented 2 years ago

I'm sure the test and train images are the same. But I found a strange phenomenon: every model I train is 400.4 MB (from model_0004999.pth to model_final.pth), but the model you provide is 133.7 MB (sparse_inst_r50vd_dcn_giam_aug_67dc06.pth).

wondervictor commented 2 years ago

You may need to check the trained model when using demo.py and verify that the weights are loaded correctly. As for the second problem: the training checkpoint contains the model weights, the optimizer states, and some other elements needed for resuming training. We remove these extra states from the released checkpoints to save space.
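For reference, stripping those training states yourself takes only a few lines (a minimal sketch, assuming the standard detectron2 checkpoint layout with the weights under a "model" key; paths are placeholders):

import torch

# Load the full training checkpoint on CPU.
ckpt = torch.load("output/model_final.pth", map_location="cpu")
# Training checkpoints bundle the weights ("model") together with "optimizer",
# "scheduler", and "iteration" states that are only needed to resume training.
slim = {"model": ckpt["model"]} if "model" in ckpt else ckpt
# Saving only the weights is what shrinks the file to the released-model size.
torch.save(slim, "output/model_final_slim.pth")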

fabro66 commented 2 years ago

Hi @wondervictor

I met the same problem. Evaluation shows that the model achieves a high AP, but in the inference stage (demo.py) the performance is bad. I trained on coco/train2017 and evaluated on coco/val2017. I don't know why it gets such a high AP.

Backbone: YOLOv5s backbone with an FPN layer
lr: 0.00001
batch size: 16

[06/27 09:03:47] d2.evaluation.fast_eval_api INFO: COCOeval_opt.evaluate() finished in 10.19 seconds.
[06/27 09:03:47] d2.evaluation.fast_eval_api INFO: Accumulating evaluation results...
[06/27 09:03:49] d2.evaluation.fast_eval_api INFO: COCOeval_opt.accumulate() finished in 1.28 seconds.
[06/27 09:03:49] d2.evaluation.coco_evaluation INFO: Evaluation results for segm: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 49.467 | 87.500 | 48.812 | 32.919 | 49.825 | 69.343 |
[06/27 09:03:49] d2.evaluation.coco_evaluation INFO: Per-category segm AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 43.785 | bicycle      | 32.271 | car            | 42.777 |
| motorcycle    | 41.864 | airplane     | 53.666 | bus            | 64.478 |
| train         | 71.820 | truck        | 55.110 | boat           | 39.147 |
| traffic light | 41.711 | fire hydrant | 61.725 | stop sign      | 70.547 |
| parking meter | 65.880 | bench        | 38.417 | bird           | 31.598 |
| cat           | 75.043 | dog          | 66.143 | horse          | 41.768 |
| sheep         | 48.491 | cow          | 46.441 | elephant       | 59.619 |
| bear          | 78.057 | zebra        | 56.101 | giraffe        | 53.960 |
| backpack      | 36.774 | umbrella     | 47.956 | handbag        | 33.903 |
| tie           | 34.070 | suitcase     | 52.099 | frisbee        | 61.102 |
| skis          | 12.655 | snowboard    | 35.651 | sports ball    | 46.322 |
| kite          | 35.145 | baseball bat | 24.195 | baseball glove | 49.813 |
| skateboard    | 36.658 | surfboard    | 41.859 | tennis racket  | 52.250 |
| bottle        | 39.589 | wine glass   | 28.893 | cup            | 49.328 |
| fork          | 17.350 | knife        | 20.521 | spoon          | 21.177 |
| bowl          | 54.863 | banana       | 43.836 | apple          | 47.963 |
| sandwich      | 63.597 | orange       | 54.590 | broccoli       | 46.770 |
| carrot        | 40.702 | hot dog      | 50.465 | pizza          | 63.825 |
| donut         | 53.661 | cake         | 55.068 | chair          | 36.520 |
| couch         | 62.362 | potted plant | 43.600 | bed            | 76.547 |
| dining table  | 52.464 | toilet       | 72.657 | tv             | 66.814 |
| laptop        | 62.028 | mouse        | 60.983 | remote         | 38.988 |
| keyboard      | 63.289 | cell phone   | 43.698 | microwave      | 65.430 |
| oven          | 61.911 | toaster      | 70.000 | sink           | 55.822 |
| refrigerator  | 71.195 | book         | 26.437 | clock          | 64.107 |
| vase          | 51.646 | scissors     | 42.721 | teddy bear     | 59.757 |
| hair drier    | 49.020 | toothbrush   | 26.258 |                |        |
[06/27 09:03:49] d2.engine.defaults INFO: Evaluation results for coco/val2017 in csv format:
[06/27 09:03:49] d2.evaluation.testing INFO: copypaste: Task: segm
[06/27 09:03:49] d2.evaluation.testing INFO: copypaste: AP,AP50,AP75,APs,APm,APl
[06/27 09:03:49] d2.evaluation.testing INFO: copypaste: 49.4665,87.5005,48.8123,32.9194,49.8253,69.3428

wondervictor commented 2 years ago

Hi @fabro66, could you provide a model config with weights for me? This problem is strange.

fabro66 commented 2 years ago

Hi @fabro66, could you provide a model config with weights for me? This problem is strange.

OK!

model & configs & weights

Please check it!

wondervictor commented 2 years ago

Hi @fabro66, I've tested the model along with the weights through two scripts, i.e., train_net.py and test_net.py, on my local machine (4 NVIDIA 3090 GPUs, PyTorch=1.9.1, cuda=11.1, detectron2=0.6).

outputs:

[06/27 14:54:39 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format: 
[06/27 14:54:39 d2.evaluation.testing]: copypaste: Task: segm 
[06/27 14:54:39 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl 
[06/27 14:54:39 d2.evaluation.testing]: copypaste: 49.4720,87.5011,48.7639,32.9183,49.8236,69.3405 
python test_net.py --config-file configs/instance/sparse_inst_y5s_giam.yaml MODEL.WEIGHTS sparseinst_y5s_backbone/weights/model_Final.pth 

outputs:

[06/27 15:03:19 d2.evaluation.testing]: copypaste: Task: segm 
[06/27 15:03:19 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl 
[06/27 15:03:19 d2.evaluation.testing]: copypaste: 49.4670,87.4931,48.7567,32.9138,49.8194,69.3319 
speed: 0.0159s FPS: 62.78 

It seems that the results are consistent across the two ways of evaluating. Can you provide more details about the inconsistency between training and testing? BTW, I'm curious about the pre-trained weights used in the provided backbone, since it achieves such promising results (49.5 mAP, 63 FPS). It's amazing.

wondervictor commented 2 years ago

Hi @fabro66 and @wangshuailpp, I've found that demo.py outputs strange results, and I'm going to fix it.

fabro66 commented 2 years ago

It seems that the results are consistent across the two ways of evaluating. Can you provide more details about the inconsistency between training and testing? BTW, I'm curious about the pre-trained weights used in the provided backbone, since it achieves such promising results (49.5 mAP, 63 FPS). It's amazing.


Hi @wondervictor, the inconsistency I mean here is the high AP during evaluation versus the bad performance in demo.py. I use pre-trained weights for the yolov5s backbone, trained on the COCO dataset.

wondervictor commented 2 years ago

Hi @fabro66 and @wangshuailpp, I've fixed this problem! It was caused by a mix-up with INPUT_FORMAT, or more precisely, "BGR" vs. "RGB". In demo.py, images are loaded in RGB format (demo.py:L93):
https://github.com/hustvl/SparseInst/blob/bd57455aa49c4cb37d66c77ccd477c7a5ebee444/demo.py#L93
but they are then converted to BGR format in detectron2/engine/defaults.py:L311:
https://github.com/facebookresearch/detectron2/blob/224cd2318fdb45b5e22bbb861ee9711ee52c8b75/detectron2/engine/defaults.py#L311
which is a wrong step. To solve it, you can add another conversion:

predictions = self.predictor(image[:,:,::-1])

in sparse_inst/d2_predictor.py:L49.
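Putting it together, the flow looks roughly like this (a sketch only; read_image is what demo.py uses, the predictor behavior is paraphrased from the linked defaults.py, and predictor stands for the wrapped DefaultPredictor-style object):

from detectron2.data.detection_utils import read_image

image = read_image("input.jpg", format="RGB")  # demo.py loads the image as RGB
# DefaultPredictor.__call__ flips the channel order on the assumption that it
# receives a BGR array, which would turn this already-RGB image into BGR.
# Pre-flipping to BGR here makes the predictor's internal flip restore RGB,
# so the model sees the channel order it was trained on.
predictions = predictor(image[:, :, ::-1])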

And I'll update the code to fix this bug.

fabro66 commented 2 years ago

Hi @wondervictor, after I fixed the input-format bug, it still performs badly. Do you know where the problem is? I'm guessing that the model is overfitting because it uses pre-trained weights for the yolov5s backbone trained on the COCO dataset.

wondervictor commented 2 years ago

Hi @fabro66, the model you've trained achieves 49.5 AP on COCO val2017 and should perform well on those images. From the evaluation results, it seems the problem is not due to overfitting. Have you compared the results before/after changing the image format?

fabro66 commented 2 years ago

Hi @wondervictor. I have compared the results before and after changing the image format. Performance is better after the change, but it is still bad on the COCO val2017 dataset.

[screenshots comparing visualization results on val2017 attached]

fabro66 commented 2 years ago

Hi @wondervictor .

As long as I keep training sparseinst-yolonet, the AP keeps improving. On the validation dataset, some images are segmented well, even people of small size. In other images, the performance is bad even when a person occupies a large area of the image. Is there something wrong with my hyperparameters? Could you help reproduce sparseinst-yolonet?

[further screenshots of visualization results attached]

wondervictor commented 2 years ago

Hi @fabro66, I'd like to solve this problem, but it will take a little time. I'm working on it. If you make any progress, please feel free to mention me in this issue : )

wondervictor commented 2 years ago

Hi @fabro66, I've checked the visualization results on the training set and they are also bad. This problem is quite weird: the AP on val2017 is 49.5, much higher than with ResNet-50, yet the visualizations are worse than those of the ResNet-based models. @fabro66, could you run the visualization with the pretrained ResNet-50 model to check whether the scripts work well in your environment? I'm going to re-train SparseInst with the yolo backbone. I'll notify you if I make any progress : )

fabro66 commented 2 years ago

Hi @wondervictor. I used the pretrained ResNet-50 model for visualization on val2017 and got good results, which confirms that my environment is OK. I'm not sure whether my batch size is set too small (16 in my experiments). Looking forward to your new progress!

xjsxujingsong commented 2 years ago

Hi @fabro66, I am going to use yolov5s as the backbone. Would you please share your configs and models? The link above is dead. I will test it on my machine as well. Thanks.

116022017144 commented 1 year ago

Hi @fabro66, I am going to use yolov5s as the backbone. Would you please share your configs and models? The link above is dead. I will test it on my machine as well. Thanks.

I also want to try it; I'm waiting as well. Thanks!