Fail to reproduce the performance reported in the paper

Control-xl commented 3 years ago

I downloaded the required data and the pretrained network and ran train.sh. However, the model achieves only 57.1% overall accuracy, far lower than the reported 68.8%. Is there anything wrong with the given settings? Or am I missing something that the model needs but that does not raise an error when it is absent?

Here is an example result from epoch 78: {"CLOSED": {"count": 185.0, "real": 180.0, "true": 136.0, "real_percent": 0.0, "score": 0.7351351351351352, "score_percent": 73.5}, "OPEN": {"count": 123.0, "real": 90.0, "true": 40.0, "real_percent": 0.0, "score": 0.3252032520325203, "score_percent": 32.5}, "ALL": {"count": 308.0, "real": 270.0, "true": 176.0, "real_percent": 0.0, "score": 0.5714285714285714, "score_percent": 57.1}}
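For anyone reading the numbers: the score fields appear to be simply true / count for each question type. The snippet below is a minimal sketch that checks this against the epoch 78 result; the helper name `accuracy` is only illustrative and is not taken from the repository.

```python
import json

# Epoch 78 result, copied from the log in this comment.
result = json.loads("""
{"CLOSED": {"count": 185.0, "real": 180.0, "true": 136.0, "real_percent": 0.0,
            "score": 0.7351351351351352, "score_percent": 73.5},
 "OPEN": {"count": 123.0, "real": 90.0, "true": 40.0, "real_percent": 0.0,
          "score": 0.3252032520325203, "score_percent": 32.5},
 "ALL": {"count": 308.0, "real": 270.0, "true": 176.0, "real_percent": 0.0,
         "score": 0.5714285714285714, "score_percent": 57.1}}
""")


def accuracy(entry):
    """Fraction of correctly answered questions (assumed to be true / count)."""
    return entry["true"] / entry["count"]


for name, entry in result.items():
    # The recomputed percentage should agree with the logged score_percent.
    print(name, round(100 * accuracy(entry), 1), entry["score_percent"])
```

The recomputed values agree with the logged ones (73.5, 32.5 and 57.1), and the ALL row is the sum of the CLOSED and OPEN rows.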

BTW, the script in test.sh is wrong: the --input argument should be _savedmodels/CMSA-Bio-MTPT instead of _savedmodels/MTPT-Bio-CMSA.

haifangong commented 3 years ago

Several teams have reproduced the results with a score of around 68%. Please check your code carefully.

Control-xl commented 3 years ago

I downloaded the code and didn't modify anything. Is it still my problem? I hope the released code can be free of bugs. I am still running the code, and my guess is that a wrong default argument contributes to the low performance: you set an ImageNet-pretrained ResNet34 as the default visual feature extractor instead of the pretrained ResNet proposed in your paper.

Control-xl commented 3 years ago

A follow-up: I modified the given arguments so that the three ResNet34 networks load the pretrained weights. Here is the best score I could find when testing the model over ALL the saved checkpoints:

{'CLOSED': {'count': 185.0, 'real': 180.0, 'true': 141.0, 'real_percent': 0.0, 'score': 0.7621621621621621, 'score_percent': 76.2}, 'OPEN': {'count': 123.0, 'real': 90.0, 'true': 64.0, 'real_percent': 0.0, 'score': 0.5203252032520326, 'score_percent': 52.0}, 'ALL': {'count': 308.0, 'real': 270.0, 'true': 205.0, 'real_percent': 0.0, 'score': 0.6655844155844156, 'score_percent': 66.6}}

If you are confident in your results, you could check YOUR uploaded code carefully; a simple modification would make it free of this bug. That would be better than directly closing this issue and saying that it is my problem. :)
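To make the difference concrete, the two runs in this thread can be compared per question type. This is only a sketch using the true and count values from the two result dumps above; the variable names are illustrative.

```python
# (true, count) pairs copied from the two result dumps in this thread.
default_run = {"CLOSED": (136, 185), "OPEN": (40, 123), "ALL": (176, 308)}
pretrained_run = {"CLOSED": (141, 185), "OPEN": (64, 123), "ALL": (205, 308)}


def percent(true, count):
    """Accuracy in percent, assuming score = true / count."""
    return 100.0 * true / count


# Gain in percentage points from loading the pretrained weights.
gains = {
    name: percent(*pretrained_run[name]) - percent(*default_run[name])
    for name in default_run
}

for name, gain in gains.items():
    print(f"{name}: {gain:+.1f} points")
```

This prints +2.7 points for CLOSED, +19.5 for OPEN and +9.4 for ALL, so almost all of the improvement comes from the OPEN questions.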

Edwina-coco commented 1 year ago

Hello, I got the same result as you. How did you finally solve it?

Control-xl commented 1 year ago

Sorry for the late reply. It has been a long time since I ran the code. If I remember correctly, you need to download a word embedding file (maybe glove300d?) to help run the code. But even with the embedding, the overall results are still lower than the reported ones (a maximum of around 70%, versus the reported 73.1%). You can refer to their latest paper on Med-VQA, where the result given for CMSA-MTPT is 67.9%. Haha.

Edwina-coco commented 1 year ago

OK, thank you very much.

rhyhck commented 11 months ago

Hello, I have also run into similar difficulties. Here is my best result: {"CLOSED": {"count": 185.0, "real": 180.0, "true": 132.0, "real_percent": 0.0, "score": 0.7135135135136, "score_percent": 71.4}, "OPEN": {"count": 123.0, "real": 90.0, "true": 42.0, "real_percent": 0.0, "score": 0.3414634146341463437, "score_percent": 34.1}, "ALL": {"count": 308.0, "real": 270.0, "true": 174.0, "real_percent": 0.0, "score": 0.564935064935065, "score_percent": 56.5}} How did you change the pretrained ResNet to improve accuracy?

rhyhck commented 11 months ago

Hello, have you reproduced the results of this paper? I have also run into similar difficulties. Here is my best result: {"CLOSED": {"count": 185.0, "real": 180.0, "true": 132.0, "real_percent": 0.0, "score": 0.7135135135136, "score_percent": 71.4}, "OPEN": {"count": 123.0, "real": 90.0, "true": 42.0, "real_percent": 0.0, "score": 0.3414634146341463437, "score_percent": 34.1}, "ALL": {"count": 308.0, "real": 270.0, "true": 174.0, "real_percent": 0.0, "score": 0.564935064935065, "score_percent": 56.5}} How did you change the pretrained ResNet to improve accuracy?

haifangong / CMSA-MTPT-4-MedicalVQA, issue #3