Closed ZhangT-tech closed 4 months ago
Hi, thanks for your interest! I would like to learn about the results you have got.
Hi, thanks for your follow-up:
The results are not stable. On the first run of the experiment, eval_polypdiag_finetune gives me the same answer, e.g. 91.6%, and it remains consistent when I run the test. However, if I use the same pretrained weights I just obtained from the last eval, it gives me a different number, like 80%, 76%, and so on. I don't see why the same model on the same test dataset would produce different F1 scores. Have you had this problem?
I also encountered this problem, and my test results were even lower. Since my training environment does not support distributed training, I commented out the relevant code. Could this be related?
Hi, @10086ddd
I have found that the model was not in eval mode during testing, and the saved model only contains the linear classifier, without the fine-tuned backbone.
These issues have now been addressed; you can pull the latest code and try again!
https://github.com/med-air/Endo-FM/blob/f9136ebc5fe28869d3b28a50fb734eab0d25c2b0/eval_finetune.py#L218
https://github.com/med-air/Endo-FM/blob/f9136ebc5fe28869d3b28a50fb734eab0d25c2b0/eval_finetune.py#L168-L175
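For anyone hitting the same symptom, here is a minimal sketch of the two fixes described above. The names `backbone` and `classifier` are hypothetical stand-ins for Endo-FM's fine-tuned backbone and linear head; the actual code is in `eval_finetune.py` linked above.

```python
import io
import torch
import torch.nn as nn

# Hypothetical stand-ins for the fine-tuned backbone and linear head.
backbone = nn.Linear(8, 4)
classifier = nn.Linear(4, 2)

# Fix 1: save BOTH components, not only the linear classifier, so the
# fine-tuned backbone can be restored at test time.
buffer = io.BytesIO()
torch.save({"backbone": backbone.state_dict(),
            "classifier": classifier.state_dict()}, buffer)

# Fix 2: at test time, reload both parts and switch to eval mode so that
# dropout and batch-norm statistics are frozen and runs are reproducible.
buffer.seek(0)
ckpt = torch.load(buffer)
backbone.load_state_dict(ckpt["backbone"])
classifier.load_state_dict(ckpt["classifier"])
backbone.eval()
classifier.eval()
print(backbone.training, classifier.training)  # False False
```

Without `eval()`, layers such as dropout stay stochastic, which alone can explain run-to-run variance on an identical test set.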
Hi, thanks for your follow-up @Kyfafyd. I will test the new code soon and let you know the results promptly. Thanks again for your work. I also have a small question: if I want to switch to another endoscopic dataset for downstream classification, do I only need to perform downstream fine-tuning first and then test directly?
Hi, @10086ddd Yes, that's exactly right! You can refer to this issue for more details: https://github.com/med-air/Endo-FM/issues/12
Hi, @Kyfafyd I tested the new code today, five times in total. As before, due to environment issues, I commented out the distributed-training code. But the results of the five tests were 74.8%, 34.1%, 83.8%, 45.4%, and 85.7%, respectively. Does this mean that the model is still not very stable? Additionally, I have a question about the classification labels in the PolypDiag dataset: is each frame of a video labeled as abnormal considered diseased?
Hi @10086ddd Are you using the latest updated model, which is linked in the README: https://mycuhk-my.sharepoint.com/personal/1155167044_link_cuhk_edu_hk/_layouts/15/onedrive.aspx?id=%2Fpersonal%2F1155167044%5Flink%5Fcuhk%5Fedu%5Fhk%2FDocuments%2FEndo%2DFM%2Fdownstream%5Fweights%2Fpolypdiag%2Epth&parent=%2Fpersonal%2F1155167044%5Flink%5Fcuhk%5Fedu%5Fhk%2FDocuments%2FEndo%2DFM%2Fdownstream%5Fweights&ga=1 My testing result using this model is consistently 91.5% across runs.
PolypDiag is a video-level task: it diagnoses whether each video is diseased or not.
Hi, @Kyfafyd Sorry, it was my mistake; I did forget to use the latest updated model. Also, regarding the PolypDiag dataset: if in the future I want to use some diseased and disease-free endoscopic videos for fine-tuning and testing, would the diseased videos need to show diseased areas in every frame?
Hi, @10086ddd Yes, this is a classification task, so no region annotation is needed. If you want to perform lesion detection, you can refer to STFT in this repo.
Hi, @Kyfafyd Thank you for your work and your answer. Actually, I was asking about the classification task: previously I assumed that every frame in a video had to belong to the same category. Now it seems that is unnecessary?
Hi, @10086ddd Note that this task is to recognize whether a video is diseased or not, so that is not necessary.
OK, thanks @Kyfafyd
Hi, @Kyfafyd I'm very sorry to disturb you again, but after using the latest updated model yesterday, the test results are still unstable. I noticed that the source code uses distributed training. Because my environment does not support it, I commented out that part of the code, without any other modifications. If I train on only one GPU, do I need not only to comment out the distributed-training code, but also to modify the model's parameters and configuration files?
Hi, @10086ddd You may test the model in a distributed environment. Even a single GPU can be set up as a distributed scenario.
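In case it helps, here is a minimal sketch of a single-process "distributed" setup. It uses the gloo backend so it runs on CPU (use "nccl" on GPU); in practice you would launch the training script with `torchrun --nproc_per_node=1 ...` instead of setting the environment variables by hand. The address and port below are arbitrary example values.

```python
import os
import torch.distributed as dist

# A process group of world size 1: this process is rank 0 of 1, so the
# distributed code paths run unchanged on a single GPU/CPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

print(dist.get_rank(), dist.get_world_size())  # 0 1

dist.destroy_process_group()
```

This avoids having to comment out any distributed code, which keeps the test path identical to the one the maintainers validated.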
Hi, @Kyfafyd I'm sorry for replying only today. During this time, I have been testing on a Linux server with one GPU, and the model results are now stable. Thank you for your earlier answer. However, the result stays at 66%, so I think I should adjust things further.
Hi, @10086ddd Are you using the latest code and the latest weights?
Hi, @Kyfafyd Yes, I downloaded the project code and weight file again and retested, but the result was still 66.1%.
Hi @10086ddd I forgot to add the line for loading the updated backbone during testing. You can try the latest code; it should now give the correct result.
Hi, @Kyfafyd The new code has successfully achieved the expected results. Thank you for your work.
Hi, @Kyfafyd I have a small question: can the PolypDiag downstream task handle longer videos, for example videos lasting more than 10 minutes?
Hi, @10086ddd Are you performing video-level or frame-level task?
Hi, @Kyfafyd Yes, isn't the PolypDiag downstream task a video-level task?
PolypDiag is video-level. I think you can try videos lasting more than 10 minutes. Increasing the number of sampled frames for each input video may help improve performance (by adding DATA.NUM_FRAMES 16 in the fine-tuning script).
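To illustrate why more frames matter for long videos: with a fixed frame budget, a 10-minute clip is covered very sparsely in time. A minimal sketch of uniform temporal sampling (illustrative only, not the repo's actual sampler):

```python
def sample_frame_indices(total_frames, num_frames):
    """Uniformly spread `num_frames` indices over a video of `total_frames`."""
    step = total_frames / num_frames
    # Take the midpoint of each of the `num_frames` equal segments.
    return [int(step * (i + 0.5)) for i in range(num_frames)]

# A 10-minute video at 30 fps has 18000 frames; 16 samples land ~37.5 s apart,
# so raising DATA.NUM_FRAMES tightens the temporal coverage.
indices = sample_frame_indices(18000, 16)
print(len(indices), indices[0], indices[-1])  # 16 562 17437
```

A short lesion segment can easily fall between two samples at this spacing, which is why a larger `NUM_FRAMES` can help on long videos (at the cost of more memory and compute per clip).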
OK, thanks for your answer. @Kyfafyd
Hi,
I was running the experiment for the PolypDiag downstream task. When I use the fine-tuned weights as the pretrained_model_weights, each run of test_finetune_polypdiag.sh gives me a different test result. May I know why this happens? Supposedly, it should give the same test result every time we run the test with the same val/test set of 80 videos and the same model, right?