JunMa11 closed this issue 4 years ago.
Hi Jun,
Thanks for your interest in our work. The results of each model vary from run to run because datasets in the medical imaging domain are small. Most of the results in the paper are averages over three runs.
Furthermore, we also find that UAMT-UN is more robust (stable) than UAMT, so I would suggest first adding the consistency loss on the unlabeled data only, and then checking whether applying it to the labeled data as well yields a further improvement.
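For concreteness, restricting the consistency loss to the unlabeled portion of a batch in a mean-teacher setup might look like the sketch below. It assumes the labeled samples occupy the first `labeled_bs` positions of each batch; the names `student_out`, `teacher_out`, and `labeled_bs` are illustrative, not the repository's exact API.

```python
import torch.nn.functional as F

def consistency_loss(student_out, teacher_out, labeled_bs, unlabeled_only=True):
    """Mean-teacher consistency loss, optionally on unlabeled samples only.

    student_out / teacher_out: logits of shape (B, C, ...);
    the first `labeled_bs` samples of the batch are assumed to be labeled.
    """
    if unlabeled_only:
        student_out = student_out[labeled_bs:]
        teacher_out = teacher_out[labeled_bs:]
    # Softmax MSE between student and teacher predictions; no gradient
    # flows into the teacher (its weights are an EMA of the student's).
    return F.mse_loss(F.softmax(student_out, dim=1),
                      F.softmax(teacher_out.detach(), dim=1))
```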
Hi @yulequan,
Thank you very much for your reply.
Interestingly, I trained the UAMT_unlabel model twice on the same GPU server, and the test results were exactly the same.
Do the results of each model only vary across different GPU servers?
Best, Jun
I have fixed the random seed in the code, so if you run the experiment on the same machine, the results should be similar. I am not sure whether we can get the same results on different machines and environments, even with the same random seed.
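For reference, the usual PyTorch seeding pattern is sketched below; this is the generic recipe, not necessarily the exact code in this repository. Even with all of these set, some CUDA kernels are non-deterministic, which is one reason results can still drift across machines or library versions.

```python
import random
import numpy as np
import torch

def set_seed(seed=1337):
    # Seed every common source of randomness for one machine/run.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Make cuDNN deterministic (at some cost in speed).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```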
Got it. Thank you very much for your reply.
Dear @yulequan,
Thanks for sharing the great code. It is very clear and works out of the box.
Question on "dynamic" results
My friend and I ran the code (without any modification) and got the results in the attached table. The results vary somewhat: some metrics are reproduced, some (marked red) are even better than the results reported in the paper, but others (marked blue) are degraded.
Could you share your insights on why the results vary, and what could be the possible reasons for the degraded metrics?
A minor bug
In the lines linked below, the case folder name is missing from the output path, so all the saved results share the same filename and overwrite each other during saving.
https://github.com/yulequan/UA-MT/blob/da31df5991fc51e8e28ac0c70bbcb5fc514cbe3e/code/test_util.py#L28-L31
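A minimal fix, assuming the surrounding loop in `test_util.py` knows the case's image path, would be to fold the case folder name into the output filename. The variable names `image_path`, `prediction`, and `test_save_path` here are my guesses at the local context, not the file's exact identifiers:

```python
import os
import numpy as np
import nibabel as nib

# `image_path`, `prediction`, and `test_save_path` are assumed to come
# from the surrounding test loop. Derive the case name from its folder
# so each prediction gets a unique filename instead of overwriting the
# previous one.
case_name = os.path.basename(os.path.dirname(image_path))
nib.save(nib.Nifti1Image(prediction.astype(np.float32), np.eye(4)),
         os.path.join(test_save_path, case_name + "_pred.nii.gz"))
```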
Finally, I really appreciate that you have made the code publicly available. It is well written and will be great learning material for me.
Looking forward to your reply. Best, Jun