yulequan / UA-MT

Code for the MICCAI 2019 paper 'Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation'.
https://arxiv.org/abs/1907.07034

Question on dynamic results #2

Closed. JunMa11 closed this issue 4 years ago.

JunMa11 commented 4 years ago

Dear @yulequan ,

Thanks for sharing the great code. It's very clear and works out of the box.

Question on "dynamic" results

My friend and I ran the code (without any modification) and got the following results. The results vary somewhat between runs: some metrics are reproduced, some (red) are even better than the results reported in the paper, and some (blue) are degraded.

Could you share your insights on this variability? What could be the possible reason for the degraded results?

We also tried re-running the code on a local server; however, the results vary in a similar way.

Results

[screenshot: table comparing the reproduced metrics with the results reported in the paper]

A minor bug

Here, the case folder name is missing from the output path, so all the saved results share the same filename and overwrite each other during saving; a possible fix is sketched below.

https://github.com/yulequan/UA-MT/blob/da31df5991fc51e8e28ac0c70bbcb5fc514cbe3e/code/test_util.py#L28-L31
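A minimal sketch of one way to fix it, assuming the test loop has a per-case identifier available; the function and variable names below are illustrative, not the exact identifiers in test_util.py:

```python
import os

import nibabel as nib
import numpy as np


def save_prediction(prediction: np.ndarray, case_name: str, test_save_path: str) -> None:
    """Save one case's prediction under a name keyed by the case,
    so successive cases do not overwrite each other's output file."""
    os.makedirs(test_save_path, exist_ok=True)
    out_file = os.path.join(test_save_path, f"{case_name}_pred.nii.gz")
    # Identity affine as a placeholder; the real code may instead
    # carry over the affine from the input image.
    nib.save(nib.Nifti1Image(prediction.astype(np.float32), np.eye(4)), out_file)
```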

Finally, I really appreciate that you have made the code publicly available. The code is well written and will be great learning material for me.

Looking forward to your reply. Best, Jun

yulequan commented 4 years ago

Hi Jun,

Thanks for your interest in our work. The results of each model vary from run to run because of the small amount of data in the medical image domain. Most of the results in the paper are averages over three runs.

Furthermore, we also found that UAMT-UN is more robust (stable) than UAMT. So I would suggest first adding the consistency loss on unlabeled data only, and then checking whether adding the consistency loss on labeled data as well brings further improvement; see the sketch below.
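For concreteness, a minimal sketch of restricting the consistency term to the unlabeled part of the batch, assuming the batch layout used in the training scripts, where the first `labeled_bs` samples of each batch are labeled (the function name is illustrative):

```python
import torch
import torch.nn.functional as F


def consistency_loss(student_out: torch.Tensor,
                     teacher_out: torch.Tensor,
                     labeled_bs: int,
                     unlabeled_only: bool = True) -> torch.Tensor:
    """Mean-squared consistency between student and teacher predictions.

    Assumes the first `labeled_bs` samples in the batch are labeled.
    With unlabeled_only=True this corresponds to UAMT-UN; set it to
    False to also apply the term to labeled samples, as in UAMT.
    """
    student_soft = F.softmax(student_out, dim=1)
    teacher_soft = F.softmax(teacher_out, dim=1)
    if unlabeled_only:
        student_soft = student_soft[labeled_bs:]
        teacher_soft = teacher_soft[labeled_bs:]
    return F.mse_loss(student_soft, teacher_soft)
```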

JunMa11 commented 4 years ago

Hi @yulequan ,

Thank you very much for your reply. Interestingly, I trained the UAMT_unlabel model twice on the same GPU server, and the test results are exactly the same.

1st training

[screenshot: test metrics from the 1st training run]

2nd training

[screenshot: test metrics from the 2nd training run]

Do the results of each model only vary when trained on different GPU servers?

Best, Jun

yulequan commented 4 years ago

I have fixed the random seed in the code, so if you run the experiment on the same machine, the results should be similar. I am not sure whether we can get the same results on different machines and environments, even with the same random seed.
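For reference, a typical seed-fixing setup in PyTorch (a sketch of the general pattern; the exact calls in the repo may differ):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 1337) -> None:
    """Fix the common sources of randomness. Even with all of these,
    exact reproducibility across different GPUs, driver versions, or
    library versions is not guaranteed."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```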

JunMa11 commented 4 years ago

Got it. Thank you very much for your reply.