xxxliu95 / DGNet

Semi-supervised Meta-learning with Disentanglement for Domain-generalised Medical Image Segmentation

Question on the difference in results after reproduction #8

Closed · ujn7843 closed this issue 3 years ago

ujn7843 commented 3 years ago

Hi Xiao!

Thank you for your sharing.

I have reproduced the results using the code you shared. The environment is as follows:

I got the Dice scores and standard deviations on the MNMS dataset, using 100% of the data from the source domains. In my reproduction, however, the scores are consistently lower than those reported in the paper:

- B,C,D -> A: 81.85 (7.0) vs. 83.21 (7.4) in the paper
- A,C,D -> B: 81.49 (8.2) vs. 86.53 (5.3) in the paper
- A,B,D -> C: 83.37 (7.0) vs. 87.22 (6.1) in the paper

Do you have any idea what might be causing the differences? Thanks in advance!

xxxliu95 commented 3 years ago

Hi Jiayi,

Thank you for your interest in our work.

The current code ships with parameters that have not been fine-tuned. In my experiments, I had to tune several hyperparameters for each case to obtain better results, but I only saved the pre-trained models, not the per-case hyperparameters.

I can provide several tips here (collected in the snippet below):

- You can change the resampling rate in the data loader for training and testing by varying the 1.1 in `resize_order = re / 1.1` between 1.0 and 1.3.
- I also ran experiments with input sizes 224x224 and 288x288. 288x288 sometimes gives better results, but GPU memory becomes an issue.
- I also tried tuning `meta_step_size` between 0.001 and 0.01.
- The three training parameters `k_un = 1`, `k1 = 20`, and `k2 = 2` also matter.
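A rough sketch of these knobs in one place (illustrative only; the exact variable names and where they live in the repo may differ):

```python
# Hyperparameters mentioned above (illustrative names, not the repo's exact ones).

RESIZE_DIVISOR = 1.1   # the 1.1 in "resize_order = re / 1.1"; try values in 1.0-1.3
INPUT_SIZE = 224       # 224x224 or 288x288; 288 can help but needs more GPU memory
meta_step_size = 1e-3  # inner-loop step size; I tried values from 0.001 to 0.01

# Training parameters (their roles are discussed further down this thread):
k_un = 1
k1 = 20
k2 = 2
```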

I will try to find the tuned code, at least for BCD->A, so that you can play with it.

Best wishes, Xiao

ujn7843 commented 3 years ago

Hi Xiao,

Thanks for your reminder!

If you can find it, could you kindly send the tuned code to my email (jyangco@connect.ust.hk) or just make it public on GitHub?

Jiayi

xxxliu95 commented 3 years ago

Hi Jiayi,

I just started re-training. I will release the tuned version for BCD->A soon.

Feel free to talk with me at MICCAI 2021.

Best wishes, Xiao

ujn7843 commented 3 years ago

Hi Xiao,

Thanks for your effort in re-training the model. I have attempted to tune the parameters myself and currently have several questions about the implementation details.

1) Regarding `resize_order = re / 1.1`: you said it changes the resampling rate in the dataloader, but after looking through the code, I wonder whether it instead directly scales the values of the raw data?

2) Regarding `k_un = 1`, `k1 = 20`, `k2 = 2`: I take `k_un` to be the number of iterations for the meta-train and meta-test steps. I am wondering what the 'un' stands for in `k_un`, `un_imgs`, `un_reco_loss`, etc. Besides, I am wondering how changing `k1` and `k2` could affect the results at all, since as far as I can see they are only used to log the training process.

I hope that you can tell me where I got it wrong. Thanks in advance!

Jiayi

xxxliu95 commented 3 years ago

Hi Jiayi,

The resampling rate is the factor by which the images are rescaled, which affects how large the anatomy of interest appears; it resamples the image grid rather than scaling the intensity values.

"un" means unlabeled. k1 and k2 affect the learning rate decay.

Best wishes, Xiao

ujn7843 commented 3 years ago

Hi Xiao,

Thank you for your reminder.

I noticed that you call `scheduler.step(val_score)` twice in train_meta.py. Is that meant to accelerate the learning-rate decay? If so, why not tune the `step_size` parameter of `lr_scheduler.StepLR` instead?

Thanks.

Jiayi

xxxliu95 commented 3 years ago

Hi Jiayi,

Oh yes, good point. I think it is a mistake I made when copying from another version into this public version.

There should be only one step call. The duplicate makes the LR decay too quickly, so the model converges to a local minimum too early.
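For reference, a simplified version of the intended loop (not the repo's exact code; the optimizer, scheduler settings, and stand-in model are illustrative):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler

model = nn.Linear(8, 2)  # stand-in for the real network
optimizer = optim.Adam(model.parameters(), lr=2e-5)
scheduler = lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

for epoch in range(100):
    # ... train for one epoch, then validate ...
    scheduler.step()  # call exactly once per epoch
    # A second scheduler.step() here would halve the effective step_size,
    # i.e. the LR would decay twice as fast as intended.
```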

Best wishes, Xiao

ujn7843 commented 3 years ago

Hi Xiao,

I am currently tuning the model myself according to the tips you gave. May I ask exactly how you tuned the parameters? Did you use grid search, random search, Bayesian optimization, or another method? And is there a priority order among the parameters, i.e., which should be tuned first?

Thanks in advance.

Jiayi

xxxliu95 commented 3 years ago

Hi Jiayi,

Tuning the model is a little tricky. I tune the hyperparameters by checking the losses and visuals during training. If you are just experimenting, you may first vary the resampling rate slightly to see how it affects the results, keeping k1 and k2 fixed. If you want to use our model as a baseline, I suggest you wait for the tuned version, or I can share the well-trained model weights with you.

I also checked the results for the 5% cases with the current version. For the BCD->A 5% case, the current parameters work well, and the results are even better than those I report in the paper. I am busy this week as I am at MICCAI presenting this paper, but I will hopefully put up more training details in the coming weeks.

Best wishes, Xiao

ujn7843 commented 3 years ago

Great! Thanks.

ujn7843 commented 3 years ago

Hi Xiao,

Kindly note that citation 10 in the paper might be wrong... You presumably meant to cite the paper that proposes the Dice loss, but the cited paper is not it...

Best, Jiayi

xxxliu95 commented 3 years ago

Hi Jiayi,

Thanks for that. I personally prefer to cite the very first Dice paper, but yes, to be accurate, the following paper should be cited:

Milletari, F., Navab, N., Ahmadi, S.A., 2016. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. 2016 Fourth International Conference on 3D Vision (3DV), 565–571. doi:10.1109/3DV.2016.79.

Best wishes, Xiao

xxxliu95 commented 3 years ago

Hi Jiayi,

I updated the code and the training details, and fixed some issues such as the duplicated LR step call.

The current parameters are for the BCD->A case. You can train the model to see how it performs. Overall, to tune the model, you may want to vary the initial learning rate (2e-5 to 5e-5), the number of training epochs (80 to 120 for the 100% cases), the training parameters (k1 and k2), and the resampling rate (summarised in the sketch below). Our model has many parameters to tune, which is also a drawback of disentanglement. The results I reported in the paper may not be the best our model can achieve, as I did not have much time to tune it before submission; I actually found better results for the 5% case with the parameters in this public version.
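A rough summary of that search space (illustrative names; check the updated training details in the repo for the exact ones):

```python
search_space = {
    "initial_lr":     (2e-5, 5e-5),  # initial learning rate
    "epochs":         (80, 120),     # for the 100% labelled-data cases
    "k1":             20,            # training parameters; vary around the
    "k2":             2,             #   BCD->A defaults in this version
    "resize_divisor": (1.0, 1.3),    # the divisor in resize_order = re / divisor
}
```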

Best wishes, Xiao

ujn7843 commented 3 years ago

Hi Xiao,

Thank you very much for your work. I will try to tune the model to see how it performs. Many thanks!

Best, Jiayi

Claydon-Wang commented 3 years ago


Hi Yang,

I was wondering whether the model could be trained with `DataParallel`, since training on a single GPU is time-consuming.

I changed `model.to(device)` to `model = torch.nn.DataParallel(model).cuda()` and got the following multi-GPU error:

`RuntimeError: chunk expects at least a 1-dimensional tensor`

How can I solve this? Can you give me some advice? Thanks in advance!

xxxliu95 commented 3 years ago

Hey Claydon,

Since the batch size I use is only 4, using multiple GPUs may not be a good way to accelerate training. If I am correct, DataParallel is meant for models with a large batch size, and your error might be that on some GPUs there is only one training image.

To accelerate the training, I would instead suggest working on the meta-test stage, which is the main reason our model trains slowly. In every meta-test step, we need to create a new computation graph for the meta-test model, which is extremely slow (see the sketch below). Without the meta-test stage, one epoch takes only around 20 minutes; with it, one epoch takes over an hour.
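Roughly, the expensive part looks like this (a minimal MAML-style sketch, not our actual implementation; `TinyNet` is a hypothetical stand-in for the segmentation network):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in for the real network (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x, weights=None):
        if weights is None:
            return self.fc(x)
        w, b = weights  # functional pass through the "fast weights"
        return F.linear(x, w, b)

def meta_step(model, x_tr, y_tr, x_te, y_te, meta_step_size=0.01):
    loss_tr = F.cross_entropy(model(x_tr), y_tr)
    # create_graph=True keeps the inner-update graph alive so the outer
    # update can backprop through it -- building this fresh graph at every
    # meta-test step is the main cost described above.
    grads = torch.autograd.grad(loss_tr, list(model.parameters()),
                                create_graph=True)
    fast_weights = [w - meta_step_size * g
                    for w, g in zip(model.parameters(), grads)]
    return F.cross_entropy(model(x_te, fast_weights), y_te)

model = TinyNet()
x_tr, y_tr = torch.randn(4, 8), torch.randint(0, 2, (4,))
x_te, y_te = torch.randn(4, 8), torch.randint(0, 2, (4,))
meta_step(model, x_tr, y_tr, x_te, y_te).backward()  # outer update through the inner graph
```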

Best wishes, Xiao

Claydon-Wang commented 3 years ago

Hi Liu,

Thank you for your kind reply; I will try it.

Best regards, Wang