zubata88 / mdgru

Code for Multi-dimensional Gated Recurrent Units for the Segmentation of Biomedical Data
GNU Lesser General Public License v2.1

Best parameters for MS lesion segmentation #30

Open arthursw opened 4 years ago

arthursw commented 4 years ago

Hi,

I would like to reproduce the results from the paper Automated Segmentation of Multiple Sclerosis Lesions using Multi-Dimensional Gated Recurrent Units but I'm not sure how to set the parameters to get optimal results.

Would the following settings work well enough?

python3 RUN_mdgru.py --datapath path/to/samplestructure --locationtraining train_data \
--locationvalidation val_data \
--optionname onlytrainrun --modelname mdgrudef48 -w 64 64 64 -p 5 5 5 \
-f seq1.nii.gz seq2.nii.gz seq3.nii.gz -m lab.nii.gz --iterations 100000 \
--nclasses 2 --num_threads 8 --only_train --rotate 0.2 --scale 0.8 1.2 --deformation 0 2 --deformSigma 1 1 \
--add_e_bn True --resmdgru True --use_dropconnect_on_state True --dice_loss_label 0 1 --dice_loss_weight 0.8 --dice_autoweighted --dice_cc

I am particularly unsure of the --deformation 0 2 --deformSigma 1 1 and --dice_loss_weight 0.8 settings. The --add_e_bn True --resmdgru True --use_dropconnect_on_state True seem to match the descriptions in the paper.

The README.md says that some changes were made to get the best results, so could I reach the same performance without touching the code?

My goal is to benchmark MS lesion segmentation methods (for the French Multiple Sclerosis Registry (OFSEP)), so I would like to know if this method competes with a method like nnUNet. Maybe you have an idea about it?

Thanks a lot for your code and (hopefully) your help :+1:

zubata88 commented 4 years ago

Hi arthursw

I haven't worked with the code for quite a while now, I might therefore only be of limited help.

Concerning the deformation factors, it is important to always sample some results and see how the parameters affect the samples. Do they still look (almost) realistic? In the end you only want to augment the training data, not widen the spectrum of images you need to segment correctly by creating unrealistic samples. The same point applies to rotation and mirroring: you only want to mirror the brains along the sagittal plane and rotate them such that naturally feasible deviations occur.
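To illustrate the "sample and eyeball" check above, here is a generic elastic-deformation sketch in NumPy/SciPy. The `magnitude` and `sigma` parameters play roles analogous to mdgru's `--deformation` / `--deformSigma` flags, but this is an assumption about their semantics, not mdgru's actual implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(volume, magnitude=2.0, sigma=1.0, seed=0):
    """Apply a random smooth (elastic) deformation to a 3-D volume.

    magnitude scales the displacement field, sigma controls its
    smoothness (roughly analogous to --deformation / --deformSigma).
    """
    rng = np.random.default_rng(seed)
    coords = np.meshgrid(*[np.arange(s) for s in volume.shape], indexing="ij")
    displaced = []
    for c in coords:
        noise = rng.uniform(-1, 1, volume.shape)      # random field per axis
        smooth = gaussian_filter(noise, sigma) * magnitude
        displaced.append(c + smooth)
    return map_coordinates(volume, displaced, order=1, mode="nearest")

# Visual sanity check: deform a synthetic cube "brain" and inspect it.
vol = np.zeros((32, 32, 32))
vol[8:24, 8:24, 8:24] = 1.0
warped = elastic_deform(vol, magnitude=2.0, sigma=1.0)
# For realistic parameters, total mass should stay roughly similar.
print(vol.sum(), warped.sum())
```

Plotting a few slices of `warped` (e.g. with matplotlib) for the values you intend to pass is exactly the kind of check suggested above: if the anatomy looks implausible, reduce the magnitude.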

Now you also need to decide whether you want the best results or to imitate the ones of the paper you mentioned. There is always some tuning necessary, and the best parameters depend highly on the metric(s) and data you use. If you want to imitate the results from said paper, you cannot use dice, as it was not yet implemented back then. As far as I remember, the paper describes the parameters used in detail. If you want the best results, however, I would encourage you to use the dice loss, as I suspect it improves results. How exactly you would need to tune these dice loss parameters could best be answered by @antal-horvath .
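For reference, a per-label weighted soft Dice loss can be sketched as follows. This is a generic NumPy illustration of the idea behind `--dice_loss_label` / `--dice_loss_weight`; mdgru's actual implementation (and its `--dice_autoweighted` / `--dice_cc` variants) may combine it with cross-entropy differently:

```python
import numpy as np

def soft_dice_loss(probs, target, labels=(0, 1), weights=None, eps=1e-6):
    """Weighted soft Dice loss over selected label channels.

    probs:  (C, ...) array of softmax probabilities
    target: (...)    integer label map
    """
    if weights is None:
        weights = [1.0 / len(labels)] * len(labels)
    loss = 0.0
    for w, label in zip(weights, labels):
        p = probs[label]
        t = (target == label).astype(p.dtype)
        dice = (2.0 * (p * t).sum() + eps) / (p.sum() + t.sum() + eps)
        loss += w * (1.0 - dice)  # 0 for a perfect prediction
    return loss
```

A perfect one-hot prediction yields a loss near 0, while a collapsed prediction (everything one class) is penalized on the other label channel, which is why Dice-type losses help with the extreme class imbalance of lesion segmentation.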

However, for the general task of benchmarking lesion segmentation methods: here it would make the most sense to either compare within categories (which loss is best, which data augmentation is best, which network is best) while fixing everything else, or to fix everything except the actual methods you are looking at. This means you would use the same data augmentation techniques, the same training length, and the same parameters wherever possible, and only compare the method (usually the neural network) itself, as the other design choices also have a very high impact.

Best,

Simon

arthursw commented 4 years ago

Hi Simon,

Thanks a lot for those tips! I was wondering if the implementation described in the paper is fully available so I can get something that generalizes well, but otherwise I'll try some settings and see how it goes.

Best, Arthur

arthursw commented 4 years ago

Hi @zubata88 and @gtancev,

I trained the method with the following parameters:

python3 $MDGRU/RUN_mdgru.py --datapath $DATA --locationtraining train_data --locationvalidation val_data --locationtesting test_data --only_train --optionname onlytrainrun --modelname mdgrudef48 -w 64 64 64 -p 5 5 5 -f flair.nii.gz t1.nii.gz t2.nii.gz -m label.nii.gz --save_each 2500 --iterations 100000 --nclasses 2 --num_threads 8 --only_train --rotation 0.2 --scaling 1.3 --deform 7 7 7 --deformSigma 3 3 3 --add_x_bn --resmdgru --use_dropconnect_on_state --dice_loss_label 0 1 --dice_loss_weight 0.8 --dice_autoweighted --dice_cc

but I must have done something wrong (I suspect I shouldn't have used --deform 7 7 7 --deformSigma 3 3 3) because it did not work. The returned segmentations contain only ones, and the probability maps look like the following image:

[image: near-uniform probability map]
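A quick sanity check for this failure mode (a generic NumPy sketch, not part of mdgru): MS lesions typically occupy well under one percent of brain volume, so a predicted mask whose foreground fraction is near 1.0 points at a collapsed training run rather than, say, a thresholding issue:

```python
import numpy as np

def foreground_fraction(mask):
    """Fraction of voxels predicted as lesion in a binary mask."""
    return np.asarray(mask, dtype=bool).mean()

# Example: a collapsed "all lesion" prediction versus a plausible one.
collapsed = np.ones((64, 64, 64), dtype=np.uint8)
plausible = np.zeros((64, 64, 64), dtype=np.uint8)
plausible[30:33, 30:33, 30:33] = 1

print(foreground_fraction(collapsed))  # 1.0 -> degenerate output
print(foreground_fraction(plausible))  # tiny fraction, plausible for MS lesions
```

Running this on the network's output after every few thousand iterations shows early whether the run is heading toward the all-ones degenerate solution.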

Unfortunately, I do not have the resources to follow your advice precisely (fixing everything but the selected methods). Instead, I preprocessed my images once and used them with each method I want to compare, with the given "optimal" parameters. The metrics I use are the ones from the MICCAI 2016 MS Challenge (Dice score, F1 score, Jaccard distance, etc.), and the data is meant to be heterogeneous to test generalization. I describe my objectives more precisely in this post.
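The voxel-wise overlap metrics mentioned above can be computed from binary masks in a few lines (a generic sketch; the MICCAI 2016 challenge additionally reports lesion-wise F1 and surface distances, which need connected-component analysis on top of this):

```python
import numpy as np

def dice_score(pred, ref):
    """Dice overlap between two binary masks (1.0 = perfect match)."""
    pred, ref = np.asarray(pred, bool), np.asarray(ref, bool)
    denom = pred.sum() + ref.sum()
    return 2.0 * np.logical_and(pred, ref).sum() / denom if denom else 1.0

def jaccard_distance(pred, ref):
    """1 - Jaccard index, using the identity J = D / (2 - D)."""
    d = dice_score(pred, ref)
    return 1.0 - d / (2.0 - d)
```

Computing these yourself on a held-out case is also a useful cross-check against whatever evaluation script each benchmarked method ships with.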

I see that you developed another method, V-Net, and maybe you are aware of alternatives. So I would like to know if you have any advice on which methods to test (which could be trained given a set of "optimal" parameters for MS segmentation on heterogeneous data). Otherwise I'll continue with the methods I mentioned in my post, no problem of course, thanks for sharing your work anyway!

Thanks, Arthur

gtancev commented 4 years ago

Hi, I did the documentation for MD-GRU last year, e.g., see here. I guess you might have specified the deformation parameters incorrectly. Hope that works out.

Best, Georgi

arthursw commented 4 years ago

Thank you! Did you compare MDGRU with V-Net?

gtancev commented 4 years ago

I had the intention, but the project was over before I could do that. The V-Net implementation that I forked had no data augmentation, and I had to add it first from MD-GRU. But there might be even newer/better algorithms by now (both V-Net and MD-GRU are from before 2016), considering the progress of deep learning. Maybe @zubata88 knows more, as I am not working in this field.