Closed bhatrana closed 1 year ago
Hi @bhatrana , can you please share your config?
Re: GPU and training speeds: if you trained on Bender, it could be that it has slow filesystem access (though it may be a different issue). I was also able to leverage a faster GPU (A100 64GB), which made training a lot faster for me.
Hi @bhatrana, how did you get the DSC value? I used the same command for validation, but I don't see any logging of the DSC value for the OASIS dataset.
@vigsivan: It seems I changed https://github.com/vigsivan/RWCNet/blob/d90b48f7377b0b69dd655c625772be0328bf166b/networks.py#L354 to resolutions: List[int]=[4,4] because of a persistent CUDA error on Compute Canada, and later forgot to restore the default value when training on Bender.
I am retraining the network with the default value and will post the results by next week.
Is the GPU (A100 64GB) allocation on Compute Canada?
Thanks!
@animesh-007: This repo is based on Learn2Reg (https://learn2reg.grand-challenge.org/). You can run evaluation/testing with eval.py, which lets you store the displacement field of each pair.
To quantify the results using several metrics, you can use the following repo, which is also based on L2R.
https://github.com/MDL-UzL/L2R/tree/main/evaluation
Thanks
Thanks @bhatrana. The link you shared was very helpful. I was able to calculate metrics offline. But the evaluation script doesn't give the Dice metric as reported in the RWCNet paper. Can you guide me on how I can calculate the Dice metric?
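For reference, the Dice similarity coefficient (DSC) for one anatomical label is 2|A ∩ B| / (|A| + |B|), where A and B are the voxels carrying that label in the fixed segmentation and in the warped moving segmentation. The following is a minimal NumPy sketch of that formula, not the evaluation code used in the RWCNet paper or in the L2R repo. The function names `dice_per_label` and `mean_dice` are hypothetical. It assumes the moving segmentation has already been warped with the predicted displacement field, and that label 0 is background.

```python
import numpy as np


def dice_per_label(fixed_seg, warped_seg, labels=None):
    """Return {label: Dice score} for two integer label maps of equal shape.

    Assumption: label 0 is background and is skipped when `labels` is None.
    """
    fixed_seg = np.asarray(fixed_seg)
    warped_seg = np.asarray(warped_seg)
    if fixed_seg.shape != warped_seg.shape:
        raise ValueError("segmentations must have the same shape")

    if labels is None:
        # Use every non-background label present in either volume.
        labels = np.union1d(np.unique(fixed_seg), np.unique(warped_seg))
        labels = labels[labels != 0]

    scores = {}
    for label in labels:
        a = fixed_seg == label
        b = warped_seg == label
        denom = a.sum() + b.sum()
        if denom == 0:
            # Label absent from both volumes: Dice is undefined, so skip it.
            continue
        scores[int(label)] = 2.0 * np.logical_and(a, b).sum() / denom
    return scores


def mean_dice(fixed_seg, warped_seg, labels=None):
    """Average the per-label Dice scores; NaN if no label could be scored."""
    scores = dice_per_label(fixed_seg, warped_seg, labels)
    return float(np.mean(list(scores.values()))) if scores else float("nan")
```

Averaging `mean_dice` over all validation pairs gives a single DSC figure. Which labels are included, and how missing labels are handled, can change the number noticeably, so these choices need to match the protocol of whatever result is being compared against.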
Please see the weights in the train_oasis folder regarding reproducibility of this issue. A100s are available on Narval.
I'm closing this issue, but feel free to follow up if you come across anything else.
Hi Vignesh,
@vigsivan @mrhardisty
Thank you for this nice work on the registration problem in medical imaging. I was trying to retrain this model on the OASIS data to reproduce the results. The number of steps in each stage is set based on the paper.
I have two questions.
1) I used an NVIDIA TITAN RTX with 24 GB of RAM, and it took around 4 weeks to train the whole model. How did you manage to train on an NVIDIA A100 - 32 GB in around 30 hours? Can you please share the configuration and the hardware allocation you requested?
2) After 4 weeks of training, I was not able to get the expected DSC value quoted in the paper.
Here are some of the TensorBoard graphs and validation results.
Please feel free to point out any errors I may be making in training and validation, so that I can find and avoid them.
FYI,
The training command I used for this repo is:
python l2r_train_eval.py /home/rranabhat/OASIS/OASIS_dataset.json train_config.json
and for validation: python eval.py oasis_files_paper/checkpoints/data.json eval_config.json
All metrics were then computed with the Learn2Reg evaluation repo.
Thank you.
Best regards, Raj