neuronets / kwyk

Knowing what you know - Bayesian brain parcellation
https://doi.org/10.3389/fninf.2019.00067
Apache License 2.0

Retraining model with new atlas #18

Open armaneshaghi opened 4 years ago

armaneshaghi commented 4 years ago

Would it be possible to perform transfer learning and retrain the model with new parcellations?

If so, it would be great if you could provide a minimal example.

I would like to use the Neuromorphometrics atlases, and I believe that retraining the existing model will perform better than training a new one from scratch.

satra commented 4 years ago

for kwyk, we are redoing the model with nobrainer. i think @kaczmarj has already created the variational pieces. also i have the data with the full freesurfer parcellations. we can indeed retrain the existing model with the more extensive parcellation and then tune it further with the neuromorphometrics atlases.

this should not be hard at this point and indeed an example would be good.

kaczmarj commented 4 years ago

hi all, the variational model is now part of nobrainer (https://github.com/neuronets/nobrainer/blob/master/nobrainer/models/bayesian.py). i will work on creating a minimal example (jupyter notebook) of retraining it.

kaczmarj commented 4 years ago

@armaneshaghi - i created a jupyter notebook guide for transfer learning using the kwyk model: https://github.com/neuronets/nobrainer/blob/add/bayesian-transfer/guide/transfer_learning-bayesian.ipynb

you can access this on google colab at https://colab.research.google.com/github/neuronets/nobrainer/blob/add%2Fbayesian-transfer/guide/transfer_learning-bayesian.ipynb
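For reference, the general transfer-learning recipe can be sketched with plain tf.keras: freeze the pretrained layers and attach a fresh segmentation head sized for the new atlas. The tiny model below is a stand-in, not the actual kwyk architecture; `n_classes` and the layer sizes are illustrative.

```python
import tensorflow as tf

# Stand-in base network; in practice this would be the pretrained kwyk
# model with its weights loaded, e.g. base.load_weights("...h5").
inputs = tf.keras.Input(shape=(32, 32, 32, 1))
x = tf.keras.layers.Conv3D(8, 3, padding="same", activation="relu")(inputs)
x = tf.keras.layers.Conv3D(8, 3, padding="same", activation="relu")(x)
base = tf.keras.Model(inputs, x)

# Freeze the pretrained layers so only the new head is updated.
base.trainable = False

# Attach a new segmentation head sized for the new atlas
# (n_classes is illustrative).
n_classes = 50
head = tf.keras.layers.Conv3D(n_classes, 1, activation="softmax")(base.output)
model = tf.keras.Model(base.input, head)

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy")
```

After the new head converges, the base can optionally be unfrozen and the whole model fine-tuned at a lower learning rate.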

satra commented 4 years ago

@kaczmarj - the instructions are still off. there is no pretrained model in nobrainer-models. how does one find this: https://dl.dropbox.com/s/rojjoio9jyyfejy/nobrainer_spikeslab_32iso_weights.h5

we should do what we had discussed and turn nobrainer-models into a datalad repo.

satra commented 4 years ago

also had to run: !pip install nobrainer tensorflow-gpu==2.1.0

also getting this error during training:

ValueError: Variable <tf.Variable 'layer1/vwnconv3d/kernel_posterior_loc:0' shape=(3, 3, 3, 1, 96) dtype=float32> has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
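For context (this is not the kwyk code itself), the error message points at the usual culprit: a non-differentiable op such as argmax or round ending up in the loss path. A minimal reproduction:

```python
import tensorflow as tf

x = tf.Variable([[0.2, 0.8]])

with tf.GradientTape() as tape:
    # tf.argmax has no registered gradient, so any loss built from it
    # yields a None gradient with respect to x.
    bad_loss = tf.cast(tf.argmax(x, axis=-1), tf.float32)

grad = tape.gradient(bad_loss, x)
print(grad)  # None
```

The usual fix is to keep such ops out of the loss (use them only in metrics) or to replace them with a differentiable surrogate such as a softmax-based loss.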

kaczmarj commented 4 years ago

thanks @satra - i have to debug the model because i sometimes get NaN loss, even with low learning rates. then i will save it to nobrainer-models.

cristidonos commented 4 years ago

Hi all. This is a very nice segmentation method, thank you for sharing. I noticed that some blocks have significantly higher segmentation errors (especially at the interface of superior frontal and paracentral areas). Can you please comment on the following:

  1. Would increasing the block size from 32 to 64 or even 128 help with these errors?
  2. What was the training time with ~10000 MRIs and 32x32x32 blocks on the 12 GB GPU?
  3. Is the original training data used in the paper available for retraining the network with different block sizes? (I would very much like to avoid redoing the FreeSurfer pipeline on hundreds of MRIs, as it takes forever.) Thanks, Cristian

[two screenshots of the segmentation errors attached]

satra commented 4 years ago

@cristidonos - thanks for the report. we have been retraining two new models with larger block sizes (128^3), using:

  1. the original data, but with the full freesurfer labels (unfortunately, some pieces of the original data, while accessible to everyone, are not redistributable by us, e.g. HCP).
  2. the uk biobank, which we plan to retrain on as well.

however, we are presently running into an out of memory error that happens partway through training and is somehow linked to tensorflow probability. it's been a difficult error to track down since it happens not at the initial model allocation but somewhere down the road.

cristidonos commented 4 years ago

Thank you for the quick response, I am looking forward to the new models. Based on your experience with the model, what is a good guess for the training parameters (block size, learning rate, epochs, etc.) when using a 24 GB Titan RTX? Would 500 MRIs (from ABIDE and ADNI) be sufficient to obtain good segmentation results?

satra commented 4 years ago

block size, learning rates, epochs, etc

ideally i'd like to be at 256^3 as that integrates information over the entire brain. regarding learning rates and epochs, the defaults for this model are a good place to start and then you can monitor training and validation error to see if any adjustments could be made.
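One concrete way to monitor training and validation error with Keras, as suggested above, is via callbacks; the patience and factor values below are illustrative starting points, not recommendations from the authors, and `train_ds`/`val_ds` are hypothetical datasets.

```python
import tensorflow as tf

# Sketch: stop when validation loss plateaus and drop the learning rate
# when progress stalls. Values here are illustrative defaults.
callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                         patience=2),
]

# Hypothetical usage (train_ds and val_ds are not defined here):
# model.fit(train_ds, validation_data=val_ds, epochs=50, callbacks=callbacks)
```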

if using a 24GB Titan RTX?

you would likely not be able to go up to 128^3 (we are barely doing that on our v100s). you can try it and see. the version of code released here in the repo uses tensorflow 1, while the master version of nobrainer uses tensorflow 2.
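Back-of-the-envelope arithmetic makes the memory pressure concrete. This counts only the raw float32 input voxels of one cubic block, ignoring feature maps, gradients, and model weights, each of which multiplies the real footprint many times over:

```python
# Illustrative only: bytes for the raw float32 voxels of one cubic block.
for side in (32, 64, 128, 256):
    voxels = side ** 3
    mib = voxels * 4 / 1024 ** 2  # 4 bytes per float32 voxel
    print(f"{side}^3: {voxels:>10,} voxels = {mib:.3f} MiB")
```

A 256^3 block holds 512x the voxels of a 32^3 block before any feature maps are computed, which is why even large-memory GPUs strain at 128^3.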

Would 500 MRIs (from ABIDE and ADNI) be sufficient to obtain good segmentation results?

i believe this would be ok if you start from the existing model and use transfer learning. however, this won't let you change the block size. it will also probably depend on the quality of the segmentations you give the model. the reason we want to make these larger block size models, trained on larger datasets, available is so that they can be used for transfer learning.