tkuanlun350 / 3DUnet-Tensorflow-Brats18

3D Unet biomedical segmentation model powered by tensorpack with fast io speed
202 stars 68 forks

Training With Google Colab #27

Closed francescoberloco closed 5 years ago

francescoberloco commented 5 years ago

Hi @tkuanlun350 ,

I'm trying to train the network on Google Colab with the BraTS 2017 dataset, but when I try to do cross-validation, the virtual machine crashes or the execution stops (the dataset is very large, so it can't be loaded and preprocessed all at once). So I thought of splitting the dataset into several sub-folders and training on each of them in turn. My question is: when I finish training the first time, switch the dataset folder, and restart training while choosing to keep the log, does the model continue training or does it restart from scratch? If it restarts, is there any way to split the dataset and train the same model on each sub-folder?
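The splitting step described above could be sketched as follows. This is a minimal stdlib-only sketch under my own assumptions (the function name, the `part{N}` folder naming, and the round-robin assignment are all hypothetical, not part of this repo):

```python
import os
import shutil


def split_dataset(src_dir, dst_root, n_parts):
    """Distribute each case (folder or file) from src_dir round-robin
    into n_parts sub-folders under dst_root, so each part is small
    enough to load and preprocess on its own."""
    cases = sorted(os.listdir(src_dir))
    for i, case in enumerate(cases):
        part_dir = os.path.join(dst_root, f"part{i % n_parts}")
        os.makedirs(part_dir, exist_ok=True)
        shutil.move(os.path.join(src_dir, case), part_dir)
```

Round-robin assignment keeps the parts close to equal in size, which matters if each part becomes one training round.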

tkuanlun350 commented 5 years ago

You can use --load to load the pre-trained model when training on a different sub-folder. Is the error you encountered caused by running out of memory? Does it happen only during evaluation?
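The --load chaining could be sketched as a helper that builds one train.py command per sub-folder, pointing each round at the previous round's checkpoint. This is a hypothetical sketch: the `--logdir` flag, the `train_log/<sub>/checkpoint` path layout, and `build_commands` itself are assumptions for illustration, not this repo's exact CLI (only `--load` is confirmed in the thread):

```python
def build_commands(subfolders):
    """Build one train.py invocation per sub-folder, chaining --load so
    that round N starts from round N-1's checkpoint."""
    cmds, prev_ckpt = [], None
    for sub in subfolders:
        cmd = ["python3", "train.py", "--logdir", f"train_log/{sub}"]
        if prev_ckpt:
            # Resume from the previous round's saved weights.
            cmd += ["--load", prev_ckpt]
        cmds.append(cmd)
        prev_ckpt = f"train_log/{sub}/checkpoint"
    return cmds
```

The dataset path itself would still need to be switched between rounds (e.g. in the repo's config), since only the checkpoint is passed on the command line.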

francescoberloco commented 5 years ago

@tkuanlun350 thank you for your answer. I think it is a virtual machine memory error, because if I reduce the number of training images everything works fine. The problem appears when I start training with cross-validation (and also during prediction when I use 46 images). So I've split the dataset into 6 sub-folders and I want to train the same model on each sub-folder.

Can I do this with the --load option? For example, if I train the network on the first sub-folder (obtaining model x), then set --load ./../model-x, change the dataset path to the second sub-folder, and choose to keep the previous log, does training start from the pre-trained model x or does it create a new model?

Another alternative I've thought of is to replace TrainConfig with AutoResumeTrainConfig and increase the number of epochs every time I change the dataset path. For instance, in the first training I set 3 epochs, then save the model, change the dataset path in the scripts, and increase the number of epochs to 6. When I resume, training should start from the third epoch and run for another 3 epochs on the second part of the dataset, and so on for the third, fourth, etc. But I don't know whether these two alternatives are equivalent and, above all, whether they work properly. Sorry for the long reply.
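The resume-and-raise-max-epoch idea can be illustrated with a stdlib-only analogy (no tensorpack involved; `train`, the JSON state file, and its layout are hypothetical stand-ins for the checkpoint that AutoResumeTrainConfig would pick up from the log directory):

```python
import json
import os


def train(state_path, max_epoch):
    """Toy training run: resume from the epoch recorded in state_path
    if it exists, run up to max_epoch, then record the new epoch.
    Returns (start_epoch, end_epoch) for inspection."""
    start = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            start = json.load(f)["epoch"]
    for epoch in range(start, max_epoch):
        pass  # one epoch of training on the current sub-folder
    with open(state_path, "w") as f:
        json.dump({"epoch": max_epoch}, f)
    return start, max_epoch
```

Under this model, a first run with max_epoch=3 trains epochs 0-2, and a second run with max_epoch=6 (after switching the dataset path) resumes at epoch 3 and trains epochs 3-5 on the new sub-folder, which is exactly the behavior the comment describes.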