Also for this one, I'm very confused about exactly which number to assign (1 or 2) in order to use the pre-trained weights. It seems that I should assign 2, but the comment says "all layers will be initialized with pre-trained weights in case 'weight_Initialization' is 1".
weight_Initialization_CNN = 1
weight_Initialization_FCN = 1
weights folderName = /Users/xxx/desktop/LiviaNET/trainedweights/
weights trained indexes = [0,1,2]
Hi @YilinLiu97
To improve the performance, one of the first things I would do is change the network architecture given in the example, since it is actually very shallow (I gave that architecture just so the example runs quickly).
For some of the parameters:
"learning rate change Type" will define whether using some schedule to change the learning rate. So far no more options are allowed. The default way for changing the learning rate is defined by the two following parameters(which you can change as you want):
"First Epoch Change LR", in which epoch you want to start to decrease your LR
"Frequency Change LR", each how many epochs the LR must be decreased.
For the weights initialization you are right. I did not update that part when I included the delving option.
There are more features coming for the network, but since I have many things to do, that may take a while; sorry for that.
Best,
Jose.
Thanks for the reply!! By the way, I have trouble loading the pre-trained weights. It seems to me that there are some tiny typos/bugs in LiviaNet.py, but even after I fixed those, I still got the following:
--- Weights initialization type: Transfer learning...
Traceback (most recent call last):
File "./networkTraining.py", line 82, in
File "/Users/xxx/Downloads/LiviaNET-master/src/LiviaNet/LiviaNet.py", line 256, in generateNetworkLayers
dropoutRate
File "/Users/xxx/Downloads/LiviaNET-master/src/LiviaNet/LiviaNet3DConvLayer.py", line 138, in init
(convolvedOutput_Train, convolvedOutputShape_Train) = convolveWithKernel(self.W, filterShape, inputToConvTrain, inputToConvShapeTrain)
File "/Users/xxx/Downloads/LiviaNET-master/src/LiviaNet/Modules/NeuralNetwork/layerOperations.py", line 63, in convolveWithKernel
wReshapedForConv = W.dimshuffle(0,4,1,2,3)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/tensor/var.py", line 355, in dimshuffle
pattern)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/theano/tensor/elemwise.py", line 159, in init
(i, j, len(input_broadcastable)))
ValueError: new_order[1] is 4, but the input only has 1 axes.
Can you please tell me what you modified in the code, i.e. what according to you are bugs? I tried the code many times with different configurations before deploying it here, to be sure there were no errors (or as few as possible). If there are some, it would be nice to be aware of them so I can correct them. Please also include your config file so I can check everything is ok. For example, if you change the number of layers or the kernel size, you cannot use the weights I provide in the simple example, and then when trying to create the net it will complain, as in this example.
Thanks.
Sure. 1) At first, I got this error,
File "/Users/xxx/Desktop/LiviaNET/src/LiviaNet/LiviaNet.py", line 764, in createNetwork
intermediate_ConnectedLayers)
File "/Users/xxx/Desktop/LiviaNET/src/LiviaNet/LiviaNet.py", line 155, in generateNetworkLayers
if len(weightsTrainedIdx) <> len(numberCNNLayers):
NameError: global name 'weightsTrainedIdx' is not defined
What I modified (LiviaNet.py, lines 154-156):
if self.weight_Initialization_CNN == 2:
    if len(self.weightsTrainedIdx) <> len(numberCNNLayers):
        print(" ... WARNING!!!! Number of indexes specified for trained layers does not correspond with number of conv layers in the created architecture...")
2) But then I got this one:
self.weightsTrainedIdx is [0, 1, 2]
len is 3 (These two lines are what I printed out for debugging)
Traceback (most recent call last):
File "./networkTraining.py", line 82, in
File "/Users/xxx/Downloads/LiviaNET-master/src/LiviaNet/LiviaNet.py", line 157, in generateNetworkLayers
if len(self.weightsTrainedIdx) <> len(numberCNNLayers):
TypeError: object of type 'int' has no len()
I was confused by this error, since self.weightsTrainedIdx should be a list, not an int, just as what I printed out shows.
3) Lastly, I commented out these two lines (154:156 in LiviaNet.py) and then got the error described above.
I got all of these errors without having modified the architecture.
############################################################################################################################################
################################################# CREATION OF THE NETWORK #####################################################
############################################################################################################################################
############## =================== General Options ================= ################
[General]
networkName = liviaTest
folderName = LiviaNet_Test
############## =================== CNN_Architecture ================= ################
[CNN_Architecture]
numkernelsperlayer = [10,20,30,100]
kernelshapes = [[3, 3, 3], [3, 3, 3], [3, 3, 3], [1]]
intermediateConnectedLayers = []
pooling_scales = [[1,1,1],[1,1,1],[1,1,1]]
dropout_Rates = [0.25,0.5]
activationType = 2
n_classes = 9
weight_Initialization_CNN = 2
weight_Initialization_FCN = 2
weights folderName = /Users/xxx/Downloads/LiviaNET-master/trainedWeights
weights trained indexes = [0,1,2]
############## =================== Training Options ================= ################
[Training Parameters]
batch_size=5
number Of Epochs = 3
number Of SubEpochs = 2
number of samples at each SubEpoch Train = 1000
learning Rate change Type = 0
sampleSize_Train = [25,25,25]
sampleSize_Test = [45,45,45]
costFunction = 0
SoftMax temperature = 1.0
L1 Regularization Constant = 1e-6
L2 Regularization Constant = 1e-4
Leraning Rate = [0.001]
First Epoch Change LR = 1
Frequency Change LR = 2
Momentum Type = 1
Momentum Value = 0.6
momentumNormalized = 1
Optimizer Type = 1
Rho RMSProp = 0.9
Epsilon RMSProp = 1e-4
applyBatchNormalization = 1
BatchNormEpochs = 20
applyPadding = 1
############################################################################################################################################
################################################# TRAINING VALUES #####################################################
############################################################################################################################################
[Training Images]
imagesFolder = /Users/xxx/Downloads/LiviaNET-master/Dataset/MR/
GroundTruthFolder = /Users/xxx/Downloads/LiviaNET-master/Dataset/Label/
ROIFolder = /Users/xxx/Downloads/LiviaNET-master/Dataset/ROI/
imageTypes = 1
indexesForTraining = [0,1,2,3,4]
indexesForValidation = [5]
Ok, thanks for those comments. I'll check it asap and get back to you.
Best,
Hi @YilinLiu97
I was able to reproduce your error.
First of all, thank you for your comments, particularly the ones about the small typos in the indexes of the weights. It was while cleaning the code that I kept some parts I shouldn't have. Here are the problems you found:
1 - You do not need to comment out lines 154-156 anymore. The check is now updated to len(self.weightsTrainedIdx) <> numberCNNLayers. numberCNNLayers must be an integer representing the number of conv layers defined in the network, so taking len(numberCNNLayers) does not make sense. This has been updated in the code.
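For reference, the corrected check now reads:

# numberCNNLayers is already an int, so it is compared directly;
# len() is only applied to the list of trained-layer indexes.
if len(self.weightsTrainedIdx) <> numberCNNLayers:
    print(" ... WARNING!!!! Number of indexes specified for trained layers does not correspond with number of conv layers in the created architecture...")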
2 - For the problem you get when loading the trained weights: it comes from the fact that I only provided weights for the convolutional layers, while you are also trying to initialize the weights of the fully connected layers with pre-trained weights, which do not exist. I only included the feature of using pre-trained weights for the convolutional layers. Just change weight_Initialization_FCN = 1 in your config file. With this value equal to 2 it also crashed for me because of this; with it set to 1 it works (should work) like a charm now.
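That is, the weights section of your config file should now read:

weight_Initialization_CNN = 2
weight_Initialization_FCN = 1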
Best,
It works perfectly now!! Thanks so much!
When I extended the network to 9 conv layers + 3 FC layers, as mentioned in your paper, I got 'nan' as the loss after a few rounds, even after lowering the learning rate a few orders of magnitude from 0.001. I'm wondering, have you encountered this, and if so, how did you solve it without tricks like skip connections?
For this problem, my guess is to directly use your pre-trained weights (even just for the first 3 conv layers) and train whatever layers are left; freezing a few layers seems to be a common approach for transfer learning. But as you said, the setup won't allow it, as we have to choose either 1 or 2 (weight initialization) for all conv layers at once. Are there any workarounds for this? Thanks!
Update: I was able to use the pre-trained weights for the first 3 conv layers and initialize the remaining conv layers with Delving by modifying LiviaNet.py a bit, roughly along the lines of the sketch below.
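A standalone sketch of the per-layer choice I made (variable names are hypothetical and do not match LiviaNet.py exactly; I'm assuming 2 means transfer learning and 1 means random/Delving initialization, based on the flags above):

numberCNNLayers = 9
weightsTrainedIdx = [0, 1, 2]  # layers that take the provided pre-trained weights

for l_i in range(numberCNNLayers):
    if l_i in weightsTrainedIdx:
        layer_init = 2  # transfer learning: load the pre-trained kernel for this layer
    else:
        layer_init = 1  # random (Delving) initialization for the remaining layers
    print("Layer %d -> weight initialization type %d" % (l_i, layer_init))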
But got the following error:
** CREATING NETWORK **
--- Creating model (Reading parameters...)
** Starting creation model **
------------------------ General ------------------------
Shape of input subvolume (Testing): (5, 1, 45, 45, 45)
... WARNING!!!! Number of indexes specified for trained layers does not correspond with number of conv layers in the created architecture...
--- [STATUS] --------- Creating layer 0 ---------
--- Activation function: PReLU
--- Weights initialization type: Transfer learning...
--- [STATUS] --------- Creating layer 1 ---------
--- Activation function: PReLU
--- Weights initialization type: Transfer learning...
--- [STATUS] --------- Creating layer 2 ---------
--- Activation function: PReLU
--- Weights initialization type: Transfer learning...
--- [STATUS] --------- Creating layer 3 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: (5, 1, 27, 27, 27) ---> Output shape: [5, 50, 25, 25, 25] || kernel shape [50, 1, 3, 3, 3]
----- (Testing) Input shape: (5, 1, 45, 45, 45) ---> Output shape: [5, 50, 43, 43, 43]
--- [STATUS] --------- Creating layer 4 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 50, 25, 25, 25] ---> Output shape: [5, 50, 23, 23, 23] || kernel shape [50, 50, 3, 3, 3]
----- (Testing) Input shape: [5, 50, 43, 43, 43] ---> Output shape: [5, 50, 41, 41, 41]
--- [STATUS] --------- Creating layer 5 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 50, 23, 23, 23] ---> Output shape: [5, 50, 21, 21, 21] || kernel shape [50, 50, 3, 3, 3]
----- (Testing) Input shape: [5, 50, 41, 41, 41] ---> Output shape: [5, 50, 39, 39, 39]
--- [STATUS] --------- Creating layer 6 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 50, 21, 21, 21] ---> Output shape: [5, 75, 19, 19, 19] || kernel shape [75, 50, 3, 3, 3]
----- (Testing) Input shape: [5, 50, 39, 39, 39] ---> Output shape: [5, 75, 37, 37, 37]
--- [STATUS] --------- Creating layer 7 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 75, 19, 19, 19] ---> Output shape: [5, 75, 17, 17, 17] || kernel shape [75, 75, 3, 3, 3]
----- (Testing) Input shape: [5, 75, 37, 37, 37] ---> Output shape: [5, 75, 35, 35, 35]
--- [STATUS] --------- Creating layer 8 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 75, 17, 17, 17] ---> Output shape: [5, 75, 15, 15, 15] || kernel shape [75, 75, 3, 3, 3]
----- (Testing) Input shape: [5, 75, 35, 35, 35] ---> Output shape: [5, 75, 33, 33, 33]
--- Starting to create the fully connected layers....
--- [STATUS] --------- Creating layer 9 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 75, 15, 15, 15] ---> Output shape: [5, 100, 15, 15, 15] || kernel shape [100, 75, 1, 1, 1]
----- (Testing) Input shape: [5, 75, 33, 33, 33] ---> Output shape: [5, 100, 33, 33, 33]
--- [STATUS] --------- Creating layer 10 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 100, 15, 15, 15] ---> Output shape: [5, 100, 15, 15, 15] || kernel shape [100, 100, 1, 1, 1]
----- (Testing) Input shape: [5, 100, 33, 33, 33] ---> Output shape: [5, 100, 33, 33, 33]
--- [STATUS] --------- Creating layer 11 ---------
--- Activation function: PReLU
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 100, 15, 15, 15] ---> Output shape: [5, 100, 15, 15, 15] || kernel shape [100, 100, 1, 1, 1]
----- (Testing) Input shape: [5, 100, 33, 33, 33] ---> Output shape: [5, 100, 33, 33, 33]
----- (Classification layer) kernel shape [9, 100, 1, 1, 1]
--- [STATUS] --------- Creating layer 11 ---------
--- Activation function: Linear
--- Weights initialization type: Delving
----- (Training) Input shape: [5, 100, 15, 15, 15] ---> Output shape: [5, 9, 15, 15, 15] || kernel shape [9, 100, 1, 1, 1]
----- (Testing) Input shape: [5, 100, 33, 33, 33] ---> Output shape: [5, 9, 33, 33, 33]
------- Initializing network training parameters...........
--- Optimizer: Stochastic gradient descent (SGD)
----------------- Starting compilation process -----------------
--- Cost function: negativeLogLikelihood
--- Optimizer: Stochastic gradient descent (SGD)
Traceback (most recent call last):
File "./networkTraining.py", line 82, in
Backtrace when that variable is created:
File "./networkTraining.py", line 82, in
Hi @YilinLiu97
If you want to use pre-trained weights for all 9 conv layers, I suggest you use that architecture with the demo data, save those weights, and use them as pre-trained ones for your data. However, I am sure the NaN problem comes from your data, and this will not solve it. I would inspect the content of both the MRI and GT samples that are sent to the trainer, since I guess there is something wrong there.
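For instance, a quick sanity check along these lines (a minimal sketch; the loading step depends on your data format, e.g. scipy.io.loadmat for the demo .mat files or nibabel for NIfTI):

import numpy as np

def check_volume(vol, name, is_label=False):
    # Cast to float so isfinite also works on integer label maps
    vol = np.asarray(vol, dtype='float32')
    # NaN/Inf intensities propagate straight into the cost
    if not np.all(np.isfinite(vol)):
        print("%s contains NaN/Inf values!" % name)
    if is_label:
        # labels outside [0, n_classes-1] break the negative log-likelihood
        print("%s label range: [%d, %d]" % (name, vol.min(), vol.max()))
    else:
        print("%s intensity range: [%g, %g]" % (name, vol.min(), vol.max()))

# e.g. check_volume(mri_array, 'MR_Img1'); check_volume(gt_array, 'Label_Img1', is_label=True)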
Also, I am sorry, but if you modify the code I cannot guarantee it will work. Unless you are very comfortable with Theano, I wouldn't recommend changing things, because it is difficult to debug. It seems to me this error can come from having missed connecting some of the conv layers (either the weights or the bias, or maybe both).
Best
Hi @josedolz, thanks for the reply! Actually, I used that architecture with the demo data and still got the nan error. I also observed that the option of resuming training from epoch X does not seem to work for me; it always gives me nan.
** CREATING NETWORK **
--- Creating model (Reading parameters...)
** Starting creation model **
------------------------ General ------------------------
... Loading model from /Users/xxx/Downloads/LiviaNET_Test/src/outputFiles/DeepLiviaNET_MICCAI_Test/Networks/deep_LiviaNET_Epoch0
... Network architecture successfully loaded....
============== EPOCH: 1/3 =================
--- SubEPOCH: 1/2
... Get samples for subEpoch...
... getting 200 samples per subject...
...Processing subject: 1. 20.0 % of the whole training set...
...Processing subject: 2. 40.0 % of the whole training set...
...Processing subject: 3. 60.0 % of the whole training set...
...Processing subject: 4. 80.0 % of the whole training set...
...Processing subject: 5. 100.0 % of the whole training set...
---------- Cost of this subEpoch: nan
Thanks!
I forgot to mention that for this one I didn't modify anything in the source code (for testing, I re-downloaded your code). I really hope to get the "deeper" network working...
Here is the config file:
############################################################################################################################################
################################################# CREATION OF THE NETWORK #####################################################
############################################################################################################################################
############## =================== General Options ================= ################
[General]
networkName = deep_LiviaNET
folderName = DeepLiviaNET_MICCAI_Test
############## =================== CNN_Architecture ================= ################
[CNN_Architecture]
numkernelsperlayer = [25,25,25,50,50,50,75,75,75,100,100,100]
kernelshapes = [[3, 3, 3],[3,3,3],[3,3,3],[3,3,3],[3,3,3],[3,3,3],[3,3,3], [3, 3, 3], [3, 3, 3], [1],[1],[1]]
intermediateConnectedLayers = []
pooling_scales = [[1,1,1],[1,1,1],[1,1,1]]
dropout_Rates = [0.25,0.5]
activationType = 2
n_classes = 9
weight_Initialization_CNN = 1
weight_Initialization_FCN = 1
weights folderName = /Users/xxx/desktop/LiviaNET/trainedweights/
weights trained indexes = [0,1,2]
############## =================== Training Options ================= ################
[Training Parameters]
batch_size=5
number Of Epochs = 3
number Of SubEpochs = 2
number of samples at each SubEpoch Train = 1000
learning Rate change Type = 0
sampleSize_Train = [25,25,25]
sampleSize_Test = [45,45,45]
costFunction = 0
SoftMax temperature = 1.0
L1 Regularization Constant = 1e-6
L2 Regularization Constant = 1e-4
Leraning Rate = [0.00001]
First Epoch Change LR = 1
Frequency Change LR = 2
Momentum Type = 1
Momentum Value = 0.6
momentumNormalized = 1
Optimizer Type = 0
Rho RMSProp = 0.9
Epsilon RMSProp = 1e-4
applyBatchNormalization = 1
BatchNormEpochs = 20
applyPadding = 1
############################################################################################################################################
################################################# TRAINING VALUES #####################################################
############################################################################################################################################
[Training Images]
imagesFolder = /Users/xxx/desktop/LiviaNET/Dataset/MR/
GroundTruthFolder = /Users/xxx/desktop/LiviaNET/Dataset/Label/
ROIFolder = /Users/xxx/desktop/LiviaNET/Dataset/ROI/
imageTypes = 1
indexesForTraining = [0,1,2,3,4]
indexesForValidation = [5]
Some updates...
============== EPOCH: 1/3 =================
--- SubEPOCH: 1/2
... Get samples for subEpoch...
... getting 200 samples per subject...
...Processing subject: 1. 20.0 % of the whole training set...
...Processing subject: 2. 40.0 % of the whole training set...
...Processing subject: 3. 60.0 % of the whole training set...
...Processing subject: 4. 80.0 % of the whole training set...
...Processing subject: 5. 100.0 % of the whole training set...
---------- Cost of this subEpoch: 2.76706286669
--- SubEPOCH: 2/2
... Get samples for subEpoch...
... getting 200 samples per subject...
...Processing subject: 1. 20.0 % of the whole training set...
...Processing subject: 2. 40.0 % of the whole training set...
...Processing subject: 3. 60.0 % of the whole training set...
...Processing subject: 4. 80.0 % of the whole training set...
...Processing subject: 5. 100.0 % of the whole training set...
---------- Cost of this subEpoch: 2.76206525445
---------- Training on Epoch #0 finished ----------
---------- Cost of Epoch: 2.76456406057 / Mean training error 2.76456406057
** Starting validation **
------------- Segmenting subject: MR_Img6.mat ....total: 1/1... -------------
... Saving segmentation result...
... Image succesfully saved...
... Saving prob map for class 1...
... Image succesfully saved...
... Saving prob map for class 2...
... Image succesfully saved...
... Saving prob map for class 3...
... Image succesfully saved...
... Saving prob map for class 4...
... Image succesfully saved...
... Saving prob map for class 5...
... Image succesfully saved...
... Saving prob map for class 6...
... Image succesfully saved...
... Saving prob map for class 7...
... Image succesfully saved...
... Saving prob map for class 8...
... Image succesfully saved...
... Computing Dice scores:
-------------- DSC (Class 1) : 0.0100566487912
-------------- DSC (Class 2) : 0.00435581907617
-------------- DSC (Class 3) : 0.00441681217125
-------------- DSC (Class 4) : 0.00016433422003
-------------- DSC (Class 5) : 0.00606102799949
-------------- DSC (Class 6) : 0.00599444724886
-------------- DSC (Class 7) : 0.00960786097716
-------------- DSC (Class 8) : 0.00318045358618
** Validation DONE **
Network model saved in /Users/xxx/Downloads/LiviaNET_Test/src/outputFiles/DeepLiviaNET_MICCAI_Test/Networks as deep_LiviaNET_Epoch1
============== EPOCH: 2/3 =================
--- SubEPOCH: 1/2
... Get samples for subEpoch...
... getting 200 samples per subject...
...Processing subject: 1. 20.0 % of the whole training set...
...Processing subject: 2. 40.0 % of the whole training set...
...Processing subject: 3. 60.0 % of the whole training set...
...Processing subject: 4. 80.0 % of the whole training set...
...Processing subject: 5. 100.0 % of the whole training set...
---------- Cost of this subEpoch: 2.75880724072
--- SubEPOCH: 2/2
... Get samples for subEpoch...
... getting 200 samples per subject...
...Processing subject: 1. 20.0 % of the whole training set...
...Processing subject: 2. 40.0 % of the whole training set...
...Processing subject: 3. 60.0 % of the whole training set...
...Processing subject: 4. 80.0 % of the whole training set...
...Processing subject: 5. 100.0 % of the whole training set...
---------- Cost of this subEpoch: nan
---------- Training on Epoch #1 finished ----------
---------- Cost of Epoch: nan / Mean training error nan
** Starting validation **
------------- Segmenting subject: MR_Img6.mat ....total: 1/1... -------------
Hi, I have the same problem as this. Could you help me with the config file? Thanks.
Hi @mitrasafari
Please have a look at #10.
I added the config file employed in the paper, with which you should be able to reproduce the results, as described in #10.
Let me know whether that solves your problem.
Jose
Hi @josedolz, thanks for your reply. I used your config file, but it only creates up to layer 11 and after that says (TypeError: <TensorType(float32, scalar)>…). I attached a screenshot of my cmd so you can see the exact error. I use "THEANO_FLAGS='floatX=float32' python ./networkTraining.py ./LiviaNET_Config.ini 0", but it says THEANO_FLAGS is not recognized. Actually, I am not an expert in this area; would you please explain in more detail? Thanks so much.
Hi @mitrasafari
Can you show me the terminal where it complains about THEANO_FLAGS? This is strange.
Anyway, the problem comes from some incompatibilities between some tensor types and numpy values/arrays. If the Theano configuration is set properly, this will not cause the code to crash; however, it seems this is not your case. I have to work on this, but unfortunately I have not had time so far.
Some workarounds:
1) With THEANO_FLAGS set, the problem should be gone.
2) Run the code on GPU. I have only seen this problem when running on CPU. Besides, doing the whole training on CPU would take more than a week.
3) Set floatX=float32 in the .theanorc file.
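For option 3, a minimal .theanorc (in your home folder; .theanorc.txt on Windows) would look like:

[global]
floatX = float32
device = gpu

(device = gpu refers to the old GPU backend; on recent Theano versions the GPU device is called cuda instead.)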
Hope this helps.
Best
Hi @josedolz, sorry for the late reply. I attached my cmd output when I use "THEANO_FLAGS='floatX=float32' python ./networkTraining.py ./LiviaNET_Config.ini 0". I'm pretty sure the problem is my Theano installation; I have tried different types of installation, but it is still not working.
Hi @mitrasafari
This is a problem with your environment, not with the code.
I have found people reporting similar problems here:
https://stackoverflow.com/questions/40157515/command-to-run-theano-on-gpu-windows
https://stackoverflow.com/questions/41436068/theano-flags-command-not-found
Basically, some of these answers point out that the VAR=value command prefix is Unix shell syntax that Windows cmd does not understand, and tell you to run this instead:
set THEANO_FLAGS="floatX=float32" & python xxxxxxx.py
Hi @josedolz, the training on my dataset works smoothly, but the results need some improvement, so I'm thinking about changing some hyper-parameters. However, I am confused about some parts of the config file:
For "learning rate change Type", there seem to be no options to choose from; the same goes for "Cost function values", "First Epoch Change LR", and "Frequency Change LR".
# TODO. To define some changes in the learning rate
learning Rate change Type = 0
# Subvolumes (i.e. samples) sizes.
# Validation equal to testing samples
sampleSize_Train = [27,27,27]
sampleSize_Test = [45,45,45]
# Cost function values
# 0:
# 1:
costFunction = 0
SoftMax temperature = 1.0
# ========= Learning rate ==========
L1 Regularization Constant = 1e-6
L2 Regularization Constant = 1e-4
# TO check
# The array size has to be equal to the total number of layers (i.e. CNNs + FCs + Classification layer)
# Leraning Rate = [0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001, 0.0001]
Leraning Rate = [0.001]
# First epoch to change learning rate
First Epoch Change LR = 1
# Each how many epochs change learning rate
Frequency Change LR = 2
# TODO. Add learning rate for each layer
Thanks!