@jdakka Not sure what the inputs variable is, but assuming that's the input batch:
The np.mean of the training data collectively over all samples and windows would be: np.mean(X_train, axis=(0,1))
Once I got the X_train_mean and X_train_std I used them to normalize the input for each batch:
normalized_inputs=(inputs-X_train_mean)/X_train_std
This should be correct. But remember to normalize the valid/test data the same way.
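For reference, a minimal sketch of that normalization, assuming X_train and X_valid are the windowed fMRI arrays (the shapes below are synthetic stand-ins, not the real data dimensions):

```python
import numpy as np

# Synthetic stand-ins for the real (samples, windows, voxels) arrays.
X_train = np.random.rand(100, 16, 512).astype(np.float32)
X_valid = np.random.rand(20, 16, 512).astype(np.float32)

# Statistics computed over all samples and windows of the *training* data only.
X_train_mean = np.mean(X_train, axis=(0, 1))
X_train_std = np.std(X_train, axis=(0, 1))

def normalize(inputs):
    # Valid/test batches must reuse the training statistics.
    return (inputs - X_train_mean) / X_train_std

normalized_train = normalize(X_train)
normalized_valid = normalize(X_valid)
```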
It still doesn't improve the training loss...
I was looking at the behavior for the sample_nii files and they also show a training loss of 0.00. Can you think of other areas where there may be problems causing this training-loss issue?
I thought the training loss was very high. Now it's zero from the beginning? Did this happen because of the normalization? If so, check the signal values in the batches and make sure you are not feeding all zeros.
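One way to run that sanity check before a batch is fed to the network (check_batch is a hypothetical helper, not part of the repo):

```python
import numpy as np

def check_batch(inputs):
    # Simple sanity checks on a batch before it goes into the network.
    assert not np.all(inputs == 0), "batch is all zeros"
    assert not np.isnan(inputs).any(), "batch contains NaNs"
    print("batch mean:", np.mean(inputs), "std:", np.std(inputs))
```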
Without normalization in the training function, X_train and the first input batch have the following means:

(Pdb) np.mean(X_train)
2793.4817
(Pdb) np.mean(inputs)
2554.4844

which tells me the signal values for the first batch contain non-zeros.
With normalization, the training mean and the first input batch mean are:

(Pdb) np.mean(X_train)
2793.4817
(Pdb) np.mean(inputs)
nan

The inputs become NaN, so the normalization is causing issues as well. (I tested with and without normalization and the training loss stays at zero.)
You can divide by (variance + epsilon) and set epsilon to a small number like 0.001. That should take care of the NaN issue. About the zero-loss issue: what happened to the very large loss values? Are you working with the actual data or the sample data?
OK, I modified it to normalized_inputs=inputs-np.mean(X_train)/epsilon+variance.
The training loss is from the actual data. The new behavior: training loss starts at 0.0, then spikes to a very high value, keeps increasing, and then begins to decrease (though it's still high). See screenshots:
Are you doing this: normalized_inputs=inputs-np.mean(X_train)/epsilon+variance, or this: normalized_inputs=(inputs-np.mean(X_train))/(epsilon+variance)? The second one is correct.
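A toy illustration of why the parentheses matter (the numbers are made up, only to show the operator precedence):

```python
import numpy as np

epsilon = 0.001
variance = 4.0
X_train_mean = 20.0
inputs = np.array([10.0, 20.0, 30.0])

# Wrong: without parentheses, only the mean is divided by epsilon and
# variance is then added, so the values blow up (~ -2e4 here).
wrong = inputs - X_train_mean / epsilon + variance

# Right: subtract the mean first, then divide by (epsilon + variance).
right = (inputs - X_train_mean) / (epsilon + variance)
```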
Sorry for brevity but yes the second one: inputs=(inputs-X_train_mean)/(0.001+X_train_variance)
It looks like the training loss keeps fluctuating, first increasing then decreasing (this is still the first batch of the first epoch).
Loss is too high. That doesn't make sense. Please push the latest code and I'll look at it.
I pushed the latest code.
Modified the following:
- labels changed from -1/+1 to 0/1
- learning rate = 0.001
- changed from 1 HDF5 dataset to 4 separate HDF5 datasets (subjectIDs, labels, runs, data)
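A minimal sketch of loading those four datasets, assuming they sit in a single HDF5 file named fmri_data.h5 with the keys listed above (the file name and layout are assumptions, not the repo's actual paths):

```python
import h5py
import numpy as np

# Read the four separate HDF5 datasets; keys follow the names in the comment
# above, the file name is hypothetical.
with h5py.File('fmri_data.h5', 'r') as f:
    data = np.asarray(f['data'])
    labels = np.asarray(f['labels'])          # remapped from -1/+1 to 0/1
    subject_ids = np.asarray(f['subjectIDs'])
    runs = np.asarray(f['runs'])
```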
The following changed: training loss seems normal, apart from occasional jumps between batches. There is an initial spike in training loss from batch 1 to batch 2; for the rest of the training the loss is decreasing, apart from 10 slight incremental jumps:
batch 1: training loss: 0.701464
batch 2: training loss: 0.678240
batch 3: training loss: 0.768012
incremental jumps after a few batch iterations:
training loss: 0.672950
training loss: 0.676626
training loss: 0.675055
Forgot to add another modification:
inputs and targets are shuffled correctly
When increasing the learning rate back to 0.01, the training loss takes on the following values. Apart from the initial jump (I will check whether it's just the ordering of the data by reshuffling), the training loss is monotonically decreasing:
training loss: 0.701464
training loss: 28.431140
training loss: 19.177677
training loss: 14.670910
training loss: 11.874058
training loss: 10.010816
training loss: 8.680361
training loss: 7.681924
training loss: 6.899244
training loss: 6.275711
training loss: 5.780591
Pouya, I wanted to check with you on this part of the code:
https://github.com/radical-collaboration/3D_fMRI_CNN/blob/master/cnn_fmri.py#L359-L364
I'm trying to understand the part where train_ids and test_ids are separated so as to avoid cross-contamination of subjects between validation and training.
In the first screenshot I show the subject IDs after they are rescaled to 1-95.
In the second screenshot I show the separation of train_ids and test_ids in the fold_pairs arrays using the 10-fold method. It appears that the fold_pairs array has the train_ids (i.e. fold_pairs[0]) and test_ids (i.e. fold_pairs[1]) sorted in numerical order. Is this supposed to happen?
The code looks at the subject numbers and compares them with the current high and low limits. For example, in the first fold sub_num should be >0 and <10 (assuming sub_num runs from 1 to 95). All indices that pass this condition are added to test_ids; the rest of the examples become train_ids. It should be simple to check whether the correct IDs are included in each fold. It doesn't strike me as unusual for the indices to come out sorted this way.
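A minimal sketch of that subject-range fold construction, under the assumptions just described (the exact limits in cnn_fmri.py may differ); it also shows why the resulting index arrays come out sorted:

```python
import numpy as np

# Synthetic per-example subject numbers in 1..95; stand-in for the real labels.
rng = np.random.RandomState(0)
subj_nums = rng.randint(1, 96, size=8008)

n_folds = 10
n_subjects = 95
fold_pairs = []
for fold in range(n_folds):
    low = fold * n_subjects / float(n_folds)          # e.g. 0.0 for the first fold
    high = (fold + 1) * n_subjects / float(n_folds)   # e.g. 9.5 for the first fold
    test_mask = (subj_nums > low) & (subj_nums <= high)
    test_ids = np.where(test_mask)[0]    # np.where returns indices in ascending
    train_ids = np.where(~test_mask)[0]  # order, hence the sorted fold_pairs
    fold_pairs.append((train_ids, test_ids))
```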
So in that case the train_ids and test_ids ARE unique. I checked both trainIndices/testIndices and fold_pairs.
They should be. We need to make sure that the subject IDs in the validation set are different from those in the training set. I don't think that is the case now.
The trainIndices and validIndices are also unique: for example, in the first fold, validIndices covers the range 0-1048 and trainIndices covers the range 1049-8008:
trainIndices = indices[0][len(indices[1]):]
validIndices = indices[0][:len(indices[1])]
Correct. They should be unique. However, we talked about having the train and validation indices not overlapping by subjects. Message me if it's not clear.
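A minimal sketch of one way to make the validation subjects disjoint from the training subjects, assuming subj_nums and train_ids as in the fold sketch above (this is a suggestion, not the repo's current code):

```python
import numpy as np

# Stand-ins for the real per-example subject numbers and the fold's train ids.
rng = np.random.RandomState(0)
subj_nums = rng.randint(1, 96, size=8008)
train_ids = np.arange(8008)            # stand-in for fold_pairs[fold][0]

# Hold out a random ~10% of the *subjects* (not examples) for validation.
train_subjects = np.unique(subj_nums[train_ids])
rng.shuffle(train_subjects)
n_valid_subjects = max(1, len(train_subjects) // 10)
valid_subjects = train_subjects[:n_valid_subjects]

# Every example from a held-out subject goes to validation; no subject overlap.
valid_mask = np.isin(subj_nums[train_ids], valid_subjects)
validIndices = train_ids[valid_mask]
trainIndices = train_ids[~valid_mask]
```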
In terms of np.mean, I assume you're referring to the batch's input mean? If that's the case, it unfortunately drops each voxel value significantly, and if I recompute the mean of the normalized_input it becomes extremely small (and does nothing for the training loss). See screenshot.