radical-collaboration / 3D_fMRI_CNN


normalizing voxels in training #4

Closed jdakka closed 7 years ago

jdakka commented 7 years ago

Regarding the np.mean of the input:

In terms of np.mean, I assume you're referring to the input batch's mean? If that's the case, unfortunately it drops each voxel value significantly, and if I recompute the mean of the normalized input it becomes extremely small (and does nothing for the training loss). See screenshot.

pbashivan commented 7 years ago

@jdakka Not sure what the inputs variable is, but assuming that's the input batch:

  1. You should compute the mean value per voxel. Using np.mean(inputs) would give you a single mean value for all voxels, which is not what we want. Find the mean of each voxel over all samples and time windows, e.g. if the data shape is (num_samples, num_windows, 1, W, H, D), find the mean like this: np.mean(inputs, axis=(0,1))
  2. You should be computing the mean and std over the entire training set once, not over batches (see the sketch below).
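
As a rough illustration of what this could look like (a minimal sketch, assuming the (num_samples, num_windows, 1, W, H, D) layout mentioned above; the array here is a random stand-in, not the real X_train):

```python
import numpy as np

# Stand-in training array with the assumed shape
# (num_samples, num_windows, 1, W, H, D).
X_train = np.random.rand(100, 5, 1, 32, 32, 16).astype(np.float32)

# Per-voxel statistics over all samples and time windows (axes 0 and 1).
# The result has shape (1, W, H, D) and broadcasts against any batch
# with the same trailing dimensions.
X_train_mean = np.mean(X_train, axis=(0, 1))
X_train_std = np.std(X_train, axis=(0, 1))

print(X_train_mean.shape)  # (1, 32, 32, 16)
```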
jdakka commented 7 years ago

The np.mean of the training data collectively over all samples and windows would be: np.mean(X_train, axis=(0,1))

Once I got the X_train_mean and X_train_std I used them to normalize the input for each batch:

normalized_inputs=(inputs-X_train_mean)/X_train_std

pbashivan commented 7 years ago

This should be correct. But remember to normalize the valid/test data the same way.
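
A minimal sketch of that point, assuming X_train_mean and X_train_std were computed once over the training set as above (X_valid here is a hypothetical stand-in for the validation split):

```python
import numpy as np

# Hypothetical splits with the assumed (num_samples, num_windows, 1, W, H, D) layout.
X_train = np.random.rand(100, 5, 1, 32, 32, 16).astype(np.float32)
X_valid = np.random.rand(20, 5, 1, 32, 32, 16).astype(np.float32)

# Statistics come from the training set only...
X_train_mean = np.mean(X_train, axis=(0, 1))
X_train_std = np.std(X_train, axis=(0, 1))

# ...and the same statistics are reused for every split (train, valid, test).
X_train = (X_train - X_train_mean) / X_train_std
X_valid = (X_valid - X_train_mean) / X_train_std
```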

jdakka commented 7 years ago

[screenshot: training loss output (2017-03-23, 11:21 AM)]

It still doesn't improve the training loss...

I was looking at the behavior for the sample_nii files and they also show a training loss of 0.00. Can you think of other areas where there may be issues causing this problem with the training loss?

pbashivan commented 7 years ago

I thought the training loss was very high. Now it's zero from the beginning? Did this happen because of the normalization? If so, check the signal values in the batches and make sure you are not feeding all zeros.
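
One way to check that at the point where batches are fed in (a debugging sketch; `inputs` is assumed to be a single training batch and `check_batch` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def check_batch(inputs):
    """Print basic statistics of one batch so all-zero or NaN inputs are easy to spot."""
    print("min:", np.min(inputs), "max:", np.max(inputs), "mean:", np.mean(inputs))
    if not np.any(inputs):
        print("WARNING: batch is all zeros")
    if np.isnan(inputs).any():
        print("WARNING: batch contains NaNs")
```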

jdakka commented 7 years ago

Without normalization in the training function, X_train and the batch inputs have the following np.mean() values:

(Pdb) np.mean(X_train)
2793.4817
(Pdb) np.mean(inputs)
2554.4844

which tells me the signal values in the first batch contain non-zeros.

jdakka commented 7 years ago

With normalization, I get the following np.mean values for the training data and the first input batch:

(Pdb) np.mean(X_train)
2793.4817
(Pdb) np.mean(inputs)
nan

The inputs become NaN, so the normalization is causing issues as well. (I tested with and without normalization and the training loss stays at zero.)

pbashivan commented 7 years ago

You can divide by (variance + epsilon) and set epsilon to a small number like 0.001. That should take care of the NaN issue. About the loss-at-zero issue: what happened to the very large loss values? Are you working with the actual data or the sample data?
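
A sketch of that fix, following the suggestion above (epsilon = 0.001, dividing by variance plus epsilon; the array and shapes are assumed stand-ins):

```python
import numpy as np

epsilon = 0.001  # small constant to keep the denominator strictly positive

# Stand-in training data; voxels that never change have zero variance,
# which is what produces NaNs (0/0) or infs after the division.
X_train = np.random.rand(100, 5, 1, 32, 32, 16).astype(np.float32)
X_train_mean = np.mean(X_train, axis=(0, 1))
X_train_var = np.var(X_train, axis=(0, 1))

def normalize(inputs):
    # (variance + epsilon) can never be zero, so the division cannot blow up.
    return (inputs - X_train_mean) / (X_train_var + epsilon)
```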

jdakka commented 7 years ago

OK, I modified it to normalized_inputs=inputs-np.mean(X_train)/epsilon+variance.

The training loss is from the actual data: the new behavior shows training loss = 0.0, then it spikes to a very high loss, keeps increasing, and then begins to decrease (still high though). See screenshots:

[screenshot: training loss output (2017-03-23, 5:37:27 PM)]
[screenshot: training loss output (2017-03-23, 5:37:38 PM)]

pbashivan commented 7 years ago

Are you doing this?

normalized_inputs=inputs-np.mean(X_train)/epsilon+variance

Or this?

normalized_inputs=(inputs-np.mean(X_train))/(epsilon+variance)

The second one is correct.
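
The difference is just operator precedence; a tiny scalar example makes it visible (the numbers here are made up for illustration):

```python
import numpy as np

inputs = np.array([10.0, 20.0])
train_mean, variance, epsilon = 5.0, 2.0, 0.001

# Without parentheses, Python divides first: inputs - (train_mean/epsilon) + variance
wrong = inputs - train_mean / epsilon + variance
# With parentheses, it is the intended normalization: subtract, then divide
right = (inputs - train_mean) / (epsilon + variance)

print(wrong)  # [-4988. -4978.]  -- huge offsets, not a normalization
print(right)  # [2.4988  7.4963] (approximately)
```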

jdakka commented 7 years ago

Sorry for brevity, but yes, the second one: inputs=(inputs-X_train_mean)/(0.001+X_train_variance)

It looks like the training keeps fluctuating: first increasing, then decreasing (this is still the first batch from the first epoch).

[screenshot: training loss output (2017-03-23, 5:47 PM)]

pbashivan commented 7 years ago

Loss is too high. That doesn't make sense. Please push the latest code, I'll look at it.


jdakka commented 7 years ago

I pushed the latest

jdakka commented 7 years ago

Modified the following:

- labels changed from -1/+1 to 0/1
- learning rate = 0.001
- changed from 1 HDF5 dataset to 4 separate HDF5 datasets (subjectIDs, labels, runs, data)

The following changed: the training loss seems normal, apart from occasional jumps between batches. There is an initial spike in training loss from batch 1 to batch 2; for the rest of the training, the loss is decreasing apart from 10 slight incremental jumps:

batch 1: training loss: 0.701464
batch 2: training loss: 0.678240
batch 3: training loss: 0.768012

Incremental jumps after a few batch iterations:

training loss: 0.672950
training loss: 0.676626
training loss: 0.675055

jdakka commented 7 years ago

Forgot to add another modification: the inputs and targets are now shuffled correctly.

jdakka commented 7 years ago

When increasing the learning rate back to 0.01, the training loss takes on the following values. Aside from the initial jump (which I will check by reshuffling, in case it's just the order of the data), the training loss is monotonically decreasing:

training loss: 0.701464
training loss: 28.431140
training loss: 19.177677
training loss: 14.670910
training loss: 11.874058
training loss: 10.010816
training loss: 8.680361
training loss: 7.681924
training loss: 6.899244
training loss: 6.275711
training loss: 5.780591

jdakka commented 7 years ago

Pouya, I wanted to check with you on this part of the code:

https://github.com/radical-collaboration/3D_fMRI_CNN/blob/master/cnn_fmri.py#L359-L364

I'm trying to understand the part where train_ids and test_ids are separated so as to avoid cross-contamination of subjects between validation and training.

In the first screenshot I show the subject IDs after they are rescaled to 1-95.

In the second screenshot I show the separation of train_ids and test_ids in the fold_pairs arrays using the 10-fold method. It appears that the fold_pairs array has the train_ids (i.e. fold_pairs[0]) and test_ids (i.e. fold_pairs[1]) sorted in numerical order. Is this supposed to happen?

[screenshot: subject IDs rescaled to 1-95 (2017-04-11, 8:48 AM)]

[screenshot: fold_pairs train_ids/test_ids separation (2017-04-11, 8:50 AM)]

pbashivan commented 7 years ago

The code looks at the subject numbers and compares them with the current high and low limits. For example, in the first fold, sub_num should be >0 and <10 (assuming sub_num is from 1-95). Then all indices that pass this condition are added to test_ids; the rest of the examples become train_ids. It should be simple to check whether the correct IDs are included in each fold. It doesn't strike me as unusual for the indices to come out sorted in this way.
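
A rough sketch of that logic as described (this is not the actual code at cnn_fmri.py#L359-L364; the 10-subject windows, variable names, and sizes are assumptions for illustration):

```python
import numpy as np

# Hypothetical stand-in: one subject number (1-95) per example.
sub_nums = np.random.randint(1, 96, size=8009)

fold_pairs = []
for fold in range(10):
    low, high = fold * 10, (fold + 1) * 10
    # Examples whose subject number falls inside the current window become
    # the test set for this fold; all remaining examples are training.
    test_mask = (sub_nums > low) & (sub_nums < high)
    test_ids = np.where(test_mask)[0]
    train_ids = np.where(~test_mask)[0]
    fold_pairs.append((train_ids, test_ids))

# np.where returns indices in ascending order, so seeing sorted
# train_ids/test_ids in fold_pairs is expected, not a bug.
```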

jdakka commented 7 years ago

So in that case the train_ids and test_ids ARE unique. I checked in both trainIndices/testIndices as well as fold_pairs.

pbashivan commented 7 years ago

They should be. We need to make sure that the subject IDs for the validation set are different from those in the training set. I don't think that is the case now.

jdakka commented 7 years ago

The trainIndices and validIndices are also unique: for example, in the first fold, validIndices has a range of 0-1048 and trainIndices has a range of 1049-8008.

trainIndices = indices[0][len(indices[1]):]
validIndices = indices[0][:len(indices[1])]

pbashivan commented 7 years ago

Correct. They should be unique. However, we talked about having the train and validation indices not overlap by subject. Message me if it's not clear.
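
One way to get that property would be to carve the validation set out of each training fold by subject ID instead of by position (a hedged sketch with hypothetical names, not the current code; `sub_nums` maps example index to subject ID):

```python
import numpy as np

def split_by_subject(train_ids, sub_nums, valid_fraction=0.1, seed=0):
    """Split one training fold into train/valid indices such that no
    subject appears in both sets."""
    rng = np.random.RandomState(seed)
    subjects = np.unique(sub_nums[train_ids])
    rng.shuffle(subjects)
    n_valid = max(1, int(len(subjects) * valid_fraction))
    valid_subjects = subjects[:n_valid]

    # Boolean mask over the training fold: True where the example's subject
    # was assigned to validation.
    valid_mask = np.in1d(sub_nums[train_ids], valid_subjects)
    validIndices = train_ids[valid_mask]
    trainIndices = train_ids[~valid_mask]
    return trainIndices, validIndices
```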