Augmentation of Data

schelv commented 7 years ago

Hi, the purpose of this issue is to discuss the data augmentation parameters and find the best combination. Today @kbasten and I went through all the settings of ImageDataGenerator and agreed on a configuration that seemed reasonable to us (we called it the baseline). Here I will post plots of the AUC during training for different settings, which we can use to decide on further improvements.

This is the baseline configuration.

[baseline]
rotation_range=360
width_shift_range=0.1
height_shift_range=0.1
shear_range=0.2
zoom_range=0.2
channel_shift_range=2.0
fill_mode=nearest
horizontal_flip=True
vertical_flip=True

The other configurations can be found in augmentation.ini.
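To keep runs reproducible, each configuration can be loaded straight from the .ini file and passed to ImageDataGenerator. Below is a minimal sketch using only configparser. It assumes augmentation.ini has one section per configuration (like [baseline] above) with the same key=value lines; that layout is a guess, and the helper name augmentation_kwargs is hypothetical.

```python
import configparser

# Keys that need a numeric or boolean type; anything else (e.g. fill_mode)
# is passed through as a plain string.
INT_KEYS = {"rotation_range"}
FLOAT_KEYS = {
    "width_shift_range", "height_shift_range",
    "shear_range", "zoom_range", "channel_shift_range",
}
BOOL_KEYS = {"horizontal_flip", "vertical_flip"}


def augmentation_kwargs(ini_text, name):
    """Return section [name] of an augmentation .ini as typed keyword
    arguments for ImageDataGenerator (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser[name]
    kwargs = {}
    for key in section:
        if key in INT_KEYS:
            kwargs[key] = section.getint(key)
        elif key in FLOAT_KEYS:
            kwargs[key] = section.getfloat(key)
        elif key in BOOL_KEYS:
            kwargs[key] = section.getboolean(key)
        else:
            kwargs[key] = section[key]
    return kwargs


# The baseline configuration from this issue, as an .ini section.
BASELINE_INI = """\
[baseline]
rotation_range=360
width_shift_range=0.1
height_shift_range=0.1
shear_range=0.2
zoom_range=0.2
channel_shift_range=2.0
fill_mode=nearest
horizontal_flip=True
vertical_flip=True
"""

kwargs = augmentation_kwargs(BASELINE_INI, "baseline")
```

The resulting dict can then be unpacked into Keras, e.g. `ImageDataGenerator(**kwargs)` from `keras.preprocessing.image`; that import is left out so the sketch runs without Keras installed.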

Old plots (generated without a patient-level data split)

baseline: AUC ~0.78 https://cloud.githubusercontent.com/assets/13403863/26149891/62a5fbcc-3afc-11e7-80e0-d14a62cdd3c5.png

channel_shift_10: AUC ~0.80 (top score!) https://cloud.githubusercontent.com/assets/13403863/26149867/401853b6-3afc-11e7-8654-f12a6332b244.png

channel_shift_50: AUC ~0.78 https://cloud.githubusercontent.com/assets/13403863/26151127/a6d97602-3b01-11e7-9e18-4ce8ff1f3e89.png

no_rotation: AUC ~0.68 https://cloud.githubusercontent.com/assets/13403863/26148096/d9c0c220-3af5-11e7-81e3-537a7712b6f3.png

schelv commented 7 years ago

It is important to keep in mind that these results are partly due to chance: the random weight initialization alone could cause some variation in the scores above. We should think about how to get more reliable results. Maybe take the median of multiple runs? Use weight decay?
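Taking the median over several runs could look roughly like this. It is a minimal sketch: train_and_evaluate is a hypothetical function that seeds weight initialization, augmentation and shuffling with the given seed, trains one model and returns its validation AUC. The default of five seeds is an arbitrary choice.

```python
import statistics
from typing import Callable, Sequence


def median_auc(train_and_evaluate: Callable[[int], float],
               seeds: Sequence[int] = (0, 1, 2, 3, 4)) -> float:
    """Run one training per seed and return the median validation AUC.

    train_and_evaluate is hypothetical: it should seed every source of
    randomness with the given seed, train a model and return its AUC.
    """
    aucs = [train_and_evaluate(seed) for seed in seeds]
    return statistics.median(aucs)
```

Comparing configurations by this median (rather than by a single run) makes it less likely that one lucky initialization decides which augmentation settings look best.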

schelv commented 7 years ago

I've removed the plots that were generated without the patient-level data split.
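For reference, a patient-level split puts all samples from one patient in either the training or the validation set, never both, so the validation AUC cannot be inflated by near-identical samples of a patient the model has already seen. A minimal stdlib-only sketch, assuming parallel lists of sample IDs and patient IDs (hypothetical inputs; the function name is also hypothetical). scikit-learn's GroupShuffleSplit, with the patient IDs passed as groups, does the same job.

```python
import random
from collections import defaultdict


def patient_level_split(sample_ids, patient_ids, val_fraction=0.2, seed=0):
    """Split samples so that every patient lands in exactly one set.

    val_fraction is a fraction of *patients*, not of samples.
    Returns (train_samples, val_samples).
    """
    # Group the samples by the patient they belong to.
    by_patient = defaultdict(list)
    for sample, patient in zip(sample_ids, patient_ids):
        by_patient[patient].append(sample)

    # Sort first so the shuffle is reproducible for a given seed.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])

    train, val = [], []
    for patient in patients:
        target = val if patient in val_patients else train
        target.extend(by_patient[patient])
    return train, val
```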

jspunda / prostatex

Augmentation of Data #31