wndcharm train/test splits aren't random enough

GoogleCodeExporter commented 9 years ago


N.B.: I was using wndchrm test, and not wndchrm classify, the latter of which 
is not random.

When running `wndchrm test` on a .fit file whose classes do not all contain 
the same number of images, the images chosen for testing are heavily skewed 
toward those that are listed first.

I discovered this issue using the ks_density_generator script, which aggregates 
statistics for individual images across splits.

This issue goes away when using a .fit that has been built with a balanced 
number of images in each class. Also, the skewed distribution of test images 
happens only in those classes that have more than the "max balanced training" 
number of images. A class that has exactly the "max balanced training" 
number of images will be correctly randomly sampled for train/test splits.

I have uploaded two fit files to use as positive and negative controls:
----------
Summary of 'positive_control.fit' (93 samples total, 1 samples per image):
'Class label' (interpreted value) number of samples.
'1_young'   (1) 33
'2_aged_unimpaired' (2) 27
'3_aged_impaired'   (3) 33
----------
Max balanced training: 27
samples per image=1, training images: 26, testing images 1

----------
Summary of 'negative_control.fit' (81 samples total, 1 samples per image):
'Class label' (interpreted value) number of samples.
'1_young'   (1) 27
'2_aged_unimpaired' (2) 27
'3_aged_impaired'   (3) 27
----------
Max balanced training: 27
samples per image=1, training images: 26, testing images 1

To reproduce: 

1. Download positive_control.fit.

2. Run `wndchrm test -i26 -j1 -n100 positive_control.fit positive_control.html`

3. Run the ks_density graph generator (located in your wnd-charm repository 
tree) with the following parameters:
`wnd-charm.googlecode.com/wnd-charm/utilities/ks_density_plot/create_probability_density_points_from_html.pl --num_classes 3 --verbose 1 positive_control.html`

4. The script will run for about 20 seconds, digesting the html and aggregating 
statistics for individual images.

5. Once it prints its output, scroll up and review. Notice that for classes 1 
and 3, which have 33 images each, the first image listed is used for testing a 
disproportionate number of times, the next one slightly less often, and so on, 
decreasing linearly until you get halfway through the list. The images after 
that point aren't used at all.

6. Notice that for class 2 the test image for each split is properly chosen at 
random from all possible images.

7. Notice in the "REPORT" section of the ks_density script output that, over 
100 train/test splits, only 15 and 13 of the 33 images in classes 1 and 3 
respectively were ever chosen for testing, whereas almost all (if not all) of 
the images in class 2 were sampled. Here are my outputs:

Unbalanced classes (positive_control.fit)
***********REPORT********

Class 1_young: count=  15, min=1.3533, max=2.5900, mean=1.9088, std dev=0.3561, bandwidth=0.212024286883305
Class 2_aged_unimpaired: count=  26, min=1.3880, max=2.6950, mean=2.0466, std dev=0.2720, bandwidth=0.147237945050311
Class 3_aged_impaired: count=  13, min=1.3967, max=2.3800, mean=2.0222, std dev=0.2608, bandwidth=0.158912192153817

Balanced classes (negative_control.fit)

***********REPORT********

Class 1_young: count=  25, min=1.4640, max=2.6450, mean=1.9799, std dev=0.3376, bandwidth=0.184023658175067
Class 2_aged_unimpaired: count=  27, min=1.1925, max=2.4600, mean=2.0336, std dev=0.2687, bandwidth=0.144494416779609
Class 3_aged_impaired: count=  26, min=1.5000, max=2.4100, mean=2.0113, std dev=0.2045, bandwidth=0.110690098775433
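
As a rough yardstick for the `count` values above, the sketch below computes how many distinct images one would expect to see tested at least once if every split drew its test images uniformly and independently from the whole class. That uniformity is an assumption used for comparison, not a description of what wndchrm does, and `expected_distinct` is a hypothetical helper name.

```python
def expected_distinct(pool_size, n_test, n_splits):
    """Expected number of distinct images tested at least once.

    Assumes each split draws n_test images uniformly without replacement
    from the pool, and that splits are independent of each other.
    """
    # Probability that one particular image is never picked in any split.
    p_never = (1 - n_test / pool_size) ** n_splits
    return pool_size * (1 - p_never)

# Pool sizes from the two control files: 1 test image per split, 100 splits.
for pool in (33, 27):
    print(pool, round(expected_distinct(pool, 1, 100), 1))
```

Under that assumption the expectation is about 31.5 distinct images for a 33-image class and about 26.4 for a 27-image class. The balanced-file counts (25, 27, 26) sit close to the second figure, while 15 and 13 fall far below the first.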

Original issue reported on code.google.com by christop...@nih.gov on 15 Jul 2011 at 3:09

GoogleCodeExporter commented 9 years ago

I reproduced the error using one of John's datasets:

001vs002_Ascl-D_VS_Cdx2-D
3461 Images.
Class   Value   Images
Ascl+D  0   701
Ascl-D  0   624
Cdx2+D  0   1100
Cdx2-D  0   1036

He ran the command: `wndchrm test -i500 -j124 -f0.05 -n10 etc., etc., etc.,`

Here are the cross-split test image statistics according to the ks_density 
script (min, max, etc. are undefined here, since there are no interpolated 
values in his dataset):

***********REPORT********

Class Ascl+D: count= 436, min=-1.0000, max=-1.0000, mean=-1.0000, std dev=0.0000,
Class Ascl-D: count= 552, min=-1.0000, max=-1.0000, mean=-1.0000, std dev=0.0000,
Class Cdx2+D: count= 238, min=-1.0000, max=-1.0000, mean=-1.0000, std dev=0.0000,
Class Cdx2-D: count= 253, min=-1.0000, max=-1.0000, mean=-1.0000, std dev=0.0000,

So we observe that the fraction of each class's image pool that gets sampled 
for testing shrinks as the number of images in the class grows, which is 
definitely a bug.
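
The trend is easier to see as fractions. This short Python sketch takes the class sizes and `count` values copied from the listing and report above and prints the share of each class that was tested at least once, smallest class first. The `coverage` helper is a hypothetical name used only for this illustration.

```python
# (class, images in class, distinct images tested), copied from the
# dataset listing and the REPORT section above.
observed = [
    ("Ascl+D", 701, 436),
    ("Ascl-D", 624, 552),
    ("Cdx2+D", 1100, 238),
    ("Cdx2-D", 1036, 253),
]

def coverage(rows):
    """Return (class, pool size, fraction of pool tested), smallest class first."""
    return [(name, pool, tested / pool)
            for name, pool, tested in sorted(rows, key=lambda r: r[1])]

for name, pool, frac in coverage(observed):
    print(f"{name}: {pool:5d} images, {frac:.0%} tested at least once")
```

The fractions fall steadily as the class gets larger (roughly 88%, 62%, 24%, 22%), even though `-j124 -n10` gives every class the same 1240 test draws.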

I made a slight change to the ks_density script so you can ask for the 
cross-split statistics only and skip generating the ks_density graphs (use 
--stats_only=1), so svn up before you proceed. I have enclosed John's html that 
I used to reproduce the error. I used this command:

`wnd-charm.googlecode.com/wnd-charm/utilities/ks_density_plot/create_probability_density_points_from_html.pl --num_classes 4 --stats_only 1 001vs002_Ascl-D_VS_Cdx2-D.html`

Original comment by christop...@nih.gov on 15 Jul 2011 at 8:42

Attachments:

001vs002_Ascl-D_VS_Cdx2-D.html

GoogleCodeExporter commented 9 years ago

Clearly the 'negative control' is less bad. 

But the bug is there too: with 100 splits, 3 classes, and 27 images per class, 
it seems like every image should have been tested at least once.

John's command:
wndchrm test -i500 -j124 -f0.05 -n10 etc. etc.

should be balanced(!)

Original comment by dmarkeck...@gmail.com on 20 Jul 2011 at 3:51

GoogleCodeExporter commented 9 years ago

Fixed in svn rev 248, will tag and release as wndchrm-1.31 shortly

Original comment by christop...@nih.gov on 20 Jul 2011 at 4:09

Changed state: Fixed
