Thank you for helping out.
I have attached a "testset" on which I created a .fit file. I am training with
a txt file that contains file names associated with numeric values, so I'm
hoping that in testing, wnd-charm can do some interpolation with the values.
The error I get seems to be that wnd-charm thinks each image is in its own
discrete class (even when I use the -C option).
The testset was placed directly into the wndcharm directory. Everything is
there except the images themselves (since these are confidential). But if you
need them to reproduce the error, I can email those images to you personally.
Thanks again!
Original comment by jimmyst...@gmail.com
on 22 Apr 2011 at 3:52
The -C option doesn't really work (maybe we should remove it from the
documentation?)
The .txt file looks good. It should print out a summary of your dataset - does
that look right?
It will put each image in a separate class by value, then, because each class
name is numeric, it will automatically compute a Pearson correlation between
each image's predicted value and the value in the file.
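The correlation statistic mentioned above can be illustrated with a small sketch. This is a conceptual illustration of the Pearson correlation between predicted and true values, not wndchrm's internal code, and the sample values are made up:

```python
# Illustration of the Pearson correlation wndchrm reports between each image's
# predicted value and the value in the file. Not wndchrm's actual code.
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical true values (from the .txt file) and predicted values.
true_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_vals = [1.2, 1.9, 3.4, 3.8, 5.1]
print(round(pearson_r(true_vals, pred_vals), 3))  # → 0.989
```

A correlation near 1.0 (with a small P-value) suggests the numeric labels are being predicted well.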
Because you have only one image per class, you can't really do a standard
classification test. The way to get around this is to provide a separate test
file.
You can test this dataset against itself to see what kind of correlation you
get. It's almost equivalent to doing a leave-one-out classification. The
difference is that the image being tested will have contributed to the weights
used in the classifier feature space, but the "collision" will be detected
between the test image and its corresponding training image, and the training
image's contribution to the marginal probability for the test image will be
ignored (because it's infinite).
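The collision handling described above can be sketched roughly like this. This is a simplified illustration of the idea, not wndchrm's actual WND code; the exponent and the tiny made-up dataset are assumptions:

```python
# Simplified sketch of a WND-style class similarity that skips "collisions":
# a training sample at distance zero from the test sample would contribute an
# infinite similarity, so it is ignored. Not wndchrm's actual code; the
# exponent p = -5 follows the WND-5 convention but is an assumption here.

def class_similarity(test_feat, class_feats, p=-5):
    total = 0.0
    for feat in class_feats:
        d2 = sum((a - b) ** 2 for a, b in zip(test_feat, feat))
        if d2 == 0.0:           # collision: the test image itself -- skip it
            continue
        total += d2 ** (p / 2)  # Euclidean distance raised to the power p
    return total

def marginal_probs(test_feat, classes):
    """Normalize per-class similarities into marginal probabilities."""
    sims = {name: class_similarity(test_feat, feats)
            for name, feats in classes.items()}
    norm = sum(sims.values())
    return {name: s / norm for name, s in sims.items()}

# Tiny made-up dataset: two classes, and a test image identical to one
# training image in class "1.0" (a self-test "collision").
classes = {
    "1.0": [[0.0, 0.0], [0.1, 0.0]],
    "2.0": [[1.0, 1.0]],
}
probs = marginal_probs([0.0, 0.0], classes)
print(probs)
```

With the colliding training image skipped, the remaining images still dominate the marginal probability, which is why the self-test behaves almost like leave-one-out.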
To do this test, just use the same file (the .txt or a .fit made from the .txt
using train) in a wndchrm classify command. This will use all of the images
for training, and all of the (same) images for testing. You still have to
specify two files, so just use the same file twice.
If the results look promising for the self-test, to do a real cross-validation,
you would set up two .txt files with two separate non-overlapping sets of
images. Then use one of the sets as your <train set>, and the other one as
your <test set>. Using the "test" command will let you train and test using
randomly selected subsets of your files (splits, we call them). Using
"classify" will use all of the images in <train set> for training, and all of
the images in <test set> for testing. If you specify image sub-sets (using -i
and/or -j), the sub-sets will be picked randomly by "test", but in file-order
by "classify".
Original comment by i...@cathilya.org
on 22 Apr 2011 at 4:55
Jimmy,
The -C option really should be removed, because in this branch (1.30) there is
nothing "continuous" about how wndcharm generates an interpolated score. As it
currently works, each image with its own value assigned to it is placed into
its own discrete class. If more than one image has a certain value, those
images get lumped together into the class for that value. wndcharm will then
calculate Fisher weights, which emphasize differentiation among image classes,
for use in a WND (Weighted Neighbor Distance) classifier. However, for a
continuous dataset such as yours, Jimmy, the correct weights to use are
Pearson weights. There is code to do this in the 1.30 branch, but it is
commented out. And even if you uncommented it, you would still be using it to
classify test images against discrete image classes, and then interpolating a
score from the results of the classification. There has been talk in our group
for a long time now about the need for pure interpolation functionality using
some form of linear regression, rather than classification over discrete
classes. We're not there yet, but we're getting there. The functionality
you're looking for will not be added to the 1.30 branch, but soon Ilya will
check the Pearson weight functionality into the trunk, and you'll be able to
check out that source and compile it just as easily as you would download an
RC tarball. At least that will get you part of the way there. We'll let you
know when that's been done.
Original comment by christop...@nih.gov
on 22 Apr 2011 at 5:30
To clarify, wndchrm 1.30 in its present form will in fact give you
continuous-value predictions for a dataset such as yours. It does it in a
roundabout way, which arguably is less biased than doing it more directly
with Pearson weights rather than Fisher weights.
Your images will be assigned to discrete classes for training. Your testing
images will be classified into these discrete training classes. Additionally,
because your class labels are numeric, it will interpolate a continuous value
for each test image and report it in the HTML report (along with the Pearson
correlation and P-value of its success in doing so). This technique of
interpolation has given us results that we have shown to have an underlying
molecular basis through gene expression, as well as through other independent
imaging assays. It's probably less sensitive than what you could get with a
continuous classification approach, but it has the advantage of being less
"forced" to give you the answer you want. Plus, it's quite sensitive already.
So don't wait for continuous classification to appear in wndchrm - there's
plenty there right now to explore with interpolating continuous scores for your
images.
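The interpolation described above amounts to a probability-weighted average of the numeric class labels. Here is a conceptual sketch, not wndchrm's internal code; the marginal probabilities shown are made up:

```python
# Sketch of interpolating a continuous value from discrete-class marginal
# probabilities: the predicted value is the probability-weighted average of
# the numeric class labels. A conceptual illustration only.

def interpolate(marginal_probs):
    """marginal_probs maps a numeric class label (as a string) to its
    marginal probability; probabilities are assumed to sum to 1."""
    return sum(float(label) * p for label, p in marginal_probs.items())

# Hypothetical marginal probabilities for one test image.
probs = {"1.0": 0.1, "2.0": 0.7, "3.0": 0.2}
print(interpolate(probs))  # → 2.1
```

Computing this per test image and correlating the results against the true labels (as wndchrm's HTML report does) gives the Pearson correlation discussed earlier in the thread.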
Original comment by i...@cathilya.org
on 22 Apr 2011 at 5:53
That makes sense. For now I think I will split my images into discrete classes
for training. But the self-correlation trick during testing sounds like an easy
way to roughly assess how accurate the predictions are.
Thanks for all the help!
Original comment by jimmyst...@gmail.com
on 22 Apr 2011 at 11:27
I'm going to close this issue by commenting out the -C option in the help
message.
Original comment by i...@cathilya.org
on 26 Apr 2011 at 5:24
Original issue reported on code.google.com by
christop...@nih.gov
on 21 Apr 2011 at 5:54