ramhiser / datamicroarray

A collection of small-sample, high-dimensional microarray data sets to assess machine-learning algorithms and models.

Examine Gravier (2010) data set #16

ramhiser closed this issue 11 years ago

ramhiser commented 11 years ago

Måns Thulin from Uppsala University sent the following email to me:

I am now planning to use the Gravier (2010) data to illustrate a new method in a paper, but was wondering if perhaps some of the patients in the study have been misclassified in your R package. According to the Gravier et al. paper and your description of the data on the wiki, there should be 111 patients labelled "good" and 57 labelled "poor". However, when I import the data into R, I get the following:

summary(gravier$y)
good poor
 106   62

The numbers of patients (168) and features (2,905) are correct, but there seems to be a problem with the class labels. Have 5 "good" patients been labelled as "poor" or is there in fact a misprint in the Gravier et al. paper? Any insights that you could provide regarding this would be deeply appreciated!

ramhiser commented 11 years ago

As Måns noted, the labels were incorrect. The script was reading the labels from the wrong column of the additional_info.txt file.
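
A mismatch like this can be caught with a quick check of the class counts against the published ones. The following is a minimal sketch, not the package's actual script: `check_class_counts` is a hypothetical helper, and `y` is a stand-in factor built from the counts quoted in the email rather than the real `gravier$y`.

```r
# Published class sizes from Gravier et al. (2010), as quoted in the email.
expected <- c(good = 111, poor = 57)

# Hypothetical helper (not part of datamicroarray): returns observed minus
# expected counts for each class named in `expected`.
check_class_counts <- function(y, expected) {
  observed <- table(y)[names(expected)]
  diffs <- as.integer(observed) - as.integer(expected)
  names(diffs) <- names(expected)
  diffs
}

# Stand-in for gravier$y, using the counts reported in the email (106 / 62).
y <- factor(rep(c("good", "poor"), times = c(106, 62)))

diffs <- check_class_counts(y, expected)
print(diffs)  # good is 5 short, poor is 5 over

if (any(diffs != 0)) {
  message("Class counts differ from the published values.")
}
```

With the package installed, the same check could be run on the real labels by passing `gravier$y` in place of the stand-in.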