szcf-weiya / ESL-CN

Chinese translation, code implementations, and exercise solutions for The Elements of Statistical Learning (ESL).

https://esl.hohoweiya.xyz

GNU General Public License v3.0

SRBCT microarray data #204

Open szcf-weiya opened 5 years ago

szcf-weiya commented 5 years ago

SRBCT microarray data

SRBCT gene expression data: 2308 genes, 63 training samples, 25 test samples.

One gene per row, one sample per column

Cancer classes are labelled 1,2,3,4 for c("EWS","RMS","NB","BL")

Files

Training set gene expression: khan.xtrain.txt
Training set class labels: khan.ytrain.txt
Test set gene expression: khan.xtest.txt
Test set class labels: khan.ytest.txt
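
Since the files store one gene per row and one sample per column, most classifiers will want the transpose. A minimal loading sketch, assuming the files are whitespace-separated plain text and that missing labels appear as the token `NA` (both are assumptions; `read_expression` and `read_labels` are hypothetical helper names, and the toy strings stand in for the real files):

```python
import io

def read_expression(handle):
    """Parse text with one gene per row and one sample per column.

    Returns a list of samples, each a list of gene values (the transpose).
    """
    genes = [[float(v) for v in line.split()] for line in handle if line.strip()]
    # zip(*genes) turns the genes x samples layout into samples x genes
    return [list(sample) for sample in zip(*genes)]

def read_labels(handle):
    """Parse integer class labels; the token 'NA' (assumed) becomes None."""
    return [None if tok == "NA" else int(tok) for tok in handle.read().split()]

# Made-up stand-ins for khan.xtrain.txt / khan.ytrain.txt: 3 genes x 2 samples
toy_x = io.StringIO("0.1 0.2\n1.5 -0.3\n-2.0 0.7\n")
toy_y = io.StringIO("1 NA\n")

X = read_expression(toy_x)
y = read_labels(toy_y)
print(len(X), "samples,", len(X[0]), "genes, labels:", y)
```

With the real files, the same functions could be called on `open("khan.xtrain.txt")` and `open("khan.ytrain.txt")`.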

szcf-weiya commented 5 years ago

diagonal LDA

On p. 652 (or in ESL CN), the original text claims that

Here the diagonal LDA classifier yielded five misclassification errors for the 20 test samples.

As the frequency table above shows, there are 7 misclassification errors among the 20 test samples (the NA samples are excluded), which is roughly the same performance.
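
For reference, a minimal sketch of the diagonal LDA rule from ESL Section 18.2, which assigns a sample to the class maximizing $\delta_k(x) = -\sum_j (x_j - \bar{x}_{kj})^2 / s_j^2 + 2\log\pi_k$. This is not the code used for the result above; the function names and the toy data are made up for illustration, and the sketch assumes every gene has a nonzero pooled variance.

```python
import math

def fit_diagonal_lda(X, y):
    """Fit class centroids, pooled per-gene variances, and class priors."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    centroids, priors = {}, {}
    for k in classes:
        rows = [x for x, label in zip(X, y) if label == k]
        centroids[k] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        priors[k] = len(rows) / n
    # Pooled within-class variance of each gene, with N - K degrees of freedom
    var = [
        sum((x[j] - centroids[label][j]) ** 2 for x, label in zip(X, y))
        / (n - len(classes))
        for j in range(p)
    ]
    return centroids, var, priors

def predict_diagonal_lda(model, x):
    """Return the class with the largest discriminant score for sample x."""
    centroids, var, priors = model

    def score(k):
        dist = sum((xj - m) ** 2 / v for xj, m, v in zip(x, centroids[k], var))
        return -dist + 2 * math.log(priors[k])

    return max(centroids, key=score)

# Made-up toy data: 4 samples, 2 genes, 2 classes
X_train = [[0.0, 1.0], [0.2, 0.8], [3.0, -1.0], [3.2, -1.2]]
y_train = [1, 1, 2, 2]
model = fit_diagonal_lda(X_train, y_train)
predictions = [predict_diagonal_lda(model, x) for x in [[0.1, 1.0], [2.9, -0.9]]]
print(predictions)
```

Counting the entries where `predictions` disagrees with the true test labels gives the misclassification count quoted above.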

szcf-weiya commented 5 years ago

Regularized Diagonal LDA (with Delta = 2.0)

Zero training error and only 1 misclassification error among the 20 test samples (excluding the NA samples), which roughly agrees with the top panel of Fig. 18.4.
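
A sketch of what the threshold does, assuming the regularization here is the nearest shrunken centroids procedure of ESL Section 18.2: each standardized difference $d_{kj} = (\bar{x}_{kj} - \bar{x}_j) / (m_k (s_j + s_0))$ is soft-thresholded by $\Delta$, and genes whose differences all shrink to zero drop out of the rule. The function names and the toy data are hypothetical, and taking $s_0$ as the median of the $s_j$ is the usual choice rather than something stated in this thread.

```python
import math
from statistics import median

def soft_threshold(d, delta):
    """sign(d) * max(|d| - delta, 0)."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def shrunken_centroids(X, y, delta):
    """Return (shrunken class centroids, overall centroid) for threshold delta."""
    classes = sorted(set(y))
    n, p = len(X), len(X[0])
    overall = [sum(x[j] for x in X) / n for j in range(p)]
    groups = {k: [x for x, label in zip(X, y) if label == k] for k in classes}
    centroids = {
        k: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for k, rows in groups.items()
    }
    # Pooled within-class standard deviation of each gene
    s = [
        math.sqrt(
            sum((x[j] - centroids[label][j]) ** 2 for x, label in zip(X, y))
            / (n - len(classes))
        )
        for j in range(p)
    ]
    s0 = median(s)  # assumed choice of the small positive constant s_0
    shrunken = {}
    for k, rows in groups.items():
        m_k = math.sqrt(1 / len(rows) - 1 / n)
        shrunken[k] = []
        for j in range(p):
            scale = m_k * (s[j] + s0)
            d = (centroids[k][j] - overall[j]) / scale
            shrunken[k].append(overall[j] + scale * soft_threshold(d, delta))
    return shrunken, overall

def n_active_genes(shrunken, overall):
    """Count genes for which at least one class centroid differs from the overall one."""
    return sum(
        any(c[j] != overall[j] for c in shrunken.values())
        for j in range(len(overall))
    )

# Made-up toy data: gene 0 separates the classes, gene 1 barely does
X_toy = [[0.0, 1.0], [0.2, 1.2], [3.0, 1.1], [3.2, 1.3]]
y_toy = [1, 1, 2, 2]
shrunk, overall = shrunken_centroids(X_toy, y_toy, delta=2.0)
print("active genes at Delta = 2.0:", n_active_genes(shrunk, overall))
```

Classification then proceeds as in diagonal LDA, with the shrunken centroids in place of the raw class means.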

szcf-weiya commented 5 years ago

remove `NA`
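
A minimal sketch of this step, assuming the unlabelled test samples have already been read in with their labels set to `None` (the helper name `drop_na` and the toy values are made up):

```python
def drop_na(X, y):
    """Keep only the samples whose label is not None (i.e. was not 'NA')."""
    kept = [(x, label) for x, label in zip(X, y) if label is not None]
    return [x for x, _ in kept], [label for _, label in kept]

# Made-up test set: 5 samples, 2 of them without a class label
X_test = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y_test = [3, None, 1, None, 4]

X_kept, y_kept = drop_na(X_test, y_test)
print(len(y_kept), "labelled samples remain")
```

Applied to the real data, this is the filtering that reduces the 25 test samples to the 20 used in the error counts.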

szcf-weiya commented 5 years ago

Error curves (Fig. 18.4 top)

[Figure: error_curves] This roughly reproduces the original figure; the CV error might differ because the folds are divided differently.
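
To illustrate why the fold division matters, here is a small sketch that splits the 63 training samples into folds under two different random seeds. The helper `kfold_indices` is hypothetical, and k = 10 is an assumption taken from the 10-fold CV mentioned in the caption of Fig. 18.4, not from the code behind the plot.

```python
import random

def kfold_indices(n, k, seed):
    """Shuffle the indices 0..n-1 with the given seed and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::k]) for i in range(k)]

folds_a = kfold_indices(63, 10, seed=0)
folds_b = kfold_indices(63, 10, seed=1)

# Same samples overall, but grouped differently, so each held-out error changes
print("first fold, seed 0:", folds_a[0])
print("first fold, seed 1:", folds_b[0])
```

With only 63 samples, moving a few of them between folds can shift the CV error at a given Delta by a visible amount.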

A tip related to the plot: I could not find a twiny command in Plots.jl, although a twinx command does exist; it is a bonus feature that is not described in the docs (https://github.com/JuliaPlots/Plots.jl/issues/337). I therefore resorted to the PyPlot package.
