szcf-weiya / ESL-CN

Chinese translation, code implementations, and exercise solutions for The Elements of Statistical Learning (ESL).
https://esl.hohoweiya.xyz
GNU General Public License v3.0

Ex. 12.5 #155

Closed by szcf-weiya 4 years ago

szcf-weiya commented 5 years ago

12-5

szcf-weiya commented 4 years ago

(a)

https://github.com/szcf-weiya/ESL-CN/blob/3a6336b79389e1eb8b3893f3fa51a607b7997d82/code/Ex.12.5/main.jl#L8-L32

Figure: all_curves

szcf-weiya commented 4 years ago

(b)

PNG image

szcf-weiya commented 4 years ago

(c)

I think the reason for putting all phonemes of a speaker into either the training set or the test set is to keep the test set independent of the training set. Otherwise, the final accuracy tends to be higher, since the classifier has already seen information about the speaker through some of that speaker's phonemes in the training set before it is evaluated on the test set.

https://github.com/szcf-weiya/ESL-CN/blob/3a6336b79389e1eb8b3893f3fa51a607b7997d82/code/Ex.12.5/main.jl#L131-L156

The accuracies are as follows:

julia> accs
3×4 Array{Float64,2}:
 0.811805  0.835757  0.851155  0.842601
 0.867408  0.875962  0.88195   0.890505
 0.867408  0.886228  0.885372  0.893926
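
The speaker-wise split described above can be sketched as follows. This is a minimal illustration in Python, not the code from the linked `main.jl`: the function name `split_by_speaker`, the `test_fraction` parameter, and the toy speaker labels are all assumptions made for the example.

```python
import random


def split_by_speaker(speakers, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no speaker appears in both sets.

    `speakers` holds one speaker label per observation (hypothetical input).
    """
    unique = sorted(set(speakers))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole speakers, never individual observations.
    n_test = max(1, round(test_fraction * len(unique)))
    test_speakers = set(unique[:n_test])
    train_idx = [i for i, s in enumerate(speakers) if s not in test_speakers]
    test_idx = [i for i, s in enumerate(speakers) if s in test_speakers]
    return train_idx, test_idx


# Toy speaker labels, one per phoneme recording (made up for illustration).
speakers = ["s1", "s1", "s2", "s2", "s2", "s3", "s4", "s4"]
train_idx, test_idx = split_by_speaker(speakers)
train_speakers = {speakers[i] for i in train_idx}
test_speakers = {speakers[i] for i in test_idx}
print(train_speakers & test_speakers)  # empty set: no speaker is shared
```

Splitting on observations instead of speakers would let recordings from the same person land on both sides, which is the leakage the comment above warns about.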

Here is a comparison of two of the contingency tables:

J = 5, K = 1
5×5 Named Array{Int64,2}
Dim1 ╲ Dim2 │  aa   ao  dcl   iy   sh
────────────┼────────────────────────
aa          │ 112   64    0    0    0
ao          │  68  186    5    4    0
dcl         │   0    4  185    6    0
iy          │  11   39    4  244   13
sh          │   0    0    0    2  222

J = 15, K = 7
5×5 Named Array{Int64,2}
Dim1 ╲ Dim2 │  aa   ao  dcl   iy   sh
────────────┼────────────────────────
aa          │ 130   46    0    0    0
ao          │  64  198    1    0    0
dcl         │   0    0  192    3    0
iy          │   1    0    8  301    1
sh          │   0    0    0    0  224
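
As a sanity check, the accuracy can be recomputed from each contingency table as the sum of the diagonal divided by the total count. The sketch below is in Python with the counts copied from the two tables above; the helper name `accuracy` is made up for the example. The thread does not say which dimension holds the true labels, but the result is the same either way.

```python
def accuracy(table):
    """Fraction of observations on the diagonal of a square contingency table."""
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return correct / total


# Counts copied from the J = 5, K = 1 table (class order: aa, ao, dcl, iy, sh).
table_j5_k1 = [
    [112, 64, 0, 0, 0],
    [68, 186, 5, 4, 0],
    [0, 4, 185, 6, 0],
    [11, 39, 4, 244, 13],
    [0, 0, 0, 2, 222],
]

# Counts copied from the J = 15, K = 7 table.
table_j15_k7 = [
    [130, 46, 0, 0, 0],
    [64, 198, 1, 0, 0],
    [0, 0, 192, 3, 0],
    [1, 0, 8, 301, 1],
    [0, 0, 0, 0, 224],
]

acc_j5_k1 = accuracy(table_j5_k1)    # 949 / 1169
acc_j15_k7 = accuracy(table_j15_k7)  # 1045 / 1169
print(round(acc_j5_k1, 6), round(acc_j15_k7, 6))
```

The two values agree with the first and last entries of `accs` (0.811805 and 0.893926), which suggests, though the thread does not state it, that the rows of `accs` index J and the columns index K.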

I also plotted the (smooth) prototypes, which can be treated as features extracted from the original data.

Figure: all_B_curves