legumeinfo / ArachisPheno

AraPheno source code for http://arapheno.1001genomes.org
MIT License

Phenotype transformations #23

Open svengato opened 3 years ago

svengato commented 3 years ago

These transformations replace each phenotype value v with a function of it, such as log(v), sqr(v), or sqrt(v).

Some of the resulting histograms look reasonable, like the one for seed_weight. The most frequent untransformed value is around 150, and log(150) ~ 5, which is where the log-transformation histogram peaks.
http://dev.lis.ncgr.org:50007/phenotype/20/transformation/

Others, like #1_kernel_weight, seem off. Here the highest untransformed value is around 125. log(125) ~ 4.8, but the corresponding value in the log-transformation histogram is about 5.1; likewise sqr(125) = 15625, but the highest value in the sqr-transformation histogram is about 27000, and so on.
http://dev.lis.ncgr.org:50007/phenotype/1/transformation/
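
One way to quantify the mismatch is to invert each transform on the value read from its histogram and see what argument it was apparently applied to. The sketch below does that using only the approximate numbers quoted above (125, 5.1, 27000), and assumes log means the natural logarithm, as log(150) ~ 5 suggests. The variable names are made up for illustration.

```python
import math

# Approximate values read off the histograms (eyeballed, not exact data).
raw_max = 125.0          # highest untransformed value
log_hist_max = 5.1       # highest value in the log-transformation histogram
sqr_hist_max = 27000.0   # highest value in the sqr-transformation histogram

# Invert each transform to recover the argument it was apparently applied to.
arg_from_log = math.exp(log_hist_max)
arg_from_sqr = math.sqrt(sqr_hist_max)

print(f"expected log(v): {math.log(raw_max):.2f}, expected sqr(v): {raw_max ** 2:.0f}")
print(f"argument implied by log histogram: {arg_from_log:.1f}")
print(f"argument implied by sqr histogram: {arg_from_sqr:.1f}")
print(f"implied offsets from 125: {arg_from_log - raw_max:.1f}, {arg_from_sqr - raw_max:.1f}")
```

Both inversions land near 164 rather than 125, so the two histograms are consistent with each other if the same offset of roughly +39 was added to the values before the transform was applied.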

On closer examination, AraPheno's default behavior for a transform f is not to use f(v) as expected, but to use
f(v - min(vv) + 0.1*var(vv))
where vv is the list of all values. This is presumably some kind of statistical correction (I am still researching it). In any case, ArachisPheno can be told not to apply it.
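
To make the difference concrete, here is a minimal sketch of the shifted transform next to the plain f(v). It is not AraPheno's actual code: the function names are hypothetical, the sample values are made up, and using the population variance (`statistics.pvariance`) rather than the sample variance is an assumption, since the formula above does not say which one is meant.

```python
import math
from statistics import pvariance


def shifted_transform(f, values, var_fn=pvariance):
    """Apply f(v - min(vv) + 0.1*var(vv)) to each v (hypothetical helper).

    var_fn is an assumption: swap in statistics.variance if the sample
    variance turns out to be the one actually used.
    """
    offset = -min(values) + 0.1 * var_fn(values)
    return [f(v + offset) for v in values]


def plain_transform(f, values):
    """Apply f(v) directly, with no shift."""
    return [f(v) for v in values]


# Made-up values for illustration only, not real phenotype data.
values = [20.0, 50.0, 80.0, 125.0]

for name, f in [("log", math.log), ("sqrt", math.sqrt), ("sqr", lambda x: x * x)]:
    print(name, "plain:  ", [round(x, 2) for x in plain_transform(f, values)])
    print(name, "shifted:", [round(x, 2) for x in shifted_transform(f, values)])
```

One visible effect of the shift is that the smallest value always maps to f(0.1*var(vv)), which is positive whenever the values are not all equal, so log and sqrt stay defined even for data containing zeros or negative numbers. Whether that is the intended purpose is not confirmed here.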