rqtl / qtl2

QTL analysis software for high-dimensional data and complex cross designs
https://kbroman.org/qtl2
GNU General Public License v3.0
34 stars 23 forks source link

Extreme values from scan1coef #235

Closed kieranatkins closed 3 months ago

kieranatkins commented 3 months ago

Hi all,

I am running a QTL analysis on some custom phenotype data on the Arabidopsis Thaliana MAGIC kover2009 data, I have taken the genotype data from the atMAGIC (https://github.com/tavareshugo/atMAGIC) library, however a few of my phenotypes return extreme values for the scan1coef (way beyond the max and min in that actual data series)

I have written a small script to reproduce the error, along with a link to a csv file containing the data. Please let me know if this is expected behaviour, or if I have made a mistake in my script.

Thanks.

Code to reproduce error:

remotes::install_github("tavareshugo/qtl2helper")
remotes::install_github("tavareshugo/atMAGIC")

pheno_data <- read.csv("reduced_pheno.csv")
kover2009 <- atMAGIC::kover2009
data <- qtl2helper::add_pheno(kover2009, pheno_data, idcol = "SUBJECT.NAME")

probs <- qtl2::calc_genoprob(data, error_prob = 0.002)

a_probs <- qtl2::genoprob_to_alleleprob(probs)
a_kinship <- qtl2::calc_kinship(a_probs, "loco")
a_scan <- qtl2::scan1(a_probs, data$pheno, a_kinship)
effects <- qtl2::scan1coef(a_probs[, "1"], data$pheno[,"p1"], kinship=a_kinship["1"])

print(max(effects))
print(min(effects))

Which returns

     Bur 
29.77312 
[1] -529.1257

Much higher values than I would expect.

Data: reduced_pheno.csv

kbroman commented 3 months ago

It's pretty hard to avoid this sort of problem, which generally occurs in regions where one of the 19 haplotypes is likely absent.

# which position and haplotype has min estimated effect?
which.min(apply(effects, 1, min))  # position 208
which.min(apply(effects, 2, min))  # haplotype 12 (Po) 

# maximum probability for that haplotype at that position
max(a_probs[[1]][208,12,])         # 0.02

I'm personally not too enthusiastic about the use of scan1coef(). I prefer to focus on the estimated effects at an inferred QTL, rather than looking at the estimated effects across a whole chromosome.

kieranatkins commented 3 months ago

Ah that makes a lot of sense when considering the probabilities at that position. Thanks for clearing that up. I may switch to looking at effects at the QTL, thanks again.