tobyjohnson / gtx

Genetics ToolboX R package
29 stars 9 forks source link

Dose missingness #1

Open andrew-slater opened 9 years ago

andrew-slater commented 9 years ago

What is the rationale behind the termination of blockassoc if any of the dose values are missing?

~line 92 of blockassoc.R

stopifnot(!any(is.na(dose)))

If we want to remove this restriction to enable analysis of assayed genotypes (converted to dose values) where a no-call is possible, what do we need to consider?

tobyjohnson commented 9 years ago

The reason for this stopifnot() is that if any subjects are missing, we can’t reuse the null model fit (m0) to compare against the model fit including genotype (m1) to calculate the likelihood ratio test statistic. Both null and alternative models have to be fitted to exactly the same set of subjects. It was my intention to add code to compute a subset-specific null model fit whenever there is missing genotype data. The stopifnot() is just there as an interim measure, to stop the code producing incorrect results, until that functionality is there.

Programming the subset-specific null model fit may require a bit of care (at least, I was concerned enough to defer implementing it until I had enough time to think through the issues carefully). There may be a few statistical edge case issues (like what happens if there is 100% missing data, or so much missing data that the null model cannot be fitted) and possibly a programming issue (how to update the subjects included in the fit, when the argument passed into blockassoc() is the quoted function call for performing the model fit). We want to program robustly enough that the whole pipeline doesn't fall over if un-QCed genotype data are fed in.

I'm happy to work on a fix for this.