Closed cizydorczyk closed 2 years ago
To answer your second question, you can definitely use covariates with the elastic net model.
Regarding the first one, I am a bit confused by your statement "it seems that presence/absence of a variant in an isolate will not correspond to presence/absence in a patient". Could you please clarify this a bit?
One thing you could try is to encode the presence of multiple isolates in a patient as a binary variable in your covariates matrix, though I may have misunderstood your specific dataset.
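As a rough sketch of that suggestion, the snippet below builds a tab-separated covariates table with one binary column flagging isolates that come from a patient with more than one isolate, then assembles a pyseer elastic net command that uses it. The isolate and patient names, the file names, and the `multi_isolate` column are all hypothetical, and the layout (sample names in the first column, covariates selected by column number) is my reading of the pyseer documentation, so check it against your installed version.

```python
import csv
import io
import shlex
from collections import Counter

# Hypothetical isolate-to-patient mapping; replace with your own metadata.
isolate_to_patient = {
    "iso1": "patientA",
    "iso2": "patientA",
    "iso3": "patientB",
    "iso4": "patientC",
    "iso5": "patientC",
}


def build_covariate_rows(mapping):
    """Return (isolate, flag) rows; flag is 1 if the patient has >1 isolate."""
    isolates_per_patient = Counter(mapping.values())
    return [
        (isolate, 1 if isolates_per_patient[patient] > 1 else 0)
        for isolate, patient in sorted(mapping.items())
    ]


def write_covariates(rows, handle):
    """Write a header line plus one tab-separated line per isolate."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["samples", "multi_isolate"])  # hypothetical column names
    writer.writerows(rows)


rows = build_covariate_rows(isolate_to_patient)

# Written to an in-memory buffer here; in practice open("covariates.tsv", "w").
buffer = io.StringIO()
write_covariates(rows, buffer)
covariates_tsv = buffer.getvalue()

# Assumed invocation: column 2 of the covariates file, no "q" suffix,
# which I understand pyseer treats as a categorical covariate.
pyseer_cmd = [
    "pyseer",
    "--phenotypes", "phenotypes.tsv",   # hypothetical file name
    "--kmers", "kmers.gz",              # hypothetical file name
    "--wg", "enet",
    "--covariates", "covariates.tsv",   # hypothetical file name
    "--use-covariates", "2",
]

print(covariates_tsv)
print(shlex.join(pyseer_cmd))
```

The command is only printed, not run, so the snippet works without pyseer installed.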
Thank you for your quick response!
What I meant is this: suppose patient A has isolates 1-10, and (in truth) isolates 1-7 carry a causal variant while isolates 8-10 do not. The phenotype I am working with is derived from the patient and not from individual isolates (i.e. the patient from whom the isolates were obtained does/does not have disease). Would it not pose a problem that I am assigning the same phenotype to all 10 isolates, despite some having the causal variant and others not?
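To make the concern concrete, here is a small sketch that tabulates variant status against the propagated phenotype at the isolate level. Patient A follows the example above, under the assumption that patient A has the disease; patient B, a disease-free patient with five variant-free isolates, is entirely hypothetical and only there to complete the table.

```python
from collections import Counter

# Each patient has one phenotype, which every one of their isolates inherits.
patients = {
    # Patient A: isolates 1-10, isolates 1-7 carry the variant (assumed diseased).
    "A": {"phenotype": 1, "isolates": {i: i <= 7 for i in range(1, 11)}},
    # Patient B: hypothetical disease-free patient, no isolate carries the variant.
    "B": {"phenotype": 0, "isolates": {i: False for i in range(11, 16)}},
}


def isolate_level_table(patients):
    """Count isolates by (carries_variant, inherited_phenotype)."""
    table = Counter()
    for info in patients.values():
        for carries_variant in info["isolates"].values():
            table[(carries_variant, info["phenotype"])] += 1
    return table


table = isolate_level_table(patients)

# Isolates labelled "disease" even though they lack the causal variant.
non_carriers_labelled_diseased = table[(False, 1)]
non_carriers_total = table[(False, 0)] + table[(False, 1)]

print("carriers labelled diseased:    ", table[(True, 1)])
print("non-carriers labelled diseased:", non_carriers_labelled_diseased)
print("non-carriers in total:         ", non_carriers_total)
```

In this toy table, 3 of the 8 isolates without the variant still carry the disease label, which is the mismatch I am worried about.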
Perhaps I am the one who is confused here. Admittedly, I am not entirely certain what effect multiple isolates/patient has on GWAS, other than potentially introducing further population structure.
Thank you, Conrad
So if I understood correctly you have genome sequences of the individual isolates and a phenotype that is per-patient, so you cannot attach a "true" phenotype to each isolate. I agree that this makes the analysis tricky, as in principle each sample can be assigned more than one phenotype, assuming you observe certain isolates in multiple patients that have different phenotypes.
One thing you may wish to try is modelling the patient identifier as a random effect (especially if the number of covariates for patients is larger). We don't support this in pyseer, but you can use a linear mixed model package such as lme4 to make these models (with some care to model the genetic relatedness matrix in the same way as in pyseer/limix), or I think you could do it in a general Bayesian inference package such as stan.
Closing for lack of follow-up discussion
Hello,
I have a scenario where I have multiple isolates/patient and was wondering how best to incorporate this into a GWAS analysis. Intuitively, it seems that presence/absence of a variant in an isolate will not correspond to presence/absence in a patient, yet my phenotype is defined by patient (i.e. patient does/does not have disease). I came across a recent microbial GWAS review (San et al. 2020 https://doi.org/10.3389/fmicb.2019.03119) that suggests including such "intra-patient diversity" as covariates in an analysis, and recommends PySEER as one option that allows covariates.
Would such an approach make sense, or does it grossly violate assumptions of a GWAS? (broadly speaking)
Second, is it possible to include a covariates file when using the elastic net model? I cannot find anywhere in the documentation that specifically states whether this is/is not possible. I only found reference to the lineage clusters option --lineage-clusters, but I do not think this is what I am looking for.
Any help in understanding is greatly appreciated.
Thank you, Conrad