Question: how to identify which phenotype a variant/kmer is associated with? (mgalardini/pyseer #127, closed by reedus-123)
Hi, could you clarify your question a bit more? The phenotype that variants are tested for association with is the one you provide through the `--phenotypes` and `--phenotype-column` command-line arguments. The latter is only required if your phenotype is not in the last column of the file. Hope this helps.
Thank you for your explanation. What I would like to know is this: if you have two phenotypes and you find a significant association between a variant and the phenotype, is there a way to find out which of the two phenotypes the variant is positively associated with (i.e., found in more frequently)?
So, for instance, if I were comparing two sets of samples (e.g., from cattle and from soil) and found a number of significant variants, how would I know whether the presence of these variants is significantly associated with cattle or with soil? Also, does the software support comparisons between three groups (e.g., cattle, soil and water)?
If you want to run associations against a discrete phenotype with more than two classes, then I'd suggest using dummy encoding, or more specifically one-hot encoding. This way you will have three phenotypes in your last example (cattle vs. the rest, soil vs. the rest, and water vs. the rest), and you can run three separate associations. Does that make sense?
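To make the one-hot (one-vs-rest) encoding concrete, here is a minimal sketch that turns a single multi-class label per sample into one 0/1 column per class and writes it as a tab-separated table. It assumes the phenotype file is tab-separated with a header row and sample names in the first column; the function names `one_hot_phenotypes` and `write_pheno` and the `sample` header are illustrative, not part of pyseer.

```python
import csv
import io


def one_hot_phenotypes(labels):
    """Turn {sample: class_label} into a one-vs-rest table.

    Returns (header, rows); each class gets its own 0/1 column.
    (Hypothetical helper, for illustration only.)
    """
    classes = sorted(set(labels.values()))
    header = ["sample"] + classes
    rows = []
    for sample, label in labels.items():
        # 1 if the sample belongs to this class, 0 for "the rest"
        rows.append([sample] + [1 if label == c else 0 for c in classes])
    return header, rows


def write_pheno(header, rows, handle):
    """Write the table tab-separated, header first (assumed file layout)."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


labels = {"a.fasta": "cattle", "b.fasta": "soil", "c.fasta": "water"}
header, rows = one_hot_phenotypes(labels)
buf = io.StringIO()
write_pheno(header, rows, buf)
print(buf.getvalue())
```

Each resulting column is a separate binary phenotype, so the three associations are independent runs rather than one multi-class test.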
Yes, it does. So in the phenotypes.pheno file, the structure would be:
sample phenotype
a.fasta 1
b.fasta 2
c.fasta 3
where 1, 2 and 3 are cattle, soil and water, respectively.
But my main question is this: once I've run the pipeline and concluded that there is a significant association between a variant and the phenotype, what steps can I take to figure out which of the three phenotypes the variant is associated with, i.e., is it associated with cattle, soil or water?
Thanks again for answering my questions, I really appreciate it.
Hi,
what I meant is something like this:
sample cattle soil water
a.fasta 1 0 0
b.fasta 0 1 0
c.fasta 0 0 1
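With a table like this, each column is tested in its own run, selected with `--phenotype-column`. The sketch below only builds the command lines; it assumes gene presence/absence input via `--pres` (as used later in this thread), and the file names `phenotypes.pheno`, `gene_presence_absence.Rtab` and `<column>_assoc.txt` are hypothetical. Population-structure options (e.g. `--lmm` and its inputs) are deliberately left out and would need to be added for a real analysis.

```python
import shlex


def per_phenotype_commands(pheno_file, columns, pres_file):
    """Build one pyseer argument list per one-vs-rest column.

    Sketch only: population-structure options are omitted.
    """
    commands = []
    for column in columns:
        args = [
            "pyseer",
            "--phenotypes", pheno_file,
            "--phenotype-column", column,  # which 0/1 column to test
            "--pres", pres_file,
        ]
        commands.append((column, args))
    return commands


cmds = per_phenotype_commands(
    "phenotypes.pheno", ["cattle", "soil", "water"], "gene_presence_absence.Rtab"
)
for column, args in cmds:
    # results are assumed to go to stdout, so redirect one file per column
    print(f"{shlex.join(args)} > {column}_assoc.txt")
```

Keeping one output file per column makes it unambiguous which one-vs-rest comparison each significant variant came from.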
Oh, perfect, thank you. So if I got this right, the variants identified as significant would be associated with the samples marked as a '1' and not a '0' in their respective column? Is that correct?
Yes, that is correct
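A quick descriptive sanity check is to compare how often a given variant is present in the samples coded 1 versus those coded 0 for one column. The sketch assumes presence/absence for a single variant and the phenotype column are already loaded as dictionaries keyed by sample name; `carriage_by_group` is a hypothetical helper, not a pyseer function.

```python
def carriage_by_group(presence, phenotype):
    """Fraction of samples carrying one variant, per phenotype group.

    presence:  {sample: 0 or 1}, presence of a single variant
    phenotype: {sample: 0 or 1}, one one-vs-rest column
    Returns {0: fraction_in_group_0, 1: fraction_in_group_1}.
    """
    counts = {0: [0, 0], 1: [0, 0]}  # group -> [carriers, total]
    for sample, group in phenotype.items():
        if sample not in presence:
            continue  # skip samples without genotype data
        counts[group][0] += presence[sample]
        counts[group][1] += 1
    return {g: (c / n if n else float("nan")) for g, (c, n) in counts.items()}


phenotype = {"a.fasta": 1, "b.fasta": 0, "c.fasta": 0,
             "d.fasta": 1, "e.fasta": 0, "f.fasta": 1}
presence = {"a.fasta": 1, "b.fasta": 0, "c.fasta": 1,
            "d.fasta": 1, "e.fasta": 0, "f.fasta": 0}
freqs = carriage_by_group(presence, phenotype)
print(freqs)
```

These raw frequencies are not adjusted for population structure, so they can disagree with the direction of the effect estimated by a model that does correct for it.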
Sorry about all the questions, thank you for all your help!
Hello,
I would like to ask a follow-up question: after running a GWAS (`--lmm` mode, `--pres`), I see that many significant genes are evenly distributed between the 0 and 1 phenotypes, or even skewed towards 0. This is difficult to interpret, given that the significant genes should be associated with phenotype 1. Does it make sense to keep only those with β > 0?
Thanks a lot!
This depends on your phenotype, but generally I would say you should keep them all!
Thanks for the advice! The phenotype is commensal (0) vs. pathogenic (1).
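Rather than discarding variants with β < 0, one option is to keep them all and simply split them by the sign of β: for a phenotype coded 0/1, a positive β means presence goes with the 1 (pathogenic) group and a negative β with the 0 (commensal) group. The sketch assumes the association output is a tab-separated table whose header includes `variant`, `lrt-pvalue` and `beta` columns; `split_by_direction` and the example rows are hypothetical, and the significance threshold is left to the caller.

```python
import csv
import io


def split_by_direction(results_handle, threshold):
    """Split significant variants by the sign of their effect size.

    Assumes a tab-separated table with 'variant', 'lrt-pvalue' and
    'beta' columns. Nothing is discarded for having a negative beta.
    """
    towards_1, towards_0 = [], []
    for row in csv.DictReader(results_handle, delimiter="\t"):
        if float(row["lrt-pvalue"]) >= threshold:
            continue  # not significant at the chosen threshold
        beta = float(row["beta"])
        if beta > 0:
            towards_1.append(row["variant"])  # e.g. pathogenic (1)
        elif beta < 0:
            towards_0.append(row["variant"])  # e.g. commensal (0)
    return towards_1, towards_0


example = (
    "variant\taf\tfilter-pvalue\tlrt-pvalue\tbeta\n"
    "geneA\t0.40\t1e-05\t1e-06\t0.8\n"
    "geneB\t0.30\t1e-05\t2e-07\t-0.5\n"
    "geneC\t0.50\t0.1\t0.2\t0.1\n"
)
pathogenic, commensal = split_by_direction(io.StringIO(example), threshold=1e-5)
print(pathogenic, commensal)
```

Both lists are genuine associations; the sign only tells you which group the variant is enriched in.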
This isn't an issue but rather a query (apologies if it's in the wrong place).
Having followed the GWAS pipeline, once significant variants/k-mers are identified, how can one go about identifying which phenotype each of them is positively associated with?
Thanks in advance, and thanks for creating this software.