K-mer association with mixed effects model

komaltilwani53 commented 1 month ago

Hi Team,

I would appreciate it if someone could advise me on this issue. Previous steps:

SNP and COG association with fixed effects model

Step where pyseer has an issue: K-mer association with mixed effects model

Command used: pyseer --lmm --phenotypes phenotypes.txt --kmers fsm_kmers.txt.gz --similarity phylogeny_K.tsv --output-patterns kmer_patterns.txt --cpu 12 > cdi_kmers.txt

Standard output: None

Standard error file:

Read 602 phenotypes
Detected binary phenotype
Setting up LMM
Similarity matrix has dimension (602, 602)
Analysing 602 samples found in both phenotype and similarity matrix
h^2 = 0.00
No observations of TTTNNNNNN in selected samples
No observations of TTTNNNNNNN in selected samples
No observations of TTTNNNNNNNN in selected samples
No observations of TTTNNNNNNNNN in selected samples
No observations of TTTNNNNNNNNNN in selected samples
No observations of TTTNNNNNNNNNNN in selected samples
No observations of TTTNNNNNNNNNNNN in selected samples

The environment was verified and the test cases were executed as described in the tutorial.

Please let me know if any further input is required to investigate this issue.

johnlees commented 1 month ago

Sorry, I'm not clear from the above: what is the issue you are having?

komaltilwani53 commented 3 weeks ago

Hi Team,

I would appreciate it if someone could advise me on this:

The output from the runs is included below. The heritability estimate is 0, but the Q-Q plot shows significant associations. Is this contradictory, and should these results be left out of further analysis? Why is the heritability estimate 0?

SNP and COG association with fixed effects model:
Read 602 phenotypes
Detected binary phenotype
Structure matrix has dimension (602, 602)
Analysing 602 samples found in both phenotype and structure matrix
4701 loaded variants
2902 pre-filtered variants
1799 tested variants
1799 printed variants

K-mer association with mixed effects model:
Read 602 phenotypes
Detected binary phenotype
Setting up LMM
Similarity matrix has dimension (602, 602)
Analysing 602 samples found in both phenotype and similarity matrix
h^2 = 0.00
91704889 loaded variants
5125486 pre-filtered variants
86579403 tested variants
86579403 printed variants
(Patterns: 83884146, Threshold: 5.96E-10)
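For anyone reading along, the reported threshold is consistent with a Bonferroni correction on the number of unique k-mer patterns. The sketch below assumes a family-wise alpha of 0.05, which is not stated in the log; the pattern count is taken from the output above.

```python
# Bonferroni-style threshold: alpha divided by the number of unique patterns.
alpha = 0.05              # assumed significance level (not stated in the log)
n_patterns = 83_884_146   # "Patterns" value from the k-mer run above

threshold = alpha / n_patterns
print(f"Threshold: {threshold:.2E}")
```

Under that assumption the result matches the 5.96E-10 shown in the log.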

Q-Q plot
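One common way to put a number on what a Q-Q plot shows is the genomic inflation factor (lambda): the median observed chi-square statistic divided by the median expected under the null. The sketch below is not part of pyseer. It assumes the p-values are two-sided with 1 degree of freedom and have already been read into a list (presumably from the lrt-pvalue column of the pyseer output); `genomic_inflation` is a hypothetical helper name.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(pvalues):
    """Return lambda = median observed chi-square (1 df) / expected median."""
    norm = NormalDist()
    # For a two-sided test, chi2 = z**2 with z = Phi^-1(p / 2).
    # P-values that underflow to 0 after halving are skipped.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in pvalues if 0 < p / 2 <= 0.5]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spaced p-values mimic a well-calibrated null distribution.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
# Squaring shifts p-values towards 0, mimicking an inflated test.
inflated = [p ** 2 for p in null_like]

lam_null = genomic_inflation(null_like)
lam_inflated = genomic_inflation(inflated)
print(f"null-like lambda = {lam_null:.2f}, inflated lambda = {lam_inflated:.2f}")
```

A lambda close to 1 suggests the bulk of the tests are well calibrated; a value well above 1 suggests residual population structure or another source of inflation rather than many true associations.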

mgalardini commented 3 weeks ago

If I understand correctly, you are asking why you are seeing a heritability estimate of 0 but a number of significant unitigs? Depending on the distribution of your phenotype, that is entirely possible (also because it's binary, I think). I don't know your dataset, so this is little more than guessing.

komaltilwani53 commented 3 weeks ago

Please find attached the phenotype file I have been using. I would be grateful if you could review it and let me know what you think. Do I need to take the findings from this file into account for my analysis? Is there a substantive reason why the heritability estimate is 0 while the Q-Q plot is significant?

I've been attempting to use pyseer to create a Manhattan plot from the snp.plot GWAS output, but I haven't been able to find any noteworthy peaks.

In addition, I would appreciate it if you could provide any techniques or approaches that would enhance my analysis and enable me to get favorable outcomes.

GWAS_Analysis.zip

mgalardini commented 3 weeks ago

It seems to me that the ratio between positive (1) and negative (0) phenotypes is quite unbalanced (~30 / ~600), which might be a problem. Also, in the Manhattan plot that you sent I don't see any variant passing the 1E-10 threshold, so maybe you could pick a reference to which the threshold-passing variants map?
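A quick way to check the class balance before running the association is to count the phenotype values directly. This is a minimal sketch, assuming a tab-separated phenotype file with a header row, sample names in the first column and the binary phenotype in the last column; `phenotype_balance` is a hypothetical helper, and the demo data below is made up for illustration.

```python
import csv
import io
from collections import Counter


def phenotype_balance(handle):
    """Count phenotype values in the last column of a tab-separated file."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the header row
    return Counter(
        row[-1].strip()
        for row in reader
        # ignore blank lines and missing phenotypes
        if row and row[-1].strip() not in ("", "NA")
    )


# Made-up example; with a real file use open("phenotypes.txt") instead.
demo = io.StringIO("samples\tphenotype\ns1\t1\ns2\t0\ns3\t0\ns4\tNA\n")
counts = phenotype_balance(demo)
minority = min(counts.values()) / sum(counts.values())
print(dict(counts), f"minority fraction = {minority:.2f}")
```

For the dataset discussed here the minority fraction would be roughly 30 out of 600, i.e. about 5%.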

Other than that I don't have any particular suggestion

johnlees commented 3 weeks ago

I wouldn't read too much into h^2 from pyseer; especially with the phenotype as described, it may be heavily biased. You should use another tool if you want to estimate it more accurately.

mgalardini / pyseer

K-mer association with mixed effects model #265