Fixed or mixed effects model?

ireneortega commented 1 year ago

I read that an LMM is the suitable type of model when working with genomes, since the gene or SNP effect is fixed while the population structure effect is random and represents a subpopulation underlying the global population (I am working with 136 genomes). I therefore applied an LMM to my data, but when I checked the qq-plot I found that the p-values are highly inflated, which I think means the results cannot be trusted. I created the kinship matrix both from the core genome phylogeny and from the core genome SNPs, but the qq-plot is always very similar to this one:

similarity_pyseer --vcf core_split.vcf genomes.txt > kinship_matrix.txt

pyseer --lmm --phenotypes traits.csv --pres gene_presence_absence.Rtab --similarity kinship_matrix.txt --output-patterns output_patterns_pvalue.txt --print-filtered --print-samples > OUTPUT.txt

[Image: qq_plot]

However, when I tried a fixed effects model, the qq-plot was much better, although I don't know if it is good enough:

mash sketch -s 10000 -o mash_sketch genomes/*.fasta

mash dist mash_sketch.msh mash_sketch.msh | square_mash > mash.tsv

scree_plot_pyseer mash.tsv

pyseer --phenotypes traits.csv --pres gene_presence_absence.Rtab --distances mash.tsv --save-m mash_mds --lineage --print-samples --max-dimensions 8 --cpu 16 > OUTPUT.txt

[Image: qq_plot]

I am very confused about which model I need to use, and I don't believe I should choose it based on how good the results (or the qq-plot) look.

Could you please help me find the most suitable model, or give me any advice? Thanks!!

johnlees commented 1 year ago

Looks like neither model is working particularly well in this case. I've typically had better luck with controlling inflation of the test statistic with the LMM, but this is dataset dependent.

I think the advice I'd give here is to try to confirm the results with something else: biological relevance, adding more samples, or adding another dataset.
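For comparing the two runs on more than the look of the qq-plot, one common summary of test-statistic inflation is the genomic inflation factor (lambda): the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The following is a minimal sketch of that calculation, not something pyseer provides; the function names are hypothetical, and it assumes the p-values come from a test that is well approximated by a 1-df chi-square.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Expected median of a chi-square distribution with 1 degree of freedom (~0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_pvalue(p):
    """Convert a p-value into the equivalent 1-df chi-square statistic."""
    # Clamp so that p == 0 (numerical underflow) does not break inv_cdf.
    p = min(max(p, 1e-300), 1.0)
    # A two-sided normal quantile, squared, is a 1-df chi-square quantile.
    return _STD_NORMAL.inv_cdf(p / 2.0) ** 2


def genomic_inflation(pvalues):
    """Return lambda: median observed statistic over the expected null median."""
    stats = [chi2_from_pvalue(p) for p in pvalues]
    return median(stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Illustrative values only: evenly spaced p-values mimic a well-behaved null.
    null_like = [(i + 0.5) / 1001 for i in range(1001)]
    # Squaring pushes the p-values towards zero, mimicking inflation.
    inflated = [p ** 2 for p in null_like]
    print(f"null-like lambda: {genomic_inflation(null_like):.3f}")
    print(f"inflated lambda:  {genomic_inflation(inflated):.3f}")
```

A lambda close to 1 suggests population structure is reasonably controlled, while values well above 1 indicate inflation. To apply this to the runs above, the p-values would be read from each OUTPUT.txt; the column is assumed here to be named `lrt-pvalue`, so check the file header, and skip any rows without a p-value (for example those printed because of `--print-filtered`).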

ireneortega commented 1 year ago

OK, I will have a look at what you suggest. Thanks!

mgalardini / pyseer

Fixed or mixed effects model? #234