Closed ireneortega closed 1 year ago
Looks like neither model is working particularly well in this case. I've typically had better luck with controlling inflation of the test statistic with the LMM, but this is dataset dependent.
I think the advice I'd give here is to try to confirm the results with something else: biological relevance, adding more samples, or adding another dataset.
OK, I will look into what you suggested. Thanks!
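If it helps to put a number on the inflation rather than judging the QQ plots by eye, here is a minimal sketch that computes the genomic inflation factor (lambda), i.e. the median observed 1-df chi-square statistic divided by its expected median. It assumes the pyseer output is tab-separated and that the p-value column is named `lrt-pvalue`; check the header of your own file, since that name is an assumption here. The helper names `read_pvalues` and `genomic_inflation` are made up for illustration.

```python
import csv
from statistics import NormalDist, median


def read_pvalues(lines, column="lrt-pvalue"):
    """Collect numeric p-values from tab-separated lines (column name is an assumption)."""
    pvalues = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            p = float(row[column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric p-value
        if 0 < p <= 1:
            pvalues.append(p)
    return pvalues


def genomic_inflation(pvalues):
    """Median observed chi-square (1 df) divided by the expected median (about 0.455)."""
    norm = NormalDist()
    # A two-sided p-value p corresponds to a chi-square statistic z**2,
    # where z is the standard normal quantile at p / 2.
    chi2 = [norm.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = norm.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median
```

For example, `with open("OUTPUT.txt") as fh: print(genomic_inflation(read_pvalues(fh)))` would print lambda for one run. Values close to 1 suggest little inflation and values well above 1 suggest residual confounding, but this is only a rough summary and it doesn't replace looking at the QQ plot itself.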
I read that an LMM is the suitable type of model when working with genomes, since the gene or SNP effect is fixed while the population structure effect is random and represents a subpopulation underlying the global population (I am working with 136 genomes). I therefore applied the LMM to my data, but when I checked the QQ plot I found that the p-values are highly inflated, which I think means the results cannot be trusted. I created the kinship matrix both from the core genome phylogeny and from core genome SNPs, but the QQ plot is always very similar to this one:
similarity_pyseer --vcf core_split.vcf genomes.txt > kinship_matrix.txt
pyseer --lmm --phenotypes traits.csv --pres gene_presence_absence.Rtab --similarity kinship_matrix.txt --output-patterns output_patterns_pvalue.txt --print-filtered --print-samples > OUTPUT.txt
However, when I tried a fixed effects model, the QQ plot was much better, although I don't know if it is good enough:
mash sketch -s 10000 -o mash_sketch genomes/*.fasta
mash dist mash_sketch.msh mash_sketch.msh | square_mash > mash.tsv
scree_plot_pyseer mash.tsv
pyseer --phenotypes traits.csv --pres gene_presence_absence.Rtab --distances mash.tsv --save-m mash_mds --lineage --print-samples --max-dimensions 8 --cpu 16 > OUTPUT.txt
I am very confused about which model I should use, and I don't think I should choose it based on how good the results (or the QQ plot) look.
Could you please help me find the most suitable model, or offer any advice? Thanks!!