Closed martinastoycheva closed 2 years ago
This looks pretty typical for a bacterial GWAS to me, I can't see anywhere you're going wrong. Sometimes/often the population structure/phenotype stratification in the samples is impossible to completely 'remove'. It can be useful to draw the phenotype on the tree to have a look at this. But the standard significance threshold approach might not work too well with these results, and I would generally advise seeing this as a ranking of most likely associations.
It's best not to prune, as you will lose signal that way (without substantively changing the statistical test).
I would use exactly the same approach for genes and unitigs.
Thanks very much for the quick response! If the p values are a bit unrealible in my case, would you advice to try the whole genome lasso model instead and not use p values?
You can certainly try this as an alternative, though it is more focused on prediction, rather than finding causal variants.
Hello,
I am having problems with controlling for the population stuctre in my sample set and I was wodnering if I can get some help? I am not sure when the QQplots are good enough in order to be able to trust the p values.
I have ran pyseer with gene presence file and a whole genome vcf. I am using a very good quality hgt free phylogeny and I have used the following command:
I was wondering if I need to do any LD pruning prior to runing pyseer? I am asking because I noticed that if I do some LD pruning with bcftools (e.g r2=0.8, window size=1000) I can get the QQplots to lool very good:
as opposed to bad before pruning:
However, this approach removes most of the SNPs and makes the results very hard to interpret, and I am overall not very confident it is very good. Therefore, I was wondering if there is a standard way to overcome this issue?
I am also wondering how this could be extended to unitigs and accessory genes? As my gene presence absence QQplots don't look that great either:
Cheers, Martina