Open Samriddhi0906 opened 8 months ago
That comes a bit of a surprise, and this is not what we see in our unit tests, which return the same results every time. One thing I can think of is some stochasticity introduced when using multiple cores. Do you see the same variability when using a single core?
As an aside, a p-value threshold of 0.05 is likely too high, please refer to the docs for suggestions about setting such threshold.
Thanks for your response. I did run it three times with 1 CPU and I still get variable results. wicovariates_cpu1_1.tsv: 6268 wicovariates_cpu1_2.tsv: 6345 wicovariates_cpu1_3.tsv: 6357
As for the p-value threshold, this is just for filtering and comparison to see whether I am getting variable results between runs. For my analysis, I correct it for multiple testing before taking any further steps.
Hi, just checking in to see if you are still observing this behavior? I might be able to run some tests if so
Hi,
Yes, I just ran it again and got similar results. If I do not use covariates, the number of significant variants is consistent, but when I do, they vary between runs. Also, the number of significant variants found is higher when using covariates than without.
File wicovariates1.txt: 1487 significant variants File wicovariates2.txt: 1401 significant variants File wicovariates3.txt: 1401 significant variants File wicovariates4.txt: 1468 significant variants File wicovariates5.txt: 1495 significant variants File wocovariates1.txt: 1148 significant variants File wocovariates2.txt: 1148 significant variants File wocovariates3.txt: 1148 significant variants
My script:
When using covariates:
pyseer --phenotypes phenotype.tsv \ --pres gene_presence_absence.Rtab \ --similarity phylogeny_similarity.tsv \ --lmm \ --covariates covariates.tsv \ --use-covariates 2 \ --cpu 1 > $1
Without covariates:
pyseer --phenotypes phenotype.tsv \ --pres gene_presence_absence.Rtab \ --similarity phylogeny_similarity.tsv \ --lmm \ --cpu 8 > $1
Could you please share your inputs please? This way I'm sure to be able to reproduce this behavior
Sure!
covariates.tsv https://drive.google.com/file/d/1QWxjEMPi7miMkinCX0mo9X2gmpyCyWal/view?usp=drive_web gene_presence_absence.Rtab https://drive.google.com/file/d/1DPcipJXydIfYLHVv0VNesW0sh2Vpa6wk/view?usp=drive_web phenotype_health_1.tsv https://drive.google.com/file/d/1gqz4MUD1gXNUwN-Ds6ZDjdS_J_2-uf44/view?usp=drive_web phylogeny_similarity.tsv https://drive.google.com/file/d/10ysZUFq1e8pNR3ALB-8901q5mTGF2O3S/view?usp=drive_web
On Thu, 18 Jul 2024 at 14:47, Marco Galardini @.***> wrote:
Could you please share your inputs please? This way I'm sure to be able to reproduce this behavior
— Reply to this email directly, view it on GitHub https://github.com/mgalardini/pyseer/issues/264#issuecomment-2236410926, or unsubscribe https://github.com/notifications/unsubscribe-auth/A7AHEG7AU66S4SH5YZ3VDB3ZM62OHAVCNFSM6AAAAABLADYKG2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEMZWGQYTAOJSGY . You are receiving this because you authored the thread.Message ID: @.***>
After running Pyseer using
pyseer --phenotypes phenotypes.tsv --pres gene_presence_absence.Rtab --similarity phylogeny_similarity.tsv --lmm --covariates covariates.tsv --use-covariates 2 --cpu 8 > $1
and then filtering for significant genes using lrt-pvalue < 0.05 the number of significant genes varies between pyseer runs even though none of the input files have any changes.
In total 7 runs with covariates were run. Within these the lowest number of significant genes is 1245, the highest is 1395. Also, each run has a different number of significant genes.
The expectation would be that each run has the same number of significant genes. When filtering for filter-pvalue <0.05 the number of significant genes is constant.
Additionally, the number of significant genes after using covariates is about twice the number of significant genes without covariates (based on lrt-pvalue, however, they are the same when filtering using filter-pvalue).
Could you help me understand whether this behaviour is expected when running pyseer? Thanks in advance.