Open Samriddhi0906 opened 4 months ago
That comes as a bit of a surprise, and it is not what we see in our unit tests, which return the same results every time. One possible cause is stochasticity introduced when using multiple cores. Do you see the same variability when using a single core?
As an aside, a p-value threshold of 0.05 is likely too high. Please refer to the docs for suggestions on setting this threshold.
Thanks for your response. I ran it three times with 1 CPU and I still get variable results:
wicovariates_cpu1_1.tsv: 6268
wicovariates_cpu1_2.tsv: 6345
wicovariates_cpu1_3.tsv: 6357
As for the p-value threshold, this is just for filtering and comparison to see whether I am getting variable results between runs. For my analysis, I correct it for multiple testing before taking any further steps.
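The thread does not say which multiple-testing correction is applied, so the following is only a minimal sketch of one common option, a Bonferroni-style threshold. The function names and the choice of `n_tests` (here simply the number of p-values passed in) are assumptions for illustration, not part of pyseer.

```python
def bonferroni_threshold(alpha, n_tests):
    """Return the per-test threshold alpha / n_tests (Bonferroni correction)."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def passes_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the Bonferroni-corrected threshold.

    Assumption: the number of tests equals the number of p-values supplied.
    """
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]
```

With a corrected threshold, small run-to-run differences near 0.05 matter less, but they would still be worth tracking down.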
After running Pyseer using
pyseer --phenotypes phenotypes.tsv --pres gene_presence_absence.Rtab --similarity phylogeny_similarity.tsv --lmm --covariates covariates.tsv --use-covariates 2 --cpu 8 > $1
and then filtering for significant genes using lrt-pvalue < 0.05, the number of significant genes varies between pyseer runs, even though none of the input files have changed.
In total, 7 runs with covariates were performed. Across these, the lowest number of significant genes is 1245 and the highest is 1395, and every run gives a different number.
The expectation would be that each run has the same number of significant genes. When filtering for filter-pvalue < 0.05, the number of significant genes is constant.
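To make the comparison between runs concrete, here is a rough sketch of how the per-run counts could be gathered, together with the variants that are significant in some runs but not others. It assumes each output file is a tab-separated table with a header row containing `variant`, `filter-pvalue` and `lrt-pvalue` columns. The helper names `significant_variants` and `compare_runs` are made up for this example.

```python
import csv


def significant_variants(path, column="lrt-pvalue", threshold=0.05):
    """Return the set of variants whose p-value in `column` is below `threshold`."""
    hits = set()
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            try:
                p_value = float(row.get(column, ""))
            except (TypeError, ValueError):
                # Skip rows where the p-value is missing or not numeric.
                continue
            if p_value < threshold:
                hits.add(row["variant"])
    return hits


def compare_runs(paths, column="lrt-pvalue", threshold=0.05):
    """Compare significant variants across several output files.

    Returns (per_run, shared, unstable):
      per_run  - dict mapping each path to its set of significant variants
      shared   - variants significant in every run
      unstable - variants significant in at least one run but not in all
    """
    per_run = {p: significant_variants(p, column, threshold) for p in paths}
    if not per_run:
        return per_run, set(), set()
    shared = set.intersection(*per_run.values())
    unstable = set.union(*per_run.values()) - shared
    return per_run, shared, unstable
```

Calling `compare_runs` on the output files with `column="lrt-pvalue"` and again with `column="filter-pvalue"` would show whether only the LRT p-values move between runs, and the `unstable` set would show which genes sit close to the threshold.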
Additionally, the number of significant genes with covariates is about twice the number without covariates (based on lrt-pvalue; the counts are the same when filtering on filter-pvalue).
Could you help me understand whether this behaviour is expected when running pyseer? Thanks in advance.