simonhmartin / genomics_general

General tools for genomic analyses.

p_value in ABBABABAwindows.py #73

Status: Open. XiaXiaTianTian opened this issue 2 years ago.

XiaXiaTianTian commented 2 years ago

Hi, Simon.

Since ABBABABAwindows.py does not calculate p-values for its statistics, are there any parameters in genomics_general that can do this? Or could you give us some suggestions on how to perform FDR correction on the calculated D, fd and fdM values?

Thanks a lot.

simonhmartin commented 2 years ago

Hi,

There is no way to compute a p-value with this script, and in my opinion there is no legitimate way to compute a non-parametric p-value for any of these statistics in sliding windows. The reason is that we always expect some heterogeneity along the genome, but there is no way to know the expected distribution of values of, for example, fd without knowing the underlying demographic history.

If you want to identify significant outliers and minimize false discoveries, my recommendation would be to simulate data under a realistic scenario with neutral introgression (for example using msprime, which can export VCF files). For this you would first need to estimate the evolutionary scenario, which you could perhaps infer using a tool like moments. The simulated data would then allow a parametric test for introgression outliers beyond what is expected under neutrality.

Best wishes,
Simon
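A minimal sketch of the last step of the simulation-based test suggested above, assuming you already have fd values for your observed windows and for windows computed the same way (same window size and filters, for example by running ABBABABAwindows.py) on data simulated under a fitted neutral scenario. The function names `empirical_p_values` and `benjamini_hochberg` are hypothetical helpers, not part of genomics_general. Each observed window gets a one-sided empirical p-value against the simulated null, and, to address the original FDR question, those p-values are then adjusted with the Benjamini-Hochberg procedure (a swapped-in standard technique, not something the thread prescribes).

```python
from bisect import bisect_left
from typing import List, Sequence


def empirical_p_values(observed: Sequence[float], null: Sequence[float]) -> List[float]:
    """One-sided empirical p-value for each observed window statistic (e.g. fd).

    Assumes `null` holds the same statistic computed on windows from data
    simulated under a neutral scenario. Uses the add-one estimator
    p = (1 + #{null >= obs}) / (1 + len(null)), which never returns 0.
    """
    if len(null) == 0:
        raise ValueError("need at least one simulated (null) window")
    null_sorted = sorted(null)
    n = len(null_sorted)
    pvals = []
    for obs in observed:
        # bisect_left gives the index of the first null value >= obs,
        # so every value from there to the end is at least as extreme.
        n_extreme = n - bisect_left(null_sorted, obs)
        pvals.append((1 + n_extreme) / (1 + n))
    return pvals


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # caps values at 1 and keeps q-values monotone in p
    # Walk from the largest p-value down, keeping the running minimum of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy numbers for illustration only, not real fd values.
    simulated_fd = [i / 100 for i in range(100)]
    observed_fd = [0.055, 0.505, 0.995]
    p = empirical_p_values(observed_fd, simulated_fd)
    q = benjamini_hochberg(p)
    for fd, pv, qv in zip(observed_fd, p, q):
        print(f"fd={fd:.3f}  p={pv:.4f}  q={qv:.4f}")
```

Two caveats on this design: the smallest attainable p-value is 1/(1 + number of simulated windows), so you need many more simulated windows than you test; and Benjamini-Hochberg assumes independent or positively dependent tests, which neighbouring windows in linkage may satisfy only approximately, so the resulting q-values are best treated as a guide rather than an exact FDR.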

XiaXiaTianTian commented 2 years ago

So nice of you, Simon. Thanks for your prompt reply and constructive suggestions. Best regards.