millanek / Dsuite

Fast calculation of Patterson's D (ABBA-BABA) and the f4-ratio statistics across many populations/species
162 stars 25 forks source link

no p-values calculated for any trios after recent update #55

Open jmgorospe opened 2 years ago

jmgorospe commented 2 years ago

Hello,

When using the latest code (the one available on 20.06.2022) for running Dsuite on my dataset it seems that no p-values can be calculated for any of the trios and the analysis can not continue. The code was working at the beginning of this year for the same dataset and I wonder if some of the latest changes in the code are giving this issue, perhaps that of "- Very small p-values now don't get rounded to 0 but are bounded at 2.3e-16".

Here is a copy of the output I get:

There are 51 sets (excluding the Outgroup)
Going to calculate D and f4-ratio values for 20825 trios
Done permutations
The VCF contains 1619 variants
Going to use block size of 79 variants to get 20 Jackknife blocks
Done processing VCF. Preparing output files...

...
p-value could not be calculated for 20825 trios
it looks like there aren't enough ABBA-BABA informative variants for these trios
If this was a run for a subset of the genome (e.g. one chromosome), you may still get p-values for these trios from DtriosCombine

Dsuite/utils/dtools.py:3542: RuntimeWarning: invalid value encountered in double_scalars
  colors = np.concatenate([[[1, 1, 1, 1]], plt.cm.Reds(np.linspace(0., 1 * fmax / max_color_cutoff, 256))])

Using a previous version of the code (previous to to May 2022) worked again so I can run my analysis, but I would like to know if by using the previous code I would be missing some useful new updates. What are exactly the new changes implemented (those from May 7th to May 15) for? Also, would you have any other recommendation on how to solve the issue an be able to use the code with the latest updates? Thank you

Best wishes, Juan Manuel

millanek commented 2 years ago

Hi Juan Manuel

The updates in May were mainly do do with making the code simpler.

There were only two changes in functionality: 1) adding the --KS-test-for-homoplasy option 2) bounding the p-values at 2.3e-16, instead of outputting 0, which was confusing some people.

I tested the code and all seems to work just fine. And no-one else is reporting this error.

But I see you are using the dtools.py script. May the error stems from that? I would be interested in trying to figure it out, if you are willing to share with me the dataset and the command line you exectute to re-produce the error.

Just send me an email

Thanks Milan