Closed jdblischak closed 1 year ago
@jdblischak is there any update on this? Can we close the issue? Or I'm happy to discuss further if there are any updates. (For future reference: it looks like the annotation coefficients are not predictive of chi-square statistics in this data for some reason...)
Unfortunately I got sidetracked by other more pressing priorities. We can close this here, since I am apparently the only one with this issue. If I get time in the future, I'll follow up on our email thread. Thanks as always for all your assistance and advice!
Overview
We have processed the GWAS summary statistics for many dozens of traits using a pipeline that performs functionally-informed fine-mapping with PolyFun. Overall this has been very successful (thanks for creating and maintaining such great open source code!). However, a few traits have failed, and when this happens, it is typically caused by a true_divide error in Step 4, which re-estimates per-SNP heritabilities via S-LDSC (--compute-h2-bins).
I haven't been able to figure out the exact source of the issue. When I use pdb.set_trace() to interactively debug, I can confirm that the data at that point is problematic (lots of zeros!). But from following the traceback, I haven't been able to figure out the upstream cause of the problem. I've also searched the input files for potentially problematic entries (NA, NaN, Inf, -Inf), but I haven't found any. Making it worse is that the true_divide error gets triggered in different parts of the code base, so it may not be one specific problem but multiple related problems.
Some more context on our pipeline: we use PolyFun approach 3 to compute prior causal probabilities non-parametrically. We use the baseline UKBB annotations you provide plus some custom annotations. Furthermore, we use DENTIST to remove any SNPs with large LD mismatches between the summary statistics and the UKBB LD reference panel (as we discussed in #115). Based on some experiments I have performed, it appears that the SNPs removed by DENTIST are potentially causing the problem (more details below in the section with the reproducible example).
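On the debugging point above: because the warning surfaces in several places, it can help to make NumPy raise at the first bad division instead of warning and continuing. The following is a minimal sketch of that technique on toy data, not PolyFun code. The function name and arrays are hypothetical, and it assumes the true_divide message originates from an ordinary NumPy division.

```python
import numpy as np


def divide_strict(numer, denom):
    """Divide two arrays, raising instead of warning on x/0 or 0/0."""
    # By default NumPy only emits a RuntimeWarning for these cases and
    # carries on with Inf/NaN values. Inside this context manager the same
    # condition raises FloatingPointError, so a debugger or traceback
    # points at the first offending operation.
    with np.errstate(divide="raise", invalid="raise"):
        return np.asarray(numer, dtype=float) / np.asarray(denom, dtype=float)


# Toy data standing in for a quantity that contains zeros (hypothetical).
numer = np.array([1.0, 0.0, 2.0])
denom = np.array([2.0, 0.0, 4.0])

try:
    divide_strict(numer, denom)
    raised = False
except FloatingPointError as err:
    raised = True
    print(f"caught: {err}")
```

To apply the same behaviour to a whole run without editing the library, one option is calling np.seterr(divide="raise", invalid="raise") at the top of the driver script, or starting Python with -W error::RuntimeWarning. Either may also stop on warnings that are harmless, so it is best treated as a temporary debugging aid.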
Tracebacks
I can't share all of the data, so here are the two tracebacks from runs that failed with true_divide errors at Step 4. I've seen each of these tracebacks twice (i.e. two different traits have failed with each of the two errors below):
Reproducible example
As I hinted at above, I think that the SNPs removed by DENTIST may be causing the problem. For one of our traits, DENTIST only removed a small percentage of SNPs, but this was sufficient to trigger the true_divide error. Unfortunately I am unable to share that data set with you.
However, I was able to put together a reproducible example using the UC data from De Lange et al. 2017. For simplicity, I only use the baseline annotations. The PolyFun steps complete successfully for the full summary statistics, but fail with a true_divide error (the first one listed in the section above) for the DENTIST-filtered summary statistics.
I'm going to email you the two summary statistics files, a script to run the PolyFun steps, and a conda lock file so that you can recreate the exact same conda environment that I used. Any advice you can provide would be greatly appreciated!
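For anyone comparing a full and a DENTIST-filtered summary statistics file on their own data, a small diagnostic along these lines may help: it counts the removed SNPs per chromosome and re-checks a value column for NA, NaN, Inf and -Inf. This is only a sketch. The column names SNP, CHR and Z and both function names are assumptions, not taken from the files discussed here, and the toy rows are made up.

```python
import math
from collections import Counter


def removed_snp_summary(full_rows, filtered_rows, id_col="SNP", chr_col="CHR"):
    """Count SNPs present in the full data but absent after filtering, per chromosome."""
    kept = {row[id_col] for row in filtered_rows}
    removed = Counter(row[chr_col] for row in full_rows if row[id_col] not in kept)
    return dict(removed)


def non_finite_rows(rows, value_col="Z"):
    """Return rows whose value column is missing, unparseable, NaN or infinite."""
    bad = []
    for row in rows:
        try:
            ok = math.isfinite(float(row[value_col]))
        except (KeyError, TypeError, ValueError):
            ok = False  # e.g. the literal string "NA" or a missing column
        if not ok:
            bad.append(row)
    return bad


# Toy rows (hypothetical); values are strings, as a delimited text reader yields them.
full = [
    {"SNP": "rs1", "CHR": "1", "Z": "1.2"},
    {"SNP": "rs2", "CHR": "1", "Z": "-0.4"},
    {"SNP": "rs3", "CHR": "2", "Z": "NaN"},
    {"SNP": "rs4", "CHR": "2", "Z": "3.1"},
]
filtered = [row for row in full if row["SNP"] in {"rs1", "rs4"}]

print(removed_snp_summary(full, filtered))
print([row["SNP"] for row in non_finite_rows(full)])
```

For a tab-delimited file, the rows can come from csv.DictReader(handle, delimiter="\t"). Plain dictionaries are used here so the sketch does not depend on any particular file format.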