Closed Al-Murphy closed 4 months ago
Apologies, I believe I found the answers to my two follow-up questions: functional annotations for ~19 million UK Biobank imputed SNPs with MAF>0.1%, based on the baseline-LF 2.2.UKB annotations, and UK Biobank LD matrices. I would still appreciate advice on the first question. Thanks!
@Al-Murphy for the purpose of improving fine-mapping accuracy, it's advisable to also use all 187 functional annotations on top of your 13 additional annotations (as this will lead to more informative prior causal effects that take more sources of information into consideration)
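As a rough illustration of combining the two annotation sets, here is a minimal pandas sketch. It assumes both tables share the SNP identifier columns `CHR`, `SNP`, `BP`, `A1`, `A2` (an assumption about the file layout, so check it against your own files), and every annotation column name below is made up for the example. It is not taken from the PolyFun code base.

```python
import pandas as pd

# Assumed identifier columns shared by both tables (hypothetical layout).
KEYS = ["CHR", "SNP", "BP", "A1", "A2"]

# Toy stand-in for the baseline annotations (column names are illustrative).
baseline = pd.DataFrame({
    "CHR": [1, 1, 1],
    "SNP": ["rs1", "rs2", "rs3"],
    "BP": [100, 200, 300],
    "A1": ["A", "C", "G"],
    "A2": ["G", "T", "A"],
    "baseline_annot_1": [1.0, 0.0, 1.0],
    "baseline_annot_2": [0.2, 0.7, 0.1],
})

# Toy stand-in for custom cell-type annotations; rs3 is deliberately missing.
custom = pd.DataFrame({
    "CHR": [1, 1],
    "SNP": ["rs1", "rs2"],
    "BP": [100, 200],
    "A1": ["A", "C"],
    "A2": ["G", "T"],
    "celltype_annot_1": [1.0, 0.0],
})

# Keep every baseline SNP; validate guards against duplicated SNP rows.
combined = baseline.merge(custom, on=KEYS, how="left", validate="one_to_one")

# SNPs absent from the custom table are treated as unannotated (value 0).
custom_cols = [c for c in custom.columns if c not in KEYS]
combined[custom_cols] = combined[custom_cols].fillna(0.0)

print(combined)
```

The left join keeps the baseline SNP set as the reference, so SNPs that lack a custom annotation are retained rather than dropped. Filling those with 0 is a choice made for this sketch, not something stated in the thread.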
That makes a lot of sense, thank you!
@omerwe apologies, but one further question: some of my custom annotations are Z-scores and so have both positive and negative values. I see that in the 187 annotations, the continuous values are in the range 0-1. Would it be fine to apply min-max scaling to my annotations before adding them? Or would you separate each into two annotations, one for positive and one for negative values, since with min-max scaling 0 (the lack of an annotation) would end up around 0.5?
@Al-Murphy Z-scores are perfectly fine. You might want to normalize them to have variance 1.0, which could improve numerical stability and the behavior of the Ridge regression model.
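The normalization suggested above could be sketched as follows. This is only an illustration: the helper `scale_to_unit_variance` and the column name `celltype_z` are hypothetical, and the choice to divide by the standard deviation without centering (so that a value of 0 stays at 0 and signs are preserved) is an assumption of this sketch, not something prescribed in the thread.

```python
import numpy as np
import pandas as pd


def scale_to_unit_variance(df, columns):
    """Return a copy of df with each listed column divided by its standard deviation.

    The columns are not centered, so 0 remains 0 and signs are unchanged.
    """
    out = df.copy()
    for col in columns:
        sd = out[col].std(ddof=0)  # population standard deviation
        if np.isnan(sd) or sd == 0:
            raise ValueError(f"column {col!r} has zero or undefined variance")
        out[col] = out[col] / sd
    return out


# Toy Z-score annotation with a variance well above 1 (simulated data).
rng = np.random.default_rng(0)
annot = pd.DataFrame({
    "SNP": [f"rs{i}" for i in range(1000)],
    "celltype_z": rng.normal(loc=0.0, scale=2.5, size=1000),
})

scaled = scale_to_unit_variance(annot, ["celltype_z"])
print(scaled["celltype_z"].var(ddof=0))  # close to 1.0
```

If centered values are preferred, subtracting the column mean before dividing gives a standard Z-score transform instead, at the cost of moving the unannotated value away from 0.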
Thanks @omerwe!
I'm hoping to generate new per-SNP heritabilities (prior distribution of the SNP effect sizes) based on some custom annotations to then be used to fine-map SNPs for a complex trait relating to a specific cell type.
I know to use polyfun.py to create these along with the annotations.
My question is whether I should use all 187 annotations from the baseline-LF 2.2.UKB model (the broad set of coding, conserved, regulatory and LD-related annotations used for functional enrichment in the paper) along with my custom annotations relating to the cell type of interest (13 in total). Or is it more advisable to use just the custom annotations?
Secondly, two follow-up questions on this:
1) Where can I get the 187 annotations used for the publication? I see example_data/annotations.CHR.annot.parquet has a subset of them.
2) And very much related: where can I download the LD-score weights for the UK Biobank cohort analysed in the paper? Again, a subset seems to be here: ./example_data/weights.CHR.l2.ldscore.parquet

Apologies if these are very trivial questions. Thanks!