CS with individual level data

CharliHarlow commented 3 years ago

Hi, I have been trying to run susie with individual-level data from UKB. I convert the genotypes into dosages and then create a genotype matrix. I also perform the covariate adjustment as suggested before to create the final input files for susie. Each time susie is run, however, I get very large credible sets containing over 11,000 variants. I have run susie with the default settings and have also adjusted some of the options to see whether any of these would give more meaningful credible sets. Could you offer any advice or help with this? Here is an example of the credible sets I am getting:

Command to run:

    fitted_adjusted <- susie(raw2, pheno_susie[,1], L = 10, estimate_residual_variance = TRUE, estimate_prior_variance = FALSE, scaled_prior_variance = 0.1, min_abs_corr = 0.0, verbose = TRUE)

Credible set output:

    $purity
       min.abs.corr mean.abs.corr median.abs.corr
    L9 1.939965e-05    0.05201907      0.01831098
    L7 1.774827e-05    0.05630721      0.01949726
    L6 1.054947e-05    0.05143106      0.01714920
    L3 1.034395e-05    0.05039611      0.01706981
    L5 8.795920e-06    0.06292139      0.02208106
    L2 6.810453e-06    0.05276931      0.01733549
    L8 3.465249e-06    0.05655636      0.01968297
    L1 9.722116e-07    0.04467518      0.01365854

    $cs_index
    [1] 9 7 6 3 5 2 8 1

    $coverage
    [1] 0.9500093 0.9500169 0.9500354 0.9500451 0.9500114 0.9500218 0.9500402 0.9500175

    $requested_coverage
    [1] 0.95
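
For readers unfamiliar with the $purity columns above: they summarise the absolute pairwise correlations among the variants in each credible set, so values near zero mean the set contains essentially unrelated variants. The following is a minimal base-R sketch of that idea on simulated data. The helper name cs_purity and the toy matrix are hypothetical, and this is not susieR's own implementation.

```r
# Purity of a credible set: summaries of the absolute pairwise correlations
# among the variants (columns of X) indexed by idx.
# cs_purity is a hypothetical helper, not a susieR function.
cs_purity <- function(X, idx) {
  if (length(idx) < 2) {
    return(c(min.abs.corr = 1, mean.abs.corr = 1, median.abs.corr = 1))
  }
  R <- abs(cor(X[, idx, drop = FALSE]))
  r <- R[upper.tri(R)]  # each pair counted once, diagonal excluded
  c(min.abs.corr = min(r), mean.abs.corr = mean(r), median.abs.corr = median(r))
}

set.seed(1)
n <- 500
shared <- rnorm(n)
# Toy data: columns 1-3 are near-duplicates, columns 4-6 are independent noise.
X <- cbind(shared + rnorm(n, sd = 0.1),
           shared + rnorm(n, sd = 0.1),
           shared + rnorm(n, sd = 0.1),
           matrix(rnorm(n * 3), n, 3))

tight <- cs_purity(X, 1:3)  # a set of tightly correlated variants
loose <- cs_purity(X, 1:6)  # a set mixing in unrelated variants
print(rbind(tight, loose))
```

In the original command, min_abs_corr = 0.0 switches off the purity filter, which is why sets with purity this low are reported at all.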

gaow commented 3 years ago

@CharliStoneman To check on some basics: is there a reason you set the scaled prior variance to 0.1? In this context it can be interpreted as the proportion of variance explained per SNP, and 0.1 is very high for a GWAS. What if you let SuSiE estimate it rather than specifying it?
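
As a rough illustration of that suggestion, here is a sketch on simulated data (not the poster's pipeline). X, y and the effect size are made up, and the susie call is skipped if susieR is not installed.

```r
# Minimal sketch on simulated data; X, y and the effect size are hypothetical.
set.seed(1)
n <- 500
p <- 100
X <- matrix(rnorm(n * p), n, p)
y <- drop(0.4 * X[, 10] + rnorm(n))

fit <- NULL
if (requireNamespace("susieR", quietly = TRUE)) {
  # scaled_prior_variance is not fixed here; with estimate_prior_variance = TRUE
  # the prior variance of each effect is estimated from the data.
  # min_abs_corr is left at its default, so low-purity sets are filtered out.
  fit <- susieR::susie(X, y, L = 10,
                       estimate_residual_variance = TRUE,
                       estimate_prior_variance = TRUE)
  print(fit$V)        # estimated prior variance for each of the L effects
  print(fit$sets$cs)  # credible sets that pass the purity filter
}
```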

CharliHarlow commented 3 years ago

Hi @gaow, no, there was no particular reason I set it to that; I was just using the settings applied in the vignette (https://stephenslab.github.io/susieR/). I can try rerunning with no scaled prior variance set.

stephens999 commented 3 years ago

@gaow maybe we should remove the fixed prior variance from the vignette since we would encourage estimating it.

pcarbo commented 3 years ago

Note that if the number of individuals (rows) in your data matrix is much larger than the number of SNPs, it might be faster to use susie_suff_stat.
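
A minimal sketch of what that could look like, assuming centered genotypes and phenotype. The simulated dosages are hypothetical, and the susie_suff_stat call uses named arguments and is skipped if susieR is not installed.

```r
# Toy data (hypothetical): many more individuals (rows) than SNPs (columns).
set.seed(2)
n <- 2000
p <- 50
X <- matrix(rbinom(n * p, 2, 0.3), n, p)
storage.mode(X) <- "double"
y <- drop(0.3 * X[, 5] + rnorm(n))

# Center the columns of X and center y, so no intercept term is needed.
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# Sufficient statistics: their size depends on p, not on n.
XtX <- crossprod(Xc)            # p x p matrix
Xty <- drop(crossprod(Xc, yc))  # vector of length p
yty <- sum(yc^2)                # scalar

fit_ss <- NULL
if (requireNamespace("susieR", quietly = TRUE)) {
  fit_ss <- susieR::susie_suff_stat(XtX = XtX, Xty = Xty, yty = yty,
                                    n = n, L = 10)
}
```

Once XtX, Xty and yty are computed, the n x p matrix itself is no longer needed for fitting, which is where the potential speed-up comes from.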

CharliHarlow commented 3 years ago

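I have re-run without the scaled_prior_variance option, with estimate_prior_variance = TRUE, and get a resulting plot like the following (image: https://user-images.githubusercontent.com/38356770/124005157-aa636e00-d9d0-11eb-8863-33992dc0cfd9.png). We also now get credsets = NULL.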

stephens999 commented 3 years ago

So it looks like there are no mappable signals in the region. What are the single-SNP z-scores? Are any significant?
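
One way to check this, sketched in base R on simulated data: fit a simple regression of the phenotype on each SNP in turn and take the t-statistic as the z-score. The helper name single_snp_z and the toy data are hypothetical.

```r
# Single-SNP association z-scores: for each column of X, regress y on that
# column (with an intercept) and return the slope's t-statistic.
# single_snp_z is a hypothetical helper, not a susieR function.
single_snp_z <- function(X, y) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  yc <- y - mean(y)
  n <- nrow(Xc)
  sxx <- colSums(Xc^2)
  bhat <- drop(crossprod(Xc, yc)) / sxx  # per-SNP slope estimates
  rss <- sum(yc^2) - bhat^2 * sxx        # per-SNP residual sum of squares
  se <- sqrt(rss / (n - 2) / sxx)        # per-SNP standard errors
  bhat / se
}

set.seed(3)
n <- 400
p <- 30
X <- matrix(rnorm(n * p), n, p)
y <- drop(0.3 * X[, 3] + rnorm(n))

z <- single_snp_z(X, y)
pval <- 2 * pt(-abs(z), df = n - 2)
which(pval < 5e-8)  # SNPs passing the conventional genome-wide threshold
```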

gaow commented 3 years ago

@stephens999 yes, I just updated the vignette. We did discuss the choice of priors in the vignette, but reading it again, the ordering of the narrative was not very good. It is now improved, along with the R code updates.

CharliHarlow commented 3 years ago

Yes, several are significant. When running on the summary stats for the same trait, we get the following PIP plot:

gaow commented 3 years ago

> When running for the summ stats for the same trait we get the following PIP plot

@CharliStoneman are you referring to running susie_rss? What's your input for R -- how did you obtain that? The LD structure of this region looks complicated (the SNPs in purple are spread out yet very highly correlated)

CharliHarlow commented 3 years ago

We used susie_suff_stat for fine-mapping with the summary stats: the betas and SEs came from the summary stats, from which we also calculated the z-scores, and we supplied an LD matrix for the region we were looking at. For the individual-level data, we generated a genotype matrix for SNPs within the region of interest, along with the phenotype values and covariates. We regressed the covariates out of both the genotypes and the phenotype to generate the input for susie.
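
The covariate adjustment described here can be sketched in base R as follows. The data and covariate names are simulated and hypothetical, and the actual pipeline in this thread may differ.

```r
# Toy data (hypothetical): two covariates, 20 SNPs, one causal SNP.
set.seed(4)
n <- 300
p <- 20
Z <- cbind(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
X <- matrix(rbinom(n * p, 2, 0.25), n, p)
storage.mode(X) <- "double"
y <- drop(0.02 * Z[, "age"] + 0.5 * X[, 7] + rnorm(n))

# Project out the intercept and covariates from both y and every column of X.
qrZ <- qr(cbind(intercept = 1, Z))
y_adj <- qr.resid(qrZ, y)
X_adj <- qr.resid(qrZ, X)

# X_adj and y_adj would then be the inputs to susie.
```

Residualising both sides on the same covariates keeps the genotype and phenotype inputs consistent with each other; adjusting only one of them would not be equivalent.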

stephens999 commented 3 years ago

@CharliStoneman I think you will need to share code and data to get further feedback. The results from full data and summary data usually agree closely when the LD matrix is computed from the in-sample genotypes, so something is probably wrong with the pipeline if you get different results.
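
To make the "in-sample" point concrete, here is a small base-R sketch on simulated data. The toy genotype-like matrices and the second "reference" sample are hypothetical. For standardized columns, X'X is exactly (n - 1) times the in-sample correlation matrix, whereas an LD matrix from a different sample only approximates it.

```r
set.seed(5)
n <- 1000
p <- 10

# Hypothetical helper that simulates correlated columns sharing one factor.
sim_X <- function(n, p) {
  shared <- rnorm(n)
  sapply(seq_len(p), function(j) 0.7 * shared + rnorm(n))
}

X <- sim_X(n, p)
R_in <- cor(X)                 # in-sample LD (correlation) matrix

Xs <- scale(X)                 # center and scale columns to unit variance
XtX_std <- crossprod(Xs)
max_dev <- max(abs(XtX_std - (n - 1) * R_in))  # zero up to rounding error

# LD from a separate sample of the same population (a stand-in for an
# external reference panel) does not match the in-sample LD exactly.
X_ref <- sim_X(500, p)
R_ref <- cor(X_ref)
ld_gap <- max(abs(R_in - R_ref))
```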

gaow commented 2 years ago

Closing this ticket due to a lack of follow-up. Discussions related to complications with using an external LD reference are in #135 and #122.

stephenslab / susieR

CS with individual level data #131
