Closed MarioGuCBMR closed 2 years ago
Here is some additional information on how I obtained each locus:
1) I obtained lead independent variants for my main trait by triangulating them from three other traits (just some context, but this is not essential).
2) I took LDetect LD regions and, for each lead independent variant, detected which LD region it belongs to.
3) I obtained as many variants as possible in that LD region for the three traits I used to triangulate my lead independent variants.
4) I calculated the LD matrices for all SNPs available in each LD region using plink1.9 and the 1000 Genomes, Phase 3, version 5 reference panel.

Originally, I used a very low MAF filter to include as many variants as possible (MAF > 5e-50 included), but I ended up including only variants with MAF > 0.01. Nonetheless, the same issue as above happens.
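For reference, step 4 can be done with a plink1.9 call along these lines (a sketch only: the file names `1000G_phase3_chr1` and `region_snps.txt` are placeholders for your reference panel and per-region SNP list):

```shell
# Hypothetical sketch of step 4: pairwise LD (r) for all SNPs in one LD region,
# computed from a 1000 Genomes Phase 3 reference panel with plink1.9.
plink --bfile 1000G_phase3_chr1 \
      --extract region_snps.txt \
      --maf 0.01 \
      --r square \
      --out region_ld
# region_ld.ld is a plain-text square matrix; in R it can be loaded with
# as.matrix(read.table("region_ld.ld"))
```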
Importantly, this has not happened to me in any other co-localization analysis I have done, since I have never found so many variants shared by three GWAS summary statistics. However, the ones I am using here are huge meta-analyses with great overlap of variants between them.
Finally, here are some additional follow-up questions that may help us avoid this issue (assuming it really is caused by the large number of variants):
1) Remove variants with MAF < 5% (after reading the coloc tutorial, which says we should include as many variants as possible, I dislike this idea).
2) Remove variants with P > 0.90 in all traits, or another threshold above 0.50, to drop those variants that are not going to contribute at all to the co-localization.
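Option 2 could be sketched in R roughly as follows (the column names `P_trait1`/`P_trait2`/`P_trait3` are placeholders for however the per-trait p-values are stored):

```r
# Hypothetical sketch: drop variants whose p-value exceeds the threshold
# in ALL traits, then subset every hyprcoloc input consistently.
p_thresh <- 0.90
keep <- !(ss$P_trait1 > p_thresh &
          ss$P_trait2 > p_thresh &
          ss$P_trait3 > p_thresh)

betas     <- betas[keep, , drop = FALSE]
ses       <- ses[keep, , drop = FALSE]
ld_matrix <- ld_matrix[keep, keep]   # subset rows AND columns of the LD matrix
snp_ids   <- ss$ID[keep]
```

Keeping the betas, SEs, SNP IDs, and LD matrix subset with the same logical vector is the important part; a mismatch in dimensions between them is a common source of errors.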
Let me know your thoughts!!
Hi,
Apologies for the delayed response.
The approach incorporating the LD matrices for trait correlation in overlapping samples is a computationally complex analysis, as you have seen. We recommend using the standard approach even when your traits are correlated in overlapping samples (if your traits are from independent datasets then the standard is the correct approach), as this produced favourable results in simulations even with high trait correlation.
Option 2 above would be a fairly good sensitivity analysis though.
Best wishes,
James
Hi, how is the trait correlation (trait.cor) estimated here?
Hi,
We recommend using the standard approach even when your traits are correlated in overlapping samples. However, if you would like to run the model correcting for trait correlation in overlapping samples, trait.cor can be estimated using LD score regression or simply correlating the Z scores between the two GWAS datasets.
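The Z-score approach can be sketched in R as follows (a sketch only: `gwas1`/`gwas2` and the `BETA`/`SE`/`ID` columns are placeholder names; note the Z-score correlation approximates the overlap-induced correlation best when most shared variants are null):

```r
# Hypothetical sketch: estimate the pairwise trait correlation by
# correlating Z scores across the variants shared by the two GWAS.
shared <- intersect(gwas1$ID, gwas2$ID)
z1 <- with(gwas1[match(shared, gwas1$ID), ], BETA / SE)
z2 <- with(gwas2[match(shared, gwas2$ID), ], BETA / SE)
trait.cor <- cor(z1, z2)
```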
Best wishes,
James
Hi, I have been doing some tests with hyprcoloc to co-localize three traits at the same time. Since the traits are correlated with each other, I am using LD and correlation matrices, running hyprcoloc for each locus like this:
```r
hyprcoloc::hyprcoloc(betas, ses,
                     trait.names = trait.names,
                     snp.id = ss$ID,
                     trait.cor = tetrachoric_matrix,
                     ld.matrix = ld_matrix,
                     sample.overlap = sample.overlap,
                     uniform.priors = FALSE)
```
This seems to work fine, except when the number of SNPs is really high (>8000 in my experience). It then fails with this error:
```
Error in align1(Z, W, 1, trt.clc, trt.no.clc, trait.cor, ld.matrix, epsilon) :
  Not compatible with requested type: [type=list; target=double].
Calls: looping_hyprcoloc -> -> align.ABF.1 -> align1
```
I was surprised to see this since, for all my loci, all formats are the same. Why is the list vs. double issue popping up? I tested removing half of the SNPs from the LD matrices (and their respective betas and ses) and hyprcoloc ran just fine, so my hypothesis is that when the number of SNPs is really high in a locus, align1 fails for some reason.
Have you experienced this? What can I do to solve this issue?
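One thing worth checking: in R, a data.frame is internally a list, so a "type=list; target=double" error from compiled code usually means an input that should be a plain numeric matrix arrived as a data.frame (or acquired non-numeric columns). A few hypothetical sanity checks before the hyprcoloc call:

```r
# Hypothetical pre-flight checks on the LD matrix passed to hyprcoloc.
# A data.frame (a list in R) instead of a numeric matrix would produce
# exactly a "type=list; target=double" coercion error in compiled code.
stopifnot(is.matrix(ld_matrix), is.numeric(ld_matrix))
stopifnot(!anyNA(ld_matrix))

# If the matrix was loaded with read.table(), coerce it explicitly:
# ld_matrix <- as.matrix(read.table("region_ld.ld"))
# storage.mode(ld_matrix) <- "double"
```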