jrs95 / hyprcoloc

Hypothesis Prioritisation in multi-trait Colocalization
https://jrs95.github.io/hyprcoloc/
GNU General Public License v3.0

Can the correlation matrix use genotypic (LDSC) correlations instead of phenotypic ones? #3

Closed ngbowker closed 4 years ago

ngbowker commented 4 years ago

I am interested in applying this method of colocalisation to my research, and I have a question about the correlation matrix: no mention is made of whether this should be a phenotypic or a genotypic correlation matrix. Presumably it was designed with a phenotypic correlation in mind. However, I am interested in analysing a continuous trait together with a number of binary traits, so a phenotypic correlation may not be particularly useful. I was wondering whether a genotypic correlation from LDSC might be more applicable, since genotypic correlations are typically similar to phenotypic correlations and Hyprcoloc uses genetic information for its inferences. The genetic correlation would of course be genome-wide rather than restricted to the region of interest, so what might the implications be of using a genotypic rather than a phenotypic correlation when adjusting the analysis?

jrs95 commented 4 years ago

Hi,

Apologies for the delayed response - I have recently moved jobs and my GitHub account was still sending notifications to my old University of Bristol address.

The method would work with both. We have only really trialed the method with genotype correlation, because of the reasons you give above, through correlating genome-wide Z-scores from GWAS using the tetrachoric correlation approach. This approach is very similar to LDSC. The nice advantage of using these types of approaches is that they account for sample overlap between the phenotypes as well.

Having said this, in simulations where we induced phenotype correlation caused by sample overlap, the standard model mostly out-performed the model which tries to account for the correlation! So, our advice is to still use the standard model, as the correlation model is much more difficult to fit and also requires LD information to adjust the priors in the presence of a trait correlation matrix (there is a complicated argument as to why this is necessary, but @cnfoley should be able to help if you want more information).

Hope this helps.

Best wishes,

James

ngbowker commented 4 years ago

Hi James,

Thanks for getting back to me – I understand that moving jobs comes with a lot of things to sort out! It’s good to hear that this might be a possibility, especially since it will account for the sample overlap as well. I thought of perhaps using the LDSC approach as I am using the R package and, as I understand it, the tetrachoric method isn’t included. Having spoken to Chris Foley recently I have decided to stick with the standard model instead of accounting for correlation.

Thanks for the help,

Best wishes,

Nick

Nicholas Bowker | Aetiology of Diabetes & Related Metabolic Disorders Group, MRC Epidemiology Unit, University of Cambridge, Institute of Metabolic Science, Box 285, Addenbrooke’s Hospital, Hills Road, Cambridge CB2 0QQ | T: +44 (0) 1223 746873 | E: nick.bowker@mrc-epid.cam.ac.uk

jrs95 commented 4 years ago

Hi Nick,

The tetrachoric method is very simple. You just correlate genome-wide indicator variables for the phenotypes, where the indicator is 0 if the Z-score for that SNP is negative and 1 if it is positive. The reason this is not included in the package is that we would have to devise a mechanism for users to import full genome-wide results, and as we believe the standard model still performs best in this scenario, we didn't think this was necessary.

Best wishes,

James
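
For anyone who finds this thread later, here is a minimal sketch of the sign-indicator correlation described above. It is written in Python with NumPy and is not part of the hyprcoloc package. The function name `sign_indicator_correlation`, the input layout (one row per SNP, one column per trait, rows aligned across traits) and the simulated Z-scores are all assumptions made for illustration. It computes the plain correlation of the 0/1 indicators as described in the comment; whether the authors applied any further transformation is not stated in this thread.

```python
import numpy as np


def sign_indicator_correlation(z):
    """Correlate 0/1 sign indicators of genome-wide Z-scores across traits.

    z: array of shape (n_snps, n_traits); rows must refer to the same SNP
       (and the same effect allele) in every trait. Hypothetical layout.
    Returns an (n_traits, n_traits) correlation matrix.
    """
    z = np.asarray(z, dtype=float)
    # Assumption: drop SNPs with a missing or exactly zero Z-score in any
    # trait, since their sign is undefined.
    keep = np.all(np.isfinite(z) & (z != 0), axis=1)
    # Indicator is 1 if the Z-score is positive, 0 if it is negative.
    indicators = (z[keep] > 0).astype(float)
    return np.corrcoef(indicators, rowvar=False)


# Simulated example (not real GWAS data): two traits whose Z-scores are
# positively correlated, e.g. because of sample overlap.
rng = np.random.default_rng(0)
cov = [[1.0, 0.3], [0.3, 1.0]]
z_sim = rng.multivariate_normal([0.0, 0.0], cov, size=50_000)
trait_cor = sign_indicator_correlation(z_sim)
print(trait_cor)
```

With real data, the Z-scores would need to be harmonised to the same effect allele across traits before taking signs, otherwise the indicators are not comparable.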