jaamarks / jaamarks_notebooks

Collection of various projects and procedures documented with Jupyter notebooks.
0 stars 0 forks source link

NGC FOU LDSC Regression #10

Open jaamarks opened 4 years ago

jaamarks commented 4 years ago

Description: The frequency of use (FOU) phenotype for NGC continues to be flipped in the opposite direction from the other NGC phenotypes (OAall and OAexp). See figures in GitHub issue 140 in the December 16, 2019 comment. We are going to investigate this issue by performing an LDSC regression on a few different combinations of phenotypes.


Date Locations: NGC summary stats results location:

Note: The effect allele for the FOU analyses is the same as for the other NGC metas (A2). The effect allele for the OAall GWAS analysis is ALT.

See also https://s3.console.aws.amazon.com/s3/buckets/rti-heroin/ldsc/data/

jaamarks commented 4 years ago

001

OAall subset (UHS1 GWAS results) vs FOU subset (only UHS sample) Note that FOU with old UHS4 failed to complete.

oaall_vs_fou ld_regression_results-1

002

FOU subset (UHS samples only with the newest UHS4) vs OAall all samples and OAall subset (deCODE contributing to OAexp)

fou_vs_oaall ld_regression_results-1

003

Whole FOU vs OAall subset of UHS1 GWAS results

fou087_vs_oaall_gwas002 ld_regression_results-1

jaamarks commented 4 years ago

FOU vs OAall Contributing Samples

fou_vs_oaall ld_regression_results-1


Note that COGA, Yale-Penn-CIDR, Yale-Penn-GO, and deCODE_OAexp did not finish due to errors during the correlation/heritability analyses. See logs below fou_087_by_coga.ldsc_regression.log fou_087_by_decode_oaexp.ldsc_regression.log fou_087_by_yalepenn_cidr.ldsc_regression.log fou_087_by_yalepenn_go.ldsc_regression.log

jaamarks commented 4 years ago

006

Remove VIDUS from both FOU and OAall

reference trait: FOU_097 (N=5,088)  (no VIDUS)
compared to: OAall_098 (N=302,330)  (OAall no VIDUS)
             OAall_099 (N=29,443)   (OAall no VIDUS and deCODE OAexp subset)

fou_vs_oaall ld_regression_results-1


007 VIDUS: FOU vs OAall

VIDUS FOU (N=300) vs VIDUS OAall (N=2177)

vidus_test ld_regression_results-1

jaamarks commented 4 years ago

008

I tried to run FOU UHS1–4 (with old UHS4) compared to OAall UHS1, but again it failed with the same error. See below for N information.

Heritability of phenotype 1
---------------------------
Total Observed scale h2: -0.0466 (0.1796)
Lambda GC: 1.0345
Mean Chi^2: 1.0281
Intercept: 1.0304 (0.0062)
Ratio: 1.0828 (0.2207)

FOU_old_uhs4 -- UHS1(897) + UHS2-3(772) + UHS4(861) = 2,530 OAall -- UHS1(9,245)

jaamarks commented 4 years ago

009

    ○ 009: FOU_092 (without UHS4) vs FOU_087 and Oaall_UHS1 GWAS results and COGA

COGA continuing to failed

test009 ld_regression_results-1

jaamarks commented 4 years ago

011–013 (CATS vs CATS)

011: CATS_FOU (N=1226) vs CATSPERTHUNT_OAall (N=1920) & CATSMOLE_OAall (N=3162) 012: CATSMOLE_OAall (N=3162) vs CATS_FOU (N=1226) & CATSPERTHUNT_OAall (N=1920) 013: CATSPERTHUNT_OAall (N=1920) vs CATS_FOU (N=1226) & CATSMOLE_OAall (N=3162)


011 012 013
catsfou_catsoaall ld_regression_results-1 catsoaall_catsfou ld_regression_results-1 catsoaall_catsfou2 ld_regression_results-1


We see from the attached spreadsheet that order does not matter in the LDSC regression pipeline, as expected. In particular:

The correlation between traits stays the same even when the reference trait and non-reference trait are switched. In particular:

20200211_cats_fou_vs_oaall.xlsx

cats_fou_by_catsmole.ldsc_regression.log catsmole_oaall_by_cats_fou.ldsc_regression.log catsperthunt_by_cats_fou.ldsc_regression.log

jaamarks commented 4 years ago

014 (Yale-Penn vs Yale-Penn)

Yale-Penn FOU (N=850) vs Yale-Penn-CIDR (N=666) & Yale-Penn-GO (N=917)

Failed to run.

Heritability of phenotype 1 (FOU reference)
---------------------------
Total Observed scale h2: -0.934 (0.4784)
Lambda GC: 1.0046
Mean Chi^2: 1.0054
Intercept: 1.0207 (0.006)
Ratio: 3.821 (1.103)

Heritability of phenotype 2/3 (YP-GO)
-----------------------------
Total Observed scale h2: 0.0442 (0.5186)
Lambda GC: 1.0315
Mean Chi^2: 1.0323
Intercept: 1.0316 (0.0068)
Ratio: 0.9764 (0.2091)

Heritability of phenotype 3/3 (YP-CIDR)
-----------------------------
Total Observed scale h2: -0.398 (0.6178)
Lambda GC: 1.0135
Mean Chi^2: 1.0078
Intercept: 1.0129 (0.0061)
Ratio: 1.654 (0.7842)
jaamarks commented 4 years ago

015 (fou087 meta vs OAall cohorts)

Complete FOU_087(N=5,388) compared to OAall GWAS results:

fou_meta_vs_oaall_gwas ld_regression_results-1


Heritability of phenotype COGA
-----------------------------
Total Observed scale h2: -0.2964 (0.0533)
Lambda GC: 0.9898
Mean Chi^2: 0.9887
Intercept: 1.0337 (0.0064)
Ratio: NA (mean chi^2 < 1)

deCODE_oaexp
FloatingPointError: invalid value encountered in sqrt

Heritability of phenotype yale-penn_cidr
-----------------------------
Total Observed scale h2: -0.4177 (0.6015)
Lambda GC: 1.0135
Mean Chi^2: 1.0072
Intercept: 1.0126 (0.0059)
Ratio: 1.7479 (0.8206)

Yale-Penn_GO
FloatingPointError: invalid value encountered in sqrt
jaamarks commented 4 years ago

OAall PCA of Top SNPs

For this analysis we compared the OAall cohorts by running a PCA of the union set of top SNPs from each cohort. In particular, we extracted the top SNPs (with P<10E-4 ) from each OAall cohort and then took the union set of those SNPs with which we performed a PCA. Note that we removed any SNPs from the union set that were not present in all cohorts. The OAall cohorts were:

image

deCODE OAall and deCODE OAexp were the exact same in this model. This is because the MAFs were the exact same for both phenotypes, despite the difference in sample size. This should be noted because it will affect which list of SNPs to filter out due to the MAF threshold. These data were retrieved from:

jaamarks commented 4 years ago

LDSC errors computing rg

Raymond Walters described here that lack of polygenic signal can be the culprit for floating point errors.

that error indicates that there are negative estimates of heritability observed for one of the traits when computing the jackknife standard errors, which normally this indicates that at least one of the input results has very low heritability

For comparison, LDHub restrict to looking at genetic correlation in GWAS with heritability Z-score (h2g / SE) above 2, and the original Nature Genetics paper restricted to Z-score > 4, compared to .0746/.0656 = 1.14 for your results.

Also see this GitHub issue

This generally means the heritability of at least one of the traits is quite small and/or unstable (e.g. due to small sample size

it’s possible that there simply isn’t enough signal in your data for a reliable genetic correlation analysis with ldsc (which would be consistent with the error message you are seeing).

We normally recommend against LDSR genetic correlation analyses when the univariate h2 results for either phenotype isn't clearly significant (e.g. z-score > 4).

Univariate Estimation of Heritability (--h2)

COGA Total Observed scale h2: -0.2965 (0.0534) Mean Chi^2: 0.9887

deCODE_OAexp Total Observed scale h2: 0.0385 (0.1841) Mean Chi^2: 0.9875

Yale-Penn CIDR Total Observed scale h2: -0.4239 (0.6054) Mean Chi^2: 1.0072

Yale-Penn GO
Total Observed scale h2: 0.1106 (0.5307) Mean Chi^2: 1.0319




Q. Why is my total h2 estimate negative?

A. Negative h2 is of course not meaningful, but negative h2 estimates can occur. This usually means that the true h2 is close to zero, and sampling error pushed the estimate below zero.

Check whether mean chi2 is above ~1.02. If no, this means there is very little polygenic signal for LDSC to work with.

jaamarks commented 4 years ago

017

FOU meta106 (N=4,592) vs OAall meta105 (N=18,825) Meta-analyses excluding the cohorts which ran LMM.

fou_vs_oaall ld_regression_results-1


Heritability of phenotype 1
---------------------------
Total Observed scale h2: 0.0875 (0.1109)
Mean Chi^2: 1.0263

Heritability of phenotype 2/2
-----------------------------
Total Observed scale h2: 0.1451 (0.0221)
Mean Chi^2: 1.0299

Note 0.0875/0.1109 = 0.735 which is indicates that the heritability estimate Z-score is too low.

LDHub restrict to looking at genetic correlation in GWAS with heritability Z-score (h2g / SE) above 2, and the original Nature Genetics paper restricted to Z-score > 4

As a rule of thumb, LD Score regression tends to yield very noisy results when applied to datasets with fewer than ~5k samples

jaamarks commented 4 years ago

018

Same as 017 except compare to individual level OAall GWAS results.

OAall cohorts * CATS-MOLE (N=3,162): s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz * CATS-PERHUNT (N=1,920): s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz * Kreek (N=556): s3://rti-midas-data/studies/ngc/kreek/association_tests/003/ea/oaall/kreek.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz * Bulgaria (N=2,765): s3://rti-heroin/gwas/op_dep_bulgaria/results/oaall/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz * UHS1 (N=9,245): s3://rti-midas-data/studies/ngc/uhs1/association_tests/002/ea/oaall/uhs1.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz * VIDUS (N=2,177): s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz


fou_vs_oaall ld_regression_results-1

jaamarks commented 4 years ago

019 (VIDUS_FOU vs OAall cohorts)

VIDUS_FOU (N=300) vs OAall_meta individual cohorts.

reference ``` VIDUS (N=300) s3://rti-heroin/rti-midas-data/studies/ngc/vidus/association_tests/001/ea/fou/vidus.ea.fou.chr{1..22}.1000g_ids.maf_gt_0.03_eur_vidus.se.rsq_gt_0.3.gz ```
comparison ``` Bulgaria (N=2,765) s3://rti-heroin/gwas/op_dep_bulgaria/results/oaall/0001/mis_minimac4_eagle2.4/1000g_p3/eur/chr{1..22}.1000g_ids.maf_gt_0.01_eur_obd.rsq_gt_0.3.gz CATS-MOLE (N=3,162) s3://rti-midas-data/studies/ngc/cats/association_tests/010/ea/oaall/catsmole.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz CATS-PERHUNT (N=1,920) s3://rti-midas-data/studies/ngc/cats/association_tests/011/ea/oaall/catsperthunt.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_cats.rsq_gt_0.3.gz * COGA (N=7,631) s3://rti-midas-data/studies/ngc/coga/association_tests/001/ea/oaall/coga.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_coga.se.rsq_gt_0.3.gz deCODE OAall (N=275,468) s3://rti-midas-data/studies/ngc/decode/association_tests/001/ea/oaall/decode.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz deCODE OAexp (N=2,581) s3://rti-midas-data/studies/ngc/decode/association_tests/002/ea/oaexp/decode.ea.oaexp.chr{1..22}.1000g_ids.maf_gt_0.01_eur_decode.beta_se.rsq_gt_0.3.gz Kreek (N=556) s3://rti-midas-data/studies/ngc/kreek/association_tests/003/ea/oaall/kreek.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_kreek.rsq_gt_0.3.gz UHS1 (N=9,245) s3://rti-midas-data/studies/ngc/uhs1/association_tests/002/ea/oaall/uhs1.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_uhs1.rsq_gt_0.3.gz VIDUS (N=2,177) s3://rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr{1..22}.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz * Yale-Penn-CIDR (N=666) s3://rti-midas-data/studies/ngc/yale-penn/association_tests/004/ea/oaall/yale-penn.cidr.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz * Yale-Penn-GO (N=917) s3://rti-midas-data/studies/ngc/yale-penn/association_tests/005/ea/oaall/yale-penn.go.ea.chr{1..22}.1000g_ids.maf_gt_0.01_eur_yale-penn.rsq_gt_0.3.gz ``` \* Cohorts that had errors during LDSR.



20200220_vidus_fou_vs_oaall ld_regression_results-1





020 (VIDUS_OAall vs FOU cohorts)

VIDUS_OAall meta (N=2177) vs FOU GWAS of individual cohorts

reference ``` VIDUS_OAall (N=2177) s3://rti-heroin/rti-midas-data/studies/ngc/vidus/association_tests/002/ea/oaall/vidus.ea.oaall.chr1.1000g_ids.maf_gt_0.01_eur_vidus.se.rsq_gt_0.3.gz ```
comparison ``` * alive (N=152) s3://rti-heroin/ldsc/data/opioid_fou/alive_fou001_n152.txt.gz cats (N=1226) s3://rti-heroin/ldsc/data/opioid_fou/cats_fou001_n1226.txt.gz * cogend (N=99) s3://rti-heroin/ldsc/data/opioid_fou/cogend_fou001_n99.txt.gz * start (N=231) s3://rti-heroin/ldsc/data/opioid_fou/start_fou001_n231.txt.gz * uhs1 (N=897) s3://rti-heroin/ldsc/data/opioid_fou/uhs1_fou001_n897.txt.gz uhs2-3 (N=772) s3://rti-heroin/ldsc/data/opioid_fou/uhs2_3_fou001_n772.txt.gz * uhs4 (N=1067) s3://rti-heroin/ldsc/data/opioid_fou/uhs4_fou0001_n1067.txt.gz * vidus (N=300) s3://rti-heroin/ldsc/data/opioid_fou/vidus_fou0001_n300.txt.gz * yale-penn (N=850) s3://rti-heroin/ldsc/data/opioid_fou/yalepenn_fou001_n850.txt.gz ``` \* Cohorts that had errors during LDSR.


vidus_oaall_vs_all_fou ld_regression_results-1

jaamarks commented 4 years ago

FOU cohorts: Z-score of heritability estimate

Expand Table | cohort | N | h2 | se | zscore | |--------|------|----------|----------|--------| | alive | 152 | -1.2532 | (2.6704) | | | cats | 1226 | 1.8512 | (0.4473) | 4.13 | | cogend | 99 | -6.3417 | (4.0912) | | | start | 231 | 0.0173 | (1.7918) | 0.0096 | | uhs1 | 897 | 0.6334 | (0.491) | 1.289 | | uhs2-3 | 772 | 0.03 | (0.6457) | 0.0464 | | uhs4 | 1067 | 0.2717 | (0.386) | 0.703 | | * vidus | 300 | -1.3093 | (1.4063) | | | yp | 850 | -0.8698 | (0.4161) | | \* Using the GWAS results at: `s3://rti-shared/gwas/vidus/results/box_cox_useropioid6mfq/0001/mis_minimac3_shapeit2/1000g_p3/eur` The VIDUS results at: `s3://rti-heroin/rti-midas-data/studies/ngc/vidus/association_tests/001/ea/fou` had a Z-score of: Total Observed scale h2: 1.4573 (1.3586) = 1.0725


OAall cohorts: Z-score of heritability estimate

Expand Table | cohort | N | h2 | se | zscore | |---------------|--------|----------|----------|---------| | Bulgaria | 2765 | 0.2539 | (0.1536) | 1.6529 | | CATS-MOLE | 3162 | 0.3934 | (0.1543) | 2.5495 | | CATS-PERTHUNT | 1920 | 0.7583 | (0.2346) | 3.2323 | | COGA | 7631 | -0.2965 | (0.0534) | | | deCODE OAall | 275468 | 0.0068 | (0.0017) | 4 | | deCODE OAexp | 2581 | 0.0385 | (0.1841) | 0.20912 | | Kreek | 556 | 2.8865 | (0.9397) | 3.07172 | | UHS1 | 9245 | 0.1642 | (0.0478) | 3.43514 | | VIDUS | 2177 | 0.4593 | (0.2154) | 2.1323 | | YP-CIDR | 666 | -0.4239 | (0.6054) | | | YP-GO | 917 | 0.1106 | (0.5307) | 0.2084 |


jaamarks commented 4 years ago

Next steps

We next plan to combine the UHS2–3 & UHS4 data and perform an FOU GWAS. We will then perform a heritability estimate (hg2 ) on those results using the LDSR tool, and also using GREML (as implemented in GCTA). One caveat is that GREML requires the raw genotype data of each individual—so we can't use summary stats.


image

jaamarks commented 4 years ago

GWAS Plots: UHS2-4

EUR

Genotype PC Plots ![image](https://user-images.githubusercontent.com/32715488/76426936-148e7480-6382-11ea-8344-c31be2ea8bab.png) ![image](https://user-images.githubusercontent.com/32715488/76426958-1bb58280-6382-11ea-92b5-6ac1a0a3fff3.png) ``` MODEL FORMULA: fou~PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 ================ EUR group ================ Top PCs: PC4 PC8 PC3 PVE: 76.21 ``` ![image](https://user-images.githubusercontent.com/32715488/76427193-633c0e80-6382-11ea-8c6c-ada1d3a19ea3.png) ![image](https://user-images.githubusercontent.com/32715488/76427498-c168f180-6382-11ea-9613-ec102eee664f.png)
GWAS Plots ![uhs234 eur 1000G fou assoc plot snps+indels manhattan](https://user-images.githubusercontent.com/32715488/76428125-a480ee00-6383-11ea-809e-01b9e6e30029.png) ![uhs234 eur 1000G fou assoc plot snps+indels qq](https://user-images.githubusercontent.com/32715488/76428127-a5198480-6383-11ea-985d-b9e93900db9b.png)

### AFR
Genotype PC Plots ![image](https://user-images.githubusercontent.com/32715488/76426967-1f490980-6382-11ea-8a81-f55c40f31085.png) ![image](https://user-images.githubusercontent.com/32715488/76426985-22dc9080-6382-11ea-8404-aa440e289076.png) ``` ================ AFR group ================ Top PCs: PC7 PC6 PC8 PVE: 81.26 ``` ![image](https://user-images.githubusercontent.com/32715488/76427440-abf3c780-6382-11ea-8e8e-0787baccd347.png) ![image](https://user-images.githubusercontent.com/32715488/76427472-b8782000-6382-11ea-83c5-42d65e857dce.png)
GWAS Plots ![uhs234 afr 1000G fou assoc plot snps+indels manhattan](https://user-images.githubusercontent.com/32715488/76428167-b19ddd00-6383-11ea-8ea2-c9fa602c3d9a.png) ![uhs234 afr 1000G fou assoc plot snps+indels qq](https://user-images.githubusercontent.com/32715488/76428169-b2367380-6383-11ea-832a-0896b495583d.png)
jaamarks commented 4 years ago

021 (VIDUS OAall vs UHS2-4 FOU)

VIDUS OAall (N=2177) UHS2-4 (N=1562)

UHS2-4 had a negative heritability, even though when ran separately they did not. You see that UHS2-3 had very low h2 with large se and same for UHS4.

cohort N h2 se zscore
uhs2-3 772 0.03 (0.6457) 0.0464
uhs4 1067 0.2717 (0.386) 0.703
uhs2-4 log file ``` 1170377 SNPs with valid alleles. Heritability of phenotype 1 --------------------------- Total Observed scale h2: 0.4834 (0.2124) Lambda GC: 1.1144 Mean Chi^2: 1.1106 Intercept: 1.09 (0.0069) Ratio: 0.8142 (0.0624) Heritability of phenotype 2/2 ----------------------------- Total Observed scale h2: -0.1794 (0.2583) Lambda GC: 1.0016 Mean Chi^2: 0.9948 Intercept: 1.0003 (0.0058) Ratio: NA (mean chi^2 < 1) Genetic Covariance ------------------ Total Observed scale gencov: -0.3198 (0.1595) Mean z1*z2: -0.0148 Intercept: -0.0032 (0.0041) ```