Open jaamarks opened 4 years ago
We were a bit concerned that the CFAR subjects were shifted to the left. After double-checking everything and rerunning the STRUCTURE analysis, it appears that this is correct. We used the same code to process the COGA and WIHS3 subjects, and those data appear fine, so it must be something inherent to the CFAR sample.
The results from three different STRUCTURE analyses (plots not shown): Post-dbGaP, Post-QC, and Imputed SNPs.
We are going to combine the CFAR and COGA genotype data and QC them together. Here are the results from the STRUCTURE analysis thus far.
Action Description | Thresholding Criteria |
---|---|
For EA retainment | (AFR < 25%)∧(EAS < 25%) |
For AA retainment | (AFR > 25%)∧(EAS < 25%) |
For HA retainment | (AFR < 25%)∧(EAS > 25%) |
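
The thresholds in the table above can be written as a small classifier. This is only a minimal sketch: the function name, the subject IDs, and the ancestry proportions are hypothetical, and it assumes the STRUCTURE proportions are on a 0 to 1 scale. Subjects that fall exactly on a cutoff, or exceed both cutoffs, are not assigned to any group because the table does not cover those cases.

```python
def assign_ancestry(afr, eas, cutoff=0.25):
    """Return 'EA', 'AA', 'HA', or None from STRUCTURE proportions (0-1 scale)."""
    if afr < cutoff and eas < cutoff:
        return "EA"
    if afr > cutoff and eas < cutoff:
        return "AA"
    if afr < cutoff and eas > cutoff:
        return "HA"
    # Both proportions above the cutoff, or a value exactly on the cutoff:
    # the thresholding table does not assign these subjects to a group.
    return None


# Hypothetical subjects: (AFR proportion, EAS proportion)
examples = {
    "s1": (0.02, 0.01),
    "s2": (0.80, 0.03),
    "s3": (0.10, 0.40),
    "s4": (0.30, 0.30),
}

assignments = {sid: assign_ancestry(afr, eas) for sid, (afr, eas) in examples.items()}
for sid, group in assignments.items():
    print(sid, group)
```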
CFAR subjects are blue.
Included as covariates: age, sex, alc_dep, PC8, PC4, PC3, PC5 (~80%)
Performed with RVTESTS.
Manhattan and QQ plots (not shown here) were produced for each combination of MAF threshold (1%, 3%, 5%) and imputation RSQ threshold (0.30, 0.80, 0.90).
Removed age outliers from COGA, keeping subjects with 24 < age < 86.
chrom | name | McLaren_beta | McLaren_P | CFAR_COGA_beta | CFAR_COGA_P |
---|---|---|---|---|---|
6 | rs12210050:475489:C:T | 0.2140599325811672 | 4.847e-09 | -0.157585 | 0.0159497 |
6 | rs41561016:31322611:C:T | -0.41144717978571177 | 9.459e-09 | -0.0396366 | 0.750087 |
6 | rs41557415:31323455:A:G | 0.4123386770513366 | 9.424e-09 | -0.0400005 | 0.74785 |
6 | rs1140487:31322987:C:T | -0.412109650826833 | 9.457e-09 | -0.0400005 | 0.74785 |
6 | rs41543314:31322690:A:G | 0.4028684822608984 | 2.332e-08 | -0.0839558 | 0.507325 |
Verifying the coding for both McLaren and CFAR_COGA.
CFAR_COGA
ID | CHROM | POS | REF | ALT | ALT_EFFSIZE | PVALUE |
---|---|---|---|---|---|---|
rs12210050:475489:C:T | 6 | 475489 | C | T | -0.157585 | 0.0159497 |
rs41561016:31322611:C:T | 6 | 31322611 | C | T | -0.0396366 | 0.750087 |
rs41557415:31323455:A:G | 6 | 31323455 | A | G | -0.0400005 | 0.74785 |
rs1140487:31322987:C:T | 6 | 31322987 | C | T | -0.0400005 | 0.74785 |
rs41543314:31322690:A:G | 6 | 31322690 | A | G | -0.0839558 | 0.507325 |
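
One way to script this check is sketched below, using the values from the two tables above. It confirms that each variant ID agrees with its CHROM/POS/REF/ALT columns and then compares the sign of the McLaren beta with the CFAR_COGA ALT_EFFSIZE. The helper names are hypothetical, and the sign comparison assumes both betas refer to the same effect allele. That assumption is exactly what is being verified, so a sign mismatch alone cannot distinguish a coding difference from a true difference in effect.

```python
def parse_variant_id(variant_id):
    """Split an ID such as 'rs12210050:475489:C:T' into (rsid, pos, ref, alt)."""
    rsid, pos, ref, alt = variant_id.split(":")
    return rsid, int(pos), ref, alt


# (ID, CHROM, POS, REF, ALT, ALT_EFFSIZE) copied from the CFAR_COGA table
cfar_coga_rows = [
    ("rs12210050:475489:C:T", 6, 475489, "C", "T", -0.157585),
    ("rs41561016:31322611:C:T", 6, 31322611, "C", "T", -0.0396366),
    ("rs41557415:31323455:A:G", 6, 31323455, "A", "G", -0.0400005),
    ("rs1140487:31322987:C:T", 6, 31322987, "C", "T", -0.0400005),
    ("rs41543314:31322690:A:G", 6, 31322690, "A", "G", -0.0839558),
]

# McLaren betas copied from the comparison table
mclaren_beta = {
    "rs12210050:475489:C:T": 0.2140599325811672,
    "rs41561016:31322611:C:T": -0.41144717978571177,
    "rs41557415:31323455:A:G": 0.4123386770513366,
    "rs1140487:31322987:C:T": -0.412109650826833,
    "rs41543314:31322690:A:G": 0.4028684822608984,
}


def id_matches_columns(row):
    """True when the POS/REF/ALT embedded in the ID equal the table columns."""
    variant_id, _chrom, pos, ref, alt, _beta = row
    _rsid, id_pos, id_ref, id_alt = parse_variant_id(variant_id)
    return (id_pos, id_ref, id_alt) == (pos, ref, alt)


def same_direction(beta_a, beta_b):
    """True when both effect sizes have the same sign."""
    return (beta_a > 0) == (beta_b > 0)


ids_consistent = [id_matches_columns(row) for row in cfar_coga_rows]
concordant = [same_direction(mclaren_beta[row[0]], row[5]) for row in cfar_coga_rows]

for row, ok, agree in zip(cfar_coga_rows, ids_consistent, concordant):
    print(row[0], "ID/columns agree:", ok, "| same direction as McLaren:", agree)
```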
jmarks@RTI-103356 ~/Projects/hiv/cfar_coga/gwas/0001/maf_both
$ awk '$17 < 1e-15' cfar_coga.eur.1000g_p3.hiv_acq.maf_0.03_both.rsq_0.90.p_lte_0.001.txt
ID | CHROM | POS | REF | ALT | N_INFORMATIVE | AF | INFORMATIVE_ALT_AC | CALL_RATE | HWE_PVALUE | N_REF | N_HET | N_ALT | U_STAT | SQRT_V_STAT | ALT_EFFSIZE | PVALUE |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs10911132:182753673:G:A | 1 | 182753673 | G | A | 4763 | 0.0975693:0.141029:0.0547075 | 929.445:667.068:262.377 | 1:1:1 | 0.75388:0.00376253:0.117543 | 3829:1688:2141 | 886:641:245 | 48:36:12 | 125.07 | 11.5818 | 0.932393 | 3.48702e-27 |
rs10911133:182753838:G:T | 1 | 182753838 | G | T | 4763 | 0.102479:0.149187:0.0564141 | 976.216:705.654:270.562 | 1:1:1 | 0.587918:0.00149029:0.117543 | 3816:1675:2141 | 899:654:245 | 48:36:12 | 134.85 | 12.0782 | 0.924377 | 6.0628e-29 |
rs1064257:49993535:C:G | 19 | 49993535 | C | G | 4763 | 0.0874761:0.118845:0.0565384 | 833.297:562.139:271.158 | 1:1:1 | 0.000413457:0.00120084:0.00595059 | 3963:1833:2130 | 783:516:267 | 17:16:1 | 109.352 | 10.96 | 0.910348 | 1.91459e-23 |
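
The colon-separated fields in this output are easier to inspect once split. The sketch below repeats the `awk` filter in Python on the three rows shown above and unpacks the AF and HWE_PVALUE triplets. It assumes the triplets are ordered all samples : cases : controls, which is how I understand RVTESTS reports binary traits; treat that ordering, and the helper names, as assumptions to confirm against the RVTESTS documentation.

```python
# The three rows from the table above, whitespace-delimited with the same header.
RVTESTS_OUTPUT = """\
ID CHROM POS REF ALT N_INFORMATIVE AF INFORMATIVE_ALT_AC CALL_RATE HWE_PVALUE N_REF N_HET N_ALT U_STAT SQRT_V_STAT ALT_EFFSIZE PVALUE
rs10911132:182753673:G:A 1 182753673 G A 4763 0.0975693:0.141029:0.0547075 929.445:667.068:262.377 1:1:1 0.75388:0.00376253:0.117543 3829:1688:2141 886:641:245 48:36:12 125.07 11.5818 0.932393 3.48702e-27
rs10911133:182753838:G:T 1 182753838 G T 4763 0.102479:0.149187:0.0564141 976.216:705.654:270.562 1:1:1 0.587918:0.00149029:0.117543 3816:1675:2141 899:654:245 48:36:12 134.85 12.0782 0.924377 6.0628e-29
rs1064257:49993535:C:G 19 49993535 C G 4763 0.0874761:0.118845:0.0565384 833.297:562.139:271.158 1:1:1 0.000413457:0.00120084:0.00595059 3963:1833:2130 783:516:267 17:16:1 109.352 10.96 0.910348 1.91459e-23
"""


def read_rows(text):
    """Parse whitespace-delimited text with a header line into a list of dicts."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def split_triplet(value):
    """Split 'a:b:c' into floats; the all/cases/controls order is an assumption."""
    overall, cases, controls = (float(x) for x in value.split(":"))
    return {"all": overall, "cases": cases, "controls": controls}


rows = read_rows(RVTESTS_OUTPUT)

# Same filter as: awk '$17 < 1e-15' (PVALUE is column 17)
hits = [row for row in rows if float(row["PVALUE"]) < 1e-15]

for row in hits:
    af = split_triplet(row["AF"])
    hwe = split_triplet(row["HWE_PVALUE"])
    print(
        row["ID"],
        "AF cases - controls:", round(af["cases"] - af["controls"], 4),
        "| HWE p (cases):", hwe["cases"],
    )
```

Listing the case and control allele frequencies side by side makes it straightforward to see how far apart the two groups are at each of these sites.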
SNP | REF(0) | ALT(1) | ALT_Frq | MAF | AvgCall | Rsq | Genotyped | LooRsq | EmpR | EmpRsq | Dose0 | Dose1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1:182753838:G:T | G | T | 0.10209 | 0.10209 | 0.99400 | 0.94751 | Genotyped | 0.540 | 0.637 | 0.40571 | 0.45810 | 0.03929 |
1:182753673:G:A | G | A | 0.09718 | 0.09718 | 0.99015 | 0.91021 | Imputed | - | - | - | - | - |
19:49993535:C:G | C | G | 0.08712 | 0.08712 | 0.99541 | 0.95675 | Genotyped | 0.760 | 0.654 | 0.42775 | 0.50306 | 0.02054 |
While the LooRsq statistic completely ignores experimental genotypes, EmpR is the correlation between the true genotyped values and the imputed dosages that were calculated by hiding all known genotypes for the given SNP (see LooDosage). A negative correlation between imputed and experimental genotypes can indicate allele flips. This statistic also can only be provided for genotyped sites. EmpRsq is the square of this correlation.
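
As a rough illustration of that definition, the sketch below computes a Pearson correlation between genotypes and leave-one-out dosages and squares it. The genotype and dosage values are made-up toy data, and this is a per-sample simplification; the imputation software's own calculation may differ in detail (for example, by working on haplotypes). It also shows why a negative EmpR can point to an allele flip: flipping the dosages (2 - dosage) reverses the sign of the correlation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: experimental genotypes (ALT allele counts) and the
# dosages imputed for the same samples with this SNP's genotypes hidden.
true_genotypes = [0, 1, 2, 0, 1, 0]
loo_dosages = [0.1, 0.9, 1.7, 0.3, 1.2, 0.0]

emp_r = pearson_r(true_genotypes, loo_dosages)
emp_rsq = emp_r ** 2

# Simulate an allele flip: dosages counted on the opposite allele.
flipped_dosages = [2 - d for d in loo_dosages]
emp_r_flipped = pearson_r(true_genotypes, flipped_dosages)

print("EmpR:", round(emp_r, 3), "EmpRsq:", round(emp_rsq, 3))
print("EmpR with flipped alleles:", round(emp_r_flipped, 3))
```

The info table above is consistent with the last sentence of the definition: 0.637² ≈ 0.406 and 0.654² ≈ 0.428, which match the reported EmpRsq values up to rounding of EmpR.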
See parent GitHub Issue #133.

CFAR dbGaP, COGA dbGaP
QC these dbGaP studies and combine for GWAS and eventual inclusion in the HIV acquisition meta-analysis.
Age Distributions

## Age distributions for CFAR and COGA

| CFAR | COGA |
|---|---|
| ![image](https://user-images.githubusercontent.com/32715488/66494481-bd5c3f00-ea85-11e9-9eb1-1b16e2e6831d.png) | ![image](https://user-images.githubusercontent.com/32715488/66494554-d5cc5980-ea85-11e9-9672-27232ad33a0d.png) |

CFAR dbGaP Ternary Plot

![image](https://user-images.githubusercontent.com/32715488/77101636-01b51900-69ee-11ea-9f8a-4360b0e9e890.png)

COGA dbGaP Ternary Plot

![image](https://user-images.githubusercontent.com/32715488/77101519-d6322e80-69ed-11ea-9458-0da7f2b96b04.png)