zhilizheng / SBayesRC

GNU General Public License v3.0
25 stars 5 forks source link

Lack of overtransmission #9

Closed NiranOkewole closed 1 year ago

NiranOkewole commented 1 year ago

Hi @zhilizheng So I did sensitivity analyses on my polygenic scores with pTDT and a couple of key ones do not show overtransmission among probands. Any idea why this could happen? Thanks.

zhilizheng commented 1 year ago

Hi @NiranOkewole ,

Thanks for the report. I have some difficulty to understand. Could you explain the issue in more details? How you run the analysis, e.g., which LD used (from parent?), summary data (from parent?), obtain PRS, and what to be expected.

Thanks.

Regards, Zhili

NiranOkewole commented 1 year ago

I am working with a North American dataset, with sumstats from a Scandinavian dataset.

I used LD and annotations as prepared and recommended by SBayesRC for European ancestry: ld_folder="/ukbEUR_Imputed"
annot="/annot_baseline2.2.txt"

After generating the scores, I performed pTDT on trios (first all trios and then singletons), to obtain mean difference in scores, SD and p value. Of 10 polygenic scores, there was overtransmission in 6 but not in the two core phenotypes, which was surprising.

Here are (slightly redacted) logs from my steps running SBayesRC:

cat _tidy.ma.log

Tidy the summary data Summary data: sumstats_COJO LD path: ukbEUR_Imputed Output: ./_tidy.ma Freq_thresh: 0.2 N_range: 3 rate2pq: 0.5 7356518 SNPs in LD information 7357731 SNPs in summary data 7357731 valid SNPs in summary data 6975128 SNPs in common with LD information Consistent SNP IDs 6975128 SNPs have consistent alleles (A1, A2) between the summary data and LD 6975122 SNPs passed the allele frequency checking with threshold 0.2 Mean N = 26202.46, SD = 0 6975122 SNPs have sample size within mean +- 3SD After QC, Median sample size: 26202.46 Quantile of rate2pq: 0% 25% 50% 75% 100% 0.8821773 0.9683722 1.0000000 1.0258029 1.2112146 Var_y: 2.37989051561602 6975122 SNPs remained after QC by rate2pq 0.5 Done

head _imp.ma.log

Impute the summary data by LD 6975122 SNPs in common between GWAS summary and LD 0 SNPs set from summary data 6975122 SNPs flipped alleles 6975122 SNPs are typed SNPs Start summary imputation... ==========1========= Imputing... Prepare and reading time: 0.0244401 LD reconstruct with m = 7346, k = 2073

tail _imp.ma.log

LD reconstruct with m = 9864, k = 2331 Numer of SNP typed: 8202, Number of SNP to impute: 1662 Reconstruct LD time: 2.93628 Get LDtt time: 0.0965755 Get LDit time: 0.0271936 LDi_Z time: 1.7421 imp_z time: 0.00435808 Done. user system elapsed 8102.005 304.673 8536.014

cat _sbrc.txt

GWAS summary statistics: ./_imp.ma Annotation: TRUE Annotation file: annot_baseline2.2.txt LD input folder: ukbEUR_Imputed, eigen variance cutoff threshold: 0.995 Tune the best eigen cutoff: TRUE Tuning step: 0.995 0.9 0.8 0.7 0.6 Tuning iterations: 150 Tuning burn-in iterations: 100 Start hsq: 0.5 Start Pi: 0.99 0.005 0.003 0.001 0.001 Gamma: 0 0.001 0.01 0.1 1 Number of MCMC iteration: 3000 Burn-in iteration: 1000 Use 2pq to scale summary: FALSE Method to resample residual: allMixVe Threshold to resample residual: 1.1 Hostname: cpu-p-595 Analysis started: 2023-06-06 09:48:38

0 SNPs flipped allele Observed phenotypic variance: 2.37989051561602 Generating pseudo-summary...


Perform tuning...

Tune with parameter 0.995 Running SBayesRC without annotation Number of variants from LD information: 7356518 Start reading LD information, and cutting the variance to 0.995... Found LD in ukbEUR_Imputed/block{BLOCK}.eigen.bin, type: 2 Finish reading 591 LD blocks Var_y: 1 Mean vare: 1 Number of distributions: 5 Start MCMC with 150 iterations... Iter 100, nnz=163120, sigmaSq=20.851, hsq=20399.703, ssq=389329.781, n2=162253, n3=0, n4=1, n5=866, vg2=0.060, vg3=0.000, vg4=0.000, vg5=0.939, vare=8922.554, time = 983.064

Tune with parameter 0.9 Running SBayesRC without annotation Number of variants from LD information: 7356518 Start reading LD information, and cutting the variance to 0.9... Found LD in ukbEUR_Imputed/block{BLOCK}.eigen.bin, type: 2 Finish reading 591 LD blocks Var_y: 1 Mean vare: 1 Number of distributions: 5 Start MCMC with 150 iterations... Iter 100, nnz=181251, sigmaSq=21.816, hsq=4373.618, ssq=7478.946, n2=181167, n3=2, n4=71, n5=11, vg2=0.940, vg3=0.000, vg4=0.036, vg5=0.022, vare=1476.474, time = 280.879

Tune with parameter 0.8 Running SBayesRC without annotation Number of variants from LD information: 7356518 Start reading LD information, and cutting the variance to 0.8... Found LD in ukbEUR_Imputed/block{BLOCK}.eigen.bin, type: 2 Finish reading 591 LD blocks Var_y: 1 Mean vare: 1 Number of distributions: 5 Start MCMC with 150 iterations... Iter 100, nnz=631347, sigmaSq=10.869, hsq=6854.206, ssq=7786.720, n2=631346, n3=1, n4=0, n5=0, vg2=0.999, vg3=0.000, vg4=0.000, vg5=0.000, vare=0.049, time = 210.132

Tune with parameter 0.7 Running SBayesRC without annotation Number of variants from LD information: 7356518 Start reading LD information, and cutting the variance to 0.7... Found LD in ukbEUR_Imputed/block{BLOCK}.eigen.bin, type: 2 Finish reading 591 LD blocks Var_y: 1 Mean vare: 1 Number of distributions: 5 Start MCMC with 150 iterations... Iter 100, nnz=343487, sigmaSq=11.533, hsq=3982.324, ssq=4456.249, n2=343487, n3=0, n4=0, n5=0, vg2=1.000, vg3=0.000, vg4=0.000, vg5=0.000, vare=0.203, time = 189.119

Tune with parameter 0.6 Running SBayesRC without annotation Number of variants from LD information: 7356518 Start reading LD information, and cutting the variance to 0.6... Found LD in ukbEUR_Imputed/block{BLOCK}.eigen.bin, type: 2 Finish reading 591 LD blocks Var_y: 1 Mean vare: 1 Number of distributions: 5 Start MCMC with 150 iterations... Iter 100, nnz=223589, sigmaSq=11.369, hsq=2549.725, ssq=2795.229, n2=223589, n3=0, n4=0, n5=0, vg2=1.000, vg3=0.000, vg4=0.000, vg5=0.000, vare=0.420, time = 182.117 Tuning Done Continue with best eigen variance cutoff: 0.9 Time elapsed: 0.91 hour(s)


Start SBayesRC with eigen variance cutoff 0.9 Running SBayesRC with annotation Total annotation category (including intercept): 97. Number of binary annotation: 84, quantitative annotation: 13. Number of variants from LD information: 7356518 Start reading LD information, and cutting the variance to 0.9... Found LD in ukbEUR_Imputed/block{BLOCK}.eigen.bin, type: 2 Finish reading 591 LD blocks Var_y: 1 Mean vare: 1 Number of distributions: 5 Start MCMC with 3000 iterations...

skipping to the end

MCMC cycles completed. Use the ./_sbrc.txt, column 1 2 3 to calculate the polygenic risk score.

Parameter estimation: hsq nnz sigmaSq ssq vare varg 4900.00531 165216.95500 24.80624 9415.16191 1545.04121 4900.00531 Analysis finished: 2023-06-06 19:55:50 Computational time: 36431.893 seconds

cat _.log

PLINK v1.90b6.21 64-bit (19 Oct 2020) Options in effect: --bfile --out --score ./testsbrc.txt 1 2 3 header sum center --threads 1 Random number seed: 1686150981 191855 MB RAM detected; reserving 95927 MB for main workspace. 9698250 variants loaded from .bim file. 47170 people (0 males, 0 females, 47170 ambiguous) loaded from .fam. Ambiguous sex IDs written to .nosex . Using 1 thread. Before main variant filters, 47170 founders and 0 nonfounders present. Calculating allele frequencies... done. Total genotyping rate is 0.997053. 9698250 variants and 47170 people pass filters and QC. Note: No phenotypes present. Warning: 19998 lines skipped in --score file (19990 due to variant ID mismatch, 8 due to allele code mismatch); see .nopred for details. --score: 7336520 valid predictors loaded. --score: Results written to .profile .

zhilizheng commented 1 year ago

Hi @NiranOkewole ,

Thanks for the details. That's much clear to me.

I have a suggestion on the LD reference. From the log, it looks like the eigen cutoff value is very small, it suggests the population of the summary data is different from the UKB EUR. In this case, the model could be mis-specified, hence the prediction is sub-optimal (although SBayesRC usually gives better results compared to other methods in this case which will be shown in our revision soon).

I suggest to obtain a similar genotype data from your summary and re-generate the LD (see the document section "Generate LD") if this is possible.

Hope this is useful.

Regards, Zhili

NiranOkewole commented 1 year ago

Ah I see. Will explore that. Thanks so much.