Open shreya2031 opened 8 months ago
1) Check Data Quality and Consistency:
Ensure that the summary statistics files you are using for EA and BMI are correctly formatted and do not contain any missing or inconsistent data. Confirm that the SNPs in your summary statistics files match the SNPs in the LD reference sample used for GSMR analysis.
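As a rough illustration of the formatting check, here is a minimal Python sketch. It assumes each summary statistics file uses the eight-column, whitespace-separated layout GCTA documents for COJO-style input (SNP A1 A2 freq b se p N); the function name and the specific rules are my own assumptions, not something GCTA provides.

```python
import math

# Assumed column layout (COJO-style input used by GSMR/mtCOJO).
EXPECTED = ["SNP", "A1", "A2", "freq", "b", "se", "p", "N"]

def check_sumstats(lines):
    """Return a list of (line_number, snp, problem) for suspicious rows.

    `lines` is any iterable of text lines, e.g. an open file handle.
    """
    problems = []
    seen = set()
    it = iter(lines)
    header = next(it).split()
    if header != EXPECTED:
        problems.append((1, "", f"unexpected header: {header}"))
    for lineno, line in enumerate(it, start=2):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != len(EXPECTED):
            problems.append((lineno, fields[0], "wrong number of columns"))
            continue
        snp, a1, a2 = fields[:3]
        if snp in seen:
            problems.append((lineno, snp, "duplicate SNP ID"))
        seen.add(snp)
        if a1.upper() == a2.upper():
            problems.append((lineno, snp, "A1 equals A2"))
        try:
            freq, b, se, p, n = (float(x) for x in fields[3:])
        except ValueError:
            # Catches tokens such as NA or empty placeholders.
            problems.append((lineno, snp, "non-numeric value"))
            continue
        if any(math.isnan(v) or math.isinf(v) for v in (freq, b, se, p, n)):
            problems.append((lineno, snp, "NaN or infinite value"))
            continue
        if not 0 < freq < 1:
            problems.append((lineno, snp, "freq outside (0, 1)"))
        if se <= 0:
            problems.append((lineno, snp, "se is not positive"))
        if not 0 <= p <= 1:
            problems.append((lineno, snp, "p outside [0, 1]"))
        if n <= 0:
            problems.append((lineno, snp, "N is not positive"))
    return problems
```

To use it, pass an open file handle for each trait's file (for example the EA and BMI files named in your mtCOJO list) and inspect the returned rows.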
2) Filter and Prune Data:
Use the '--extract' option in PLINK to ensure that the SNPs in your GWAS summary statistics data match those in the reference sample. This can help in improving the compatibility of the datasets. Consider removing SNPs with very low minor allele frequencies (MAF) using the '--maf' option to clean the data and reduce noise.
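One way to build the SNP list for '--extract' is sketched below in Python. It assumes the summary statistics file has a header line with the SNP ID in the first column and that the reference is a standard PLINK .bim file (SNP ID in the second column); the function name is hypothetical.

```python
def shared_snps(sumstats_lines, bim_lines):
    """Yield SNP IDs present in both the summary statistics and the .bim file.

    The (smaller) summary statistics IDs are held in memory and the .bim
    file is streamed, which matters for a reference with tens of millions
    of variants.
    """
    it = iter(sumstats_lines)
    next(it, None)  # skip the header line
    wanted = {line.split()[0] for line in it if line.strip()}
    for line in bim_lines:
        fields = line.split()
        # .bim columns: chromosome, SNP ID, genetic distance, position, A1, A2
        if len(fields) >= 2 and fields[1] in wanted:
            yield fields[1]
```

The IDs can then be written one per line to a file and passed to PLINK with '--extract', optionally together with '--maf 0.01' and '--make-bed' to produce a smaller reference panel.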
3) Update Reference Data:
Verify that the LD reference sample you are using is appropriate for your dataset and has the necessary quality. You may want to consider using a different LD reference sample or regenerating LD matrices.
4) Increase the Clumping Threshold:
You can try relaxing the p-value threshold used to select SNPs for clumping to a more lenient value, such as 1e-5, so that more SNPs enter the analysis. In PLINK this option is '--clump-p1'; GCTA's GSMR and mtCOJO set it with '--gwas-thresh'. This may help you find a set of index SNPs that does not trigger the invertibility error.
5) Use Different Summary Statistics:
You mentioned that using different summary statistics for BMI worked. This suggests that the issue might be specific to the summary statistics file you initially used. You might want to stick with the working summary statistics dataset.
6) Check for Outliers and Extreme Values:
Look for any outliers or extreme values in the summary statistics data that could be causing numerical instability.
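A simple screen for extreme values could look like the Python sketch below. It assumes the reported p-values come from a two-sided Wald test, so that p should roughly match the z statistic b/se; the cutoffs (|z| above 30, or more than one order of magnitude of disagreement in p) are arbitrary choices for illustration, and the function names are my own.

```python
import math

def p_from_z(z):
    """Two-sided normal p-value for a z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def flag_extremes(rows, max_abs_z=30.0, max_log10_gap=1.0):
    """Yield (snp, reason) for rows that look numerically suspicious.

    `rows` is an iterable of (snp, b, se, p) tuples.
    """
    for snp, b, se, p in rows:
        if se <= 0:
            yield snp, "se is not positive"
            continue
        z = b / se
        if abs(z) > max_abs_z:
            yield snp, f"|z| = {abs(z):.1f} exceeds {max_abs_z}"
            continue
        if p <= 0:
            yield snp, "p is zero or negative"
            continue
        expected = p_from_z(z)
        if expected <= 0:
            continue  # underflow; nothing reliable to compare against
        gap = abs(math.log10(p) - math.log10(expected))
        if gap > max_log10_gap:
            yield snp, f"p differs from b/se by {gap:.1f} orders of magnitude"
```

Flagged SNPs are only candidates for a closer look. Results from mixed-model or logistic regression software can differ slightly from the Wald approximation without anything being wrong.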
Hi Shreya,
I've downloaded the data and tested each trait as the exposure, using public summary statistics for IBD as the outcome.
To my surprise, all four traits produced GSMR results without any error. This suggests the error comes from the DD GWAS summary statistics themselves.
Since you cannot share the DD summary statistics, I have the following suggestions to help you diagnose your DD GWAS further.
Let me know if you find anything and we can further debug this.
Cheers, Angli
Hi Angli,
Thank you for your suggestions! I will try them and get back to you.
Hello @anglixue,
I have been trying to run mtCOJO on one trait, adjusting for 3 other traits, using GCTA v1.94.1, and this is the error message I received:
Error: the variance-covariance matrix of bxy is not invertible.
This is the command I used:
./gcta --bfile /home/ssroige/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes --mtcojo-file data_list_dd_edu_int_bmi.txt --ref-ld-chr /home/yuh354/gcta/eur_w_ld_chr/ --w-ld-chr /home/yuh354/gcta/eur_w_ld_chr/ --out mtcojo_dd_edu_int_bmi
This is the log file:
Analysis started at 14:01:21 PDT on Fri Sep 22 2023.
Hostname: tscc-10-7.sdsc.edu

Accepted options:
--bfile ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes
--mtcojo-file data_list_dd_edu_int_bmi.txt
--ref-ld-chr /gcta/eur_w_ld_chr/
--w-ld-chr /gcta/eur_w_ld_chr/
--out mtcojo_dd_edu_int_bmi

Reading PLINK FAM file from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.fam].
2504 individuals to be included from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.fam].
Reading PLINK BIM file from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bim].
80845844 SNPs to be included from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bim].

Reading GWAS summary data from [data_list_dd_edu_int_bmi.txt] ...
10862058 SNPs in common between the target trait and the covariate trait(s).
Filtering out SNPs with multiple alleles or missing value ...
4525 SNPs have missing value or mismatched alleles. These SNPs have been saved in [mtcojo_dd_edu_int_bmi.badsnps].
10857533 SNPs are retained after filtering.
There are 114670 genome-wide significant SNPs with p < 5.0e-08.

Reading PLINK BED file from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bed] in SNP-major format ...
Genotype data for 2504 individuals and 114670 SNPs to be included from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bed].
Calculating allele frequencies ...
Checking the difference in allele frequency between the GWAS summary datasets and the LD reference sample...
6771 SNP(s) have large difference of allele frequency between the GWAS summary data and the reference sample. These SNPs have been saved in [mtcojo_dd_edu_int_bmi.freq.badsnps].
Warning: There are 527 SNPs with MAF < 0.01 in the reference sample.

Univariate LD score regression analysis to estimate SNP-based heritability ...
DD: 1.019535 0.086080
EA: 1.056403 0.123769
BMI: 1.170255 0.187867
INT: 1.075888 0.178296
Bivariate LD score regression analysis to estimate genetic correlation between each pair of traits ...
Intercept:
1.01953 -0.0211231 0.0247448 -0.0083885
-0.0211231 1.0564 -0.105811 0.170605
0.0247448 -0.105811 1.17026 -0.012996
-0.0083885 0.170605 -0.012996 1.07589
rg:
0.0860802 -0.619512 0.305033 -0.406809
-0.619512 0.123769 -0.333927 0.782124
0.305033 -0.333927 0.187867 -0.177266
-0.406809 0.782124 -0.177266 0.178296
The LD score regression analyses completed.

GSMR analysis for covariate #1 (EA) ...
862 index SNPs are obtained from the clumping analysis with p < 5.0e-08 and LD r2 < 0.05.
Error: the variance-covariance matrix of bxy is not invertible.
An error occurs, please check the options or data
Some things I tried to find the issue:
Could you please help me figure out this issue?
Thanks for your help in advance!