jianyangqt / gcta

GCTA software
GNU General Public License v3.0

Error: the variance-covariance matrix of bxy is not invertible. #57

Status: Open. shreya2031 opened this issue 8 months ago.

shreya2031 commented 8 months ago

Hello @anglixue,

I have been trying to run mtCOJO on a trait (DD), adjusting for three other traits (EA, BMI, and INT), using GCTA v1.94.1. This is the error message I received:

Error: the variance-covariance matrix of bxy is not invertible.
An error occurs, please check the options or data

This is the command I used:

./gcta --bfile /home/ssroige/1000G/ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes --mtcojo-file data_list_dd_edu_int_bmi.txt --ref-ld-chr /home/yuh354/gcta/eur_w_ld_chr/ --w-ld-chr /home/yuh354/gcta/eur_w_ld_chr/ --out mtcojo_dd_edu_int_bmi

This is the log file:


Accepted options:
--bfile ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes
--mtcojo-file data_list_dd_edu_int_bmi.txt
--ref-ld-chr /gcta/eur_w_ld_chr/
--w-ld-chr /gcta/eur_w_ld_chr/
--out mtcojo_dd_edu_int_bmi

Reading PLINK FAM file from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.fam].
2504 individuals to be included from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.fam].
Reading PLINK BIM file from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bim].
80845844 SNPs to be included from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bim].

Reading GWAS summary data from [data_list_dd_edu_int_bmi.txt] ...
10862058 SNPs in common between the target trait and the covariate trait(s).
Filtering out SNPs with multiple alleles or missing value ...
4525 SNPs have missing value or mismatched alleles. These SNPs have been saved in [mtcojo_dd_edu_int_bmi.badsnps].
10857533 SNPs are retained after filtering.
There are 114670 genome-wide significant SNPs with p < 5.0e-08.

Reading PLINK BED file from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bed] in SNP-major format ...
Genotype data for 2504 individuals and 114670 SNPs to be included from [ALL.chr1-22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bed].
Calculating allele frequencies ...
Checking the difference in allele frequency between the GWAS summary datasets and the LD reference sample...
6771 SNP(s) have large difference of allele frequency between the GWAS summary data and the reference sample. These SNPs have been saved in [mtcojo_dd_edu_int_bmi.freq.badsnps].
Warning: There are 527 SNPs with MAF < 0.01 in the reference sample.

Univariate LD score regression analysis to estimate SNP-based heritability ...
DD: 1.019535 0.086080
EA: 1.056403 0.123769
BMI: 1.170255 0.187867
INT: 1.075888 0.178296
Bivariate LD score regression analysis to estimate genetic correlation between each pair of traits ...
Intercept:
1.01953 -0.0211231 0.0247448 -0.0083885
-0.0211231 1.0564 -0.105811 0.170605
0.0247448 -0.105811 1.17026 -0.012996
-0.0083885 0.170605 -0.012996 1.07589
rg:
0.0860802 -0.619512 0.305033 -0.406809
-0.619512 0.123769 -0.333927 0.782124
0.305033 -0.333927 0.187867 -0.177266
-0.406809 0.782124 -0.177266 0.178296
The LD score regression analyses completed.

GSMR analysis for covariate #1 (EA) ...
862 index SNPs are obtained from the clumping analysis with p < 5.0e-08 and LD r2 < 0.05.
Error: the variance-covariance matrix of bxy is not invertible.
An error occurs, please check the options or data
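For context on what the error means numerically: a variance-covariance matrix can only be inverted reliably if it is positive definite. The pure-Python sketch below is a generic illustration and not GCTA's code. For illustration only, it assumes that the covariance between two bxy estimates equals the LD correlation r between their SNPs times the product of their standard errors. The helpers `ld_covariance` and `cholesky_or_none` and all numbers are hypothetical. Under that assumption, two index SNPs in perfect LD (for example, the same SNP listed twice) make the matrix singular, and a Cholesky factorization fails.

```python
import math

def ld_covariance(se, r):
    """Hypothetical helper: cov[i][j] = r[i][j] * se[i] * se[j] (illustrative assumption)."""
    n = len(se)
    return [[r[i][j] * se[i] * se[j] for j in range(n)] for i in range(n)]

def cholesky_or_none(m, rel_tol=1e-10):
    """Return the lower-triangular Cholesky factor of symmetric matrix m,
    or None if m is not (numerically) positive definite, i.e. some pivot
    is zero, negative, or tiny relative to its diagonal entry."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = m[i][i] - s
                if pivot <= rel_tol * m[i][i]:
                    return None  # singular or near-singular: not safely invertible
                low[i][i] = math.sqrt(pivot)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

# Hypothetical standard errors of bxy for three index SNPs
se = [0.01, 0.02, 0.015]

# Weak LD among index SNPs (as intended after clumping with r2 < 0.05): invertible
r_ok = [[1.0, 0.05, 0.0],
        [0.05, 1.0, 0.02],
        [0.0, 0.02, 1.0]]

# SNPs 1 and 2 in perfect LD (e.g. a duplicated SNP): singular
r_dup = [[1.0, 1.0, 0.0],
         [1.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]

print(cholesky_or_none(ld_covariance(se, r_ok)) is not None)   # True
print(cholesky_or_none(ld_covariance(se, r_dup)) is not None)  # False
```

A Cholesky attempt is used here because it is a cheap, standard test for positive definiteness. Computing the condition number with a linear-algebra library would serve the same purpose.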

Here is what I have tried so far to find the issue:

  1. Ran three separate GSMR analyses of EA, BMI, and INT against DD to see where the issue was occurring: DD vs INT worked, but DD vs EA and DD vs BMI gave the same error message.
  2. Tried the new version of GSMR by adding the two flags --gsmr2-beta and --heidi-thresh (with 0.01 and with 0): same error.
  3. Used a previous version of GCTA (1.92.4beta2): same error.
  4. Tried other BMI summary statistics, the "GWAS Anthropometric 2015 BMI Summary Statistics" from Locke et al. (Nature, 2015): this worked! The BMI summary statistics I had been using initially can be found here: https://zenodo.org/record/1251813#.XCLJ7vZKhE4 (filename: bmi.giant-ukbb.meta-analysis.combined.23May2018.txt.gz).

Could you please help me figure out this issue?

Thanks for your help in advance!

69aru commented 8 months ago

1) Check Data Quality and Consistency:

Ensure that the summary statistics files you are using for EA and BMI are correctly formatted and do not contain any missing or inconsistent data. Confirm that the SNPs in your summary statistics files match the SNPs in the LD reference sample used for GSMR analysis.
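One way to run this consistency check is sketched below. It assumes the summary statistics and the reference `.bim` file have already been parsed into (SNP, allele 1, allele 2) records. The function name `classify_alleles` and the example records are hypothetical. Each SNP is classified as matching the reference, having swapped alleles, having mismatched alleles, or being absent from the reference. GCTA already writes SNPs with mismatched alleles to the `.badsnps` file shown in the log, so the main use of a check like this is to see how many SNPs in each file are affected.

```python
def classify_alleles(sumstats, reference):
    """Classify each summary-statistics SNP against the LD reference.

    sumstats:  iterable of (snp_id, a1, a2) tuples from a GWAS file
    reference: dict mapping snp_id -> (a1, a2), e.g. parsed from a .bim file
    Returns a dict of SNP ID lists under 'match', 'swapped', 'mismatch', 'missing'.
    """
    result = {"match": [], "swapped": [], "mismatch": [], "missing": []}
    for snp, a1, a2 in sumstats:
        ref = reference.get(snp)
        if ref is None:
            result["missing"].append(snp)  # not in the LD reference at all
            continue
        pair = (a1.upper(), a2.upper())
        ref_pair = (ref[0].upper(), ref[1].upper())
        if pair == ref_pair:
            result["match"].append(snp)
        elif pair == ref_pair[::-1]:
            result["swapped"].append(snp)  # same alleles, effect allele flipped
        else:
            result["mismatch"].append(snp)  # different alleles (or a strand issue)
    return result

# Hypothetical example records
reference = {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("A", "C")}
sumstats = [("rs1", "a", "g"), ("rs2", "T", "C"), ("rs3", "A", "T"), ("rs4", "G", "T")]
print(classify_alleles(sumstats, reference))
```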

2) Filter and Prune Data:

Use the '--extract' option in PLINK to ensure that the SNPs in your GWAS summary statistics match those in the reference sample; this improves compatibility between the datasets. Consider also removing SNPs with very low minor allele frequency (MAF) using the '--maf' option, to clean the data and reduce noise (your log reports 527 SNPs with MAF < 0.01 in the reference sample).
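The list for `--extract` can be prepared in a few lines, as in the sketch below. It assumes per-SNP allele frequencies are available from the summary statistics and that the reference SNP IDs have been read from the `.bim` file. The function names and the 0.01 cutoff are illustrative choices. The file is written with one SNP ID per line, the plain-text layout PLINK's `--extract` reads.

```python
def snps_to_extract(sumstats_freq, reference_snps, maf_min=0.01):
    """Return sorted SNP IDs that are in the reference and have MAF >= maf_min.

    sumstats_freq:  dict mapping snp_id -> allele frequency from the GWAS file
    reference_snps: set of SNP IDs in the LD reference (e.g. from the .bim file)
    """
    keep = []
    for snp, freq in sumstats_freq.items():
        maf = min(freq, 1.0 - freq)  # frequency of the minor allele
        if snp in reference_snps and maf >= maf_min:
            keep.append(snp)
    return sorted(keep)

def write_snp_list(snps, path):
    """Write one SNP ID per line, for use as an --extract file."""
    with open(path, "w") as handle:
        for snp in snps:
            handle.write(snp + "\n")

# Hypothetical frequencies and reference panel
freqs = {"rs1": 0.30, "rs2": 0.995, "rs3": 0.12, "rs4": 0.45}
ref = {"rs1", "rs2", "rs3"}
print(snps_to_extract(freqs, ref))  # ['rs1', 'rs3']
```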

3) Update Reference Data:

Verify that the LD reference sample you are using is appropriate for your dataset (for example, that it matches the ancestry of your GWAS samples) and is of sufficient quality. You may want to consider using a different LD reference sample or regenerating the LD matrices.

4) Increase the Clumping Threshold:

You can try increasing the clumping threshold ('--clump-p1') to a more lenient value, like 1e-5, to include more SNPs in the analysis. This may help you identify a subset of SNPs that do not lead to the invertibility issue.
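To show what changing the p-value threshold does, here is a simplified sketch of greedy clumping. It assumes pairwise LD r2 values are supplied in a dictionary and ignores the physical distance window a real clumping tool also uses. The function name and all values are hypothetical. The defaults mirror the thresholds reported in the log (p < 5.0e-08, r2 < 0.05), and relaxing the p-value threshold to 1e-5 admits more index SNPs.

```python
def greedy_clump(pvalues, r2, p_thresh=5e-8, r2_thresh=0.05):
    """Return index SNPs from a simplified greedy clumping.

    pvalues: dict snp_id -> p-value
    r2:      dict frozenset({snp_a, snp_b}) -> LD r2 (pairs not listed count as 0)
    SNPs with p < p_thresh are visited from most to least significant; a SNP
    becomes an index SNP unless its r2 with an existing index SNP is >= r2_thresh.
    """
    candidates = sorted((p, snp) for snp, p in pvalues.items() if p < p_thresh)
    index_snps = []
    for _, snp in candidates:
        in_ld = any(r2.get(frozenset((snp, idx)), 0.0) >= r2_thresh for idx in index_snps)
        if not in_ld:
            index_snps.append(snp)
    return index_snps

# Hypothetical p-values and LD
pvals = {"rs1": 1e-12, "rs2": 3e-9, "rs3": 2e-6, "rs4": 4e-7}
ld = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs3", "rs4")): 0.01}
print(greedy_clump(pvals, ld))                 # ['rs1']
print(greedy_clump(pvals, ld, p_thresh=1e-5))  # ['rs1', 'rs4', 'rs3']
```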

5) Use Different Summary Statistics:

You mentioned that using different summary statistics for BMI worked. This suggests that the issue might be specific to the summary statistics file you initially used. You might want to stick with the working summary statistics dataset.

6) Check for Outliers and Extreme Values:

Look for outliers or extreme values in the summary statistics (for example, standard errors of zero, missing or infinite values, or implausibly large effect sizes) that could cause numerical instability.
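A minimal screening pass might look like the sketch below. It assumes each row has already been parsed into (SNP, beta, se) values; the function name is hypothetical. Non-finite values and non-positive standard errors are clearly invalid. The z-score cutoff of 40 is an arbitrary illustrative choice for rows worth reviewing, not a rule from GCTA.

```python
import math

def screen_rows(rows, z_max=40.0):
    """Flag summary-statistics rows that may cause numerical problems.

    rows:  iterable of (snp_id, beta, se)
    z_max: arbitrary illustrative cutoff for 'extreme' |beta / se|
    Returns a dict snp_id -> reason for every flagged row.
    """
    flagged = {}
    for snp, beta, se in rows:
        if not (math.isfinite(beta) and math.isfinite(se)):
            flagged[snp] = "non-finite beta or se"
        elif se <= 0:
            flagged[snp] = "non-positive se"
        elif abs(beta / se) > z_max:
            flagged[snp] = "extreme z-score"
    return flagged

# Hypothetical rows
rows = [("rs1", 0.02, 0.004), ("rs2", 0.5, 0.0), ("rs3", float("nan"), 0.01), ("rs4", 1.2, 0.01)]
print(screen_rows(rows))
```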

anglixue commented 8 months ago

Hi Shreya,

I downloaded the data, tested each of these traits as the exposure, and used publicly available summary statistics for inflammatory bowel disease (IBD) as the outcome.

To my surprise, GSMR ran without error for all four traits. This suggests that the error may be due to the DD GWAS summary statistics themselves.

Since you cannot share the DD summary statistics, here are some suggestions for diagnosing your DD GWAS further.

  1. Make sure there are no duplicated SNPs in the DD GWAS.
  2. How many SNPs does the DD GWAS contain? I suggest matching the SNP panel between exposure and outcome before running GSMR. I noticed that the number of SNPs varies widely across the four traits, from 2M to 20M, which could cause problems if the SNP panel is not consistent between exposure and outcome.
  3. Make sure the betas in the DD GWAS are on the log(OR) scale.
  4. Check whether a DD GWAS is publicly available. If so, download it and give it a try.
  5. You can also download summary statistics for any other disease GWAS and use them as the outcome, to double-check that the issue actually comes from the DD GWAS.
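The first three checks in that list can be sketched in a few lines, assuming each GWAS has been loaded as a list of (SNP, beta) pairs. All helper names and data are hypothetical. The odds-ratio test is only a rough heuristic, and its 0.5 tolerance is arbitrary. It relies on the fact that log(OR) estimates are centred near 0, whereas odds ratios are all positive and centred near 1.

```python
import math
from collections import Counter
from statistics import median

def duplicated_snps(records):
    """Return SNP IDs that occur more than once, sorted."""
    counts = Counter(snp for snp, _ in records)
    return sorted(snp for snp, n in counts.items() if n > 1)

def looks_like_odds_ratio(records, tol=0.5):
    """Heuristic: odds ratios are all positive with a median near 1 (tol is arbitrary)."""
    betas = [b for _, b in records]
    return min(betas) > 0 and abs(median(betas) - 1.0) < tol

def to_log_or(records):
    """Convert odds ratios to the log(OR) scale."""
    return [(snp, math.log(b)) for snp, b in records]

def shared_panel(exposure, outcome):
    """SNP IDs present in both GWAS, so both can be restricted to one panel."""
    return sorted({s for s, _ in exposure} & {s for s, _ in outcome})

# Hypothetical data: an outcome reported as odds ratios, with one duplicated SNP
dd = [("rs1", 1.05), ("rs2", 0.97), ("rs2", 0.97), ("rs3", 1.10)]
ea = [("rs1", 0.01), ("rs3", -0.02), ("rs5", 0.03)]
print(duplicated_snps(dd))        # ['rs2']
print(looks_like_odds_ratio(dd))  # True
print(looks_like_odds_ratio(ea))  # False
print(shared_panel(ea, dd))       # ['rs1', 'rs3']
```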

Let me know if you find anything, and we can debug this further.

Cheers, Angli

shreya2031 commented 7 months ago

Hi Angli,

Thank you for your suggestions! I will try them and get back to you.