jianyangqt / gcta

GCTA software
GNU General Public License v3.0

Error: the V matrix is not invertible. #72

Open AmandaHWChong opened 4 months ago

AmandaHWChong commented 4 months ago

Hi,

I have been running fastGWA-mlm and keep getting the error 'the V matrix is not invertible.' Before running the GWAS, I rank-based inverse normal transformed my continuous variables and then adjusted them for age and sex. The only quantitative covariates I have added to the GWAS analysis are PCs. Could you please advise me on how to troubleshoot this issue?

Thank you!

Output below:

Options:

--bfile chr1_22_filtered
--grm-sparse sp-grm
--fastGWA-mlm
--pheno ldlc_rint_adjusted.txt
--qcovar pc.txt
--thread-num 10
--out ldlc_assoc

The program will be running with up to 10 threads.
Reading PLINK FAM file from [chr1_22_filtered.fam]...
140831 individuals to be included from FAM file.
Reading phenotype data from [ldlc_rint_adjusted.txt]...
34168 overlapping individuals with non-missing data to be included from the phenotype file.
34168 individuals to be included. 0 males, 0 females, 34168 unknown.
Reading PLINK BIM file from [chr1_22_filtered.bim]...
7941687 SNPs to be included from BIM file(s).
Reading quantitative covariates from [pc.txt].
7 covariates of 140831 samples to be included.
34168 overlapping individuals with non-missing data to be included from the covariate file(s).
Reading the sparse GRM file from [sp_grm]...
After matching all the files, 34168 individuals to be included in the analysis.
Estimating the genetic variance (Vg) by fastGWA-REML (grid search)...
Error: the V matrix is not invertible.
An error occurs, please check the options or data

longmanz commented 4 months ago

Hi, This could be related to the sparse GRM. Could you tell me how many rows there are in your .grm.sp and .grm.id files? What criteria did you use to generate the GRM and the sparse GRM?

AmandaHWChong commented 4 months ago

Hi,

There are 348,924 rows in my .grm.sp and 140,831 rows in my .grm.id file.

To generate my sparse GRM, I used the KING software to infer relationships up to the third degree, which produced a .kin0 file. I then used this pedFAM.R script (https://github.com/MRCIEU/Lifecourse-GWAS/blob/main/resources/genotypes/pedFAM.R) to generate my sparse GRM.

longmanz commented 4 months ago

Hi, Thank you. Do you expect to see that many related pairs in your dataset? The number of related pairs is roughly 348,924 - 140,831 = 208,093, which means that a lot of individuals share 3rd-degree or closer relatedness with each other.
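
The subtraction above assumes that the .grm.sp file holds one diagonal row per individual plus one row per related pair. A minimal sketch for checking that directly, assuming the file has three whitespace-separated columns (first index, second index, value); the helper name `count_pairs` is hypothetical and not part of GCTA:

```shell
# Hypothetical helper: count diagonal and off-diagonal rows in a sparse GRM file.
# Assumes three whitespace-separated columns: index1 index2 value.
count_pairs() {
    awk '$1 == $2 { diag++ }
         $1 != $2 { off++ }
         END { printf "diagonal=%d off_diagonal=%d\n", diag, off }' "$1"
}
```

It could be called as `count_pairs sp-grm.grm.sp`. If the diagonal count does not match the number of rows in the .grm.id file, the simple subtraction would not give the number of related pairs.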

AmandaHWChong commented 4 months ago

Hi,

Yes, in this cohort we would expect to see a high degree of relatedness. Could this be what is causing the error?

longmanz commented 4 months ago

Hi, No that is probably not causing the issue. I was just checking as we do not often see such high relatedness. I think this might be driven by the pedFAM.R script you used. That script was written a long time ago and was not extensively tested. I would recommend the following checks:

  1. Try a different phenotype and see if the same issue occurs. You can even use a list of randomly generated numbers as the phenotype. If the same issue still occurs, then we can confirm that it is driven by the sparse GRM, not the phenotype.

  2. Try to generate the sparse GRM from genotypes (the standard strategy in fastGWA). Given that your dataset has very high relatedness, using only relatives up to the 3rd degree might not be sufficient to capture all the relatedness in your dataset. You might consider generating a sparse GRM from the SNP genotypes as we recommend (https://yanglab.westlake.edu.cn/software/gcta/index.html#MakingaGRM). We provide the "--make-grm-part" module to shorten the runtime.
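
For check 1, a random phenotype file could be produced along these lines. This is only a sketch: it assumes the input file carries FID and IID in its first two columns with no header line, and the helper name `make_random_pheno` and the default seed are hypothetical:

```shell
# Hypothetical helper: write "FID IID value" lines with a random value per individual.
# Input: any file with FID and IID in columns 1 and 2 (e.g. a .fam or phenotype file).
# Values are approximately standard normal (sum of 12 uniforms minus 6).
make_random_pheno() {
    awk -v seed="${2:-42}" 'BEGIN { srand(seed) }
        {
            s = 0
            for (i = 0; i < 12; i++) s += rand()
            print $1, $2, s - 6
        }' "$1"
}
```

For example, `make_random_pheno ldlc_rint_adjusted.txt > random_pheno.txt` would keep the same individuals as the original phenotype file, and random_pheno.txt could then be passed to --pheno in place of the real phenotype.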

AmandaHWChong commented 4 months ago

Thank you for your suggestion!

I have made the sparse GRM using the standard strategy in fastGWA that you recommended, with these options:

--bfile $dir/chr1_22_filtered \
--make-grm \
--sparse-cutoff 0.05 \
--thread-num 10 \
--out $dir/sp_grm

However, when running fastGWA-mlm again I get this error: 'Error: not enough valid null SNPs (<100). You may check if too variants are removed by a filter, e.g., MAF.' Could this have something to do with the --sparse-cutoff flag I used?

Thank you for your help!

longmanz commented 4 months ago

Hi, Could you tell me how many rows you have in your sp_grm.grm.sp and sp_grm.grm.id files this time? Also, could you show me the log file you obtained when running fastGWA-mlm? You can either post it here or send it to me via email (ljiang@nygenome.org).

AmandaHWChong commented 4 months ago

Hi,

Yes, the sp_grm.grm.sp file has 92,644,627 rows and the sp_grm.grm.id file has 140,831 rows. For the genotype data I have filtered by MAF > 0.01 and INFO > 0.8.

Also, here is my log file from running fastGWA-mlm:

--bfile chr1_22_filtered
--grm-sparse sp_grm
--fastGWA-mlm
--pheno ldlc_rint_adjusted_age_sex.txt
--qcovar pc.txt
--thread-num 10
--out ldlc_assoc

The program will be running with up to 10 threads.
Reading PLINK FAM file from [chr1_22_filtered.fam]...
140831 individuals to be included from FAM file.
Reading phenotype data from [ldlc_rint_adjusted_age_sex.txt]...
34168 overlapping individuals with non-missing data to be included from the phenotype file.
34168 individuals to be included. 0 males, 0 females, 34168 unknown.
Reading PLINK BIM file from [chr1_22_filtered.bim]...
7941687 SNPs to be included from BIM file(s).
Reading quantitative covariates from [pc.txt].
7 covariates of 140831 samples to be included.
34168 overlapping individuals with non-missing data to be included from the covariate file(s).
Reading the sparse GRM file from [sp_grm]...
After matching all the files, 34168 individuals to be included in the analysis.
Estimating the genetic variance (Vg) by fastGWA-REML (grid search)...
Iteration 1, step size: 0.0158408, logL: -16878.4. Vg: 0.0158408, searching range: 0 to 0.0316816
Iteration 2, step size: 0.0021121, logL: -16875.8. Vg: 0.0168968, searching range: 0.0147847 to 0.0190089
Iteration 3, step size: 0.000281614, logL: -16874.8. Vg: 0.0173193, searching range: 0.0170376 to 0.0176009
Iteration 4, step size: 3.75485e-05, logL: -16874.8. Vg: 0.0173005, searching range: 0.0172629 to 0.017338
Iteration 5, step size: 5.00647e-06, logL: -16874.7. Vg: 0.017333, searching range: 0.017328 to 0.017338
Iteration 6, step size: 6.67529e-07, logL: -16874.7. Vg: 0.0173354, searching range: 0.0173347 to 0.017336
Iteration 7, step size: 8.90039e-08, logL: -16874.7. Vg: 0.0173355, searching range: 0.0173354 to 0.0173356
Iteration 8, step size: 1.18672e-08, logL: -16874.7. Vg: 0.0173356, searching range: 0.0173356 to 0.0173356
Iteration 9, step size: 1.58229e-09, logL: -16874.7. Vg: 0.0173356, searching range: 0.0173356 to 0.0173356
Iteration 10, step size: 2.10972e-10, logL: -16874.7. Vg: 0.0173356, searching range: 0.0173356 to 0.0173356
Iteration 11, step size: 2.81296e-11, logL: -16874.7. Vg: 0.0173356, searching range: 0.0173356 to 0.0173356
Iteration 12, step size: 3.75062e-12, logL: -16874.7. Vg: 0.0173356, searching range: 0.0173356 to 0.0173356
Iteration 13, step size: 5.00082e-13, logL: -16874.7. Vg: 0.0173356, searching range: 0.0173356 to 0.0173356
fastGWA-REML converged.
logL: -16874.7
Sampling variance/covariance of the estimates of Vg and Ve:
4.66793e-05 -5.16532e-05
-5.16532e-05 0.000114919

Source  Variance   SE
Vg      0.0173356  0.00683223
Ve      0.972713   0.01072
Vp      0.990049

Heritability = 0.0175098 (Pval = 0.0111703)
fastGWA-REML runtime: 2271.06 sec.

Tuning parameters using 2000 null SNPs...
reading genotypes...
100% finished in 212.3 sec
1999 SNPs have been processed.
Error: not enough valid null SNPs (<100). You may check if too variants are removed by a filter, e.g., MAF.
An error occurs, please check the options or data

Thank you for your help.

longmanz commented 4 months ago

Hi, I think something went wrong with the sparse GRM. The row number in the grm.sp file is way too large given your sample size. Even for UK Biobank (European subset), the row number in the grm.sp file is ~600,000 - 700,000. Such high relatedness in your dataset seems unrealistic.
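
To put the row counts from this thread on a common scale, one rough option is the mean number of relatives per individual that they imply. The sketch below assumes one diagonal row per individual, so that pairs = rows in .grm.sp minus rows in .grm.id, and counts each pair once for each of its two members; the helper name `mean_relatives` is hypothetical:

```shell
# Hypothetical helper: mean relatives per individual implied by the row counts.
# Assumes pairs = sp_rows - n_ids (one diagonal row per individual);
# each pair contributes one relative to each of its two members.
mean_relatives() {
    awk -v rows="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 2 * (rows - n) / n }'
}

mean_relatives 348924 140831     # KING / pedFAM.R sparse GRM
mean_relatives 92644627 140831   # genotype-based sparse GRM, cutoff 0.05
```

Under that assumption, the KING-based file implies about 3 relatives per individual, while the genotype-based file implies about 1,314.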

We recommend using only HapMap 3 common SNPs (after standard QC) to generate the GRM. In addition, please make sure the genetic ancestry of your data is homogeneous (i.e., the individuals used for the GRM/GWAS analysis should come from the same genetic ancestry). Admixed individuals or individuals from a different genetic ancestry background should be analyzed separately, because we cannot generate an appropriate GRM when there are multiple ancestry backgrounds in the dataset.
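
Before regenerating the GRM, it may help to check how many HapMap 3 SNPs are actually present in the genotype data. A minimal sketch, assuming a plain-text SNP list with one ID per line (the file name hm3.snplist and the helper name `overlap_snps` are hypothetical) and a PLINK .bim file with the SNP ID in column 2:

```shell
# Hypothetical helper: print SNP IDs found in both a SNP list (one ID per line)
# and a PLINK .bim file (SNP ID in column 2). Assumes the SNP list is not empty.
overlap_snps() {
    awk 'NR == FNR { keep[$1] = 1; next }
         ($2 in keep) { print $2 }' "$1" "$2"
}
```

For example, `overlap_snps hm3.snplist chr1_22_filtered.bim > hm3_overlap.snplist` writes the shared IDs to a file, and `wc -l hm3_overlap.snplist` gives the count. The resulting list could then be supplied to GCTA's --extract option when building the GRM. Note that the IDs only match if both files use the same naming scheme (e.g. rsIDs).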