jianyangqt / gcta

GCTA software
GNU General Public License v3.0

"Estimating the genetic variance (Vg) by fastGWA-REML (grid search)" produces no results for a long time #66

Closed: DonaldSandoz2000 closed this issue 6 months ago

DonaldSandoz2000 commented 6 months ago

Hello, I'm using fastGWA, which you developed. It seems to be a great tool, but I'm having some trouble and would appreciate your help. Here are my steps and log files.

First, I converted the BGEN file to PLINK bed format with plink2:

PLINK v2.00a6LM AVX2 AMD (12 Dec 2023)
www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang
GNU General Public License v3
Logging to /work1/analysis_ukb/ukb_bed/ukb_imp_chr13.log.
Options in effect:
  --bgen /work1/ukb/ukb_bed/ukb_imp_chr13
  --bgen /work1/ukb/ukb_imp_chr13_v3.bgen ref-first
  --make-bed
  --memory 50000
  --out /work1/analysis_ukb/ukb_bed/ukb_imp_chr13
  --sample /work1/ukb/ukb22828_c13_b0_v3_s487163.sample
  --threads 30

Start time: Sat Dec 30 23:19:42 2023
2060058 MiB RAM detected, ~1972982 available; reserving 50000 MiB for main workspace.
Using up to 30 threads (change this with --threads).
--bgen: 3270217 variants detected, format v1.2.
487409 samples imported from .sample file to /work1/analysis_ukb/ukb_bed/ukb_imp_chr13-temporary.psam .
--bgen: /work1/analysis_ukb/ukb_bed/ukb_imp_chr13-temporary.pgen + /work1/analysis_ukb/ukb_bed/ukb_imp_chr13-temporary.pvar written.
487409 samples (264224 females, 222939 males, 246 ambiguous; 487409 founders) loaded from /work1/analysis_ukb/ukb_bed/ukb_imp_chr13-temporary.psam.
3270217 variants loaded from /work1/analysis_ukb/ukb_bed/ukb_imp_chr13-temporary.pvar.
Note: No phenotype data present.
Writing /work1/analysis_ukb/ukb_bed/ukb_imp_chr13.fam ... done.
Writing /work1/analysis_ukb/ukb_bed/ukb_imp_chr13.bim ... done.
Writing /work1/analysis_ukb/ukb_bed/ukb_imp_chr13.bed ... done.

I then applied the following QC with PLINK 1.9:

PLINK v1.90b7.2 64-bit (11 Dec 2023)
Options in effect:
  --bfile /work1/analysis_ukb/ukb_bed/ukb_imp_chr13
  --extract /work1/analysis_ukb/ukb_qc/rsidINFO.txt
  --geno 0.05
  --hwe 1e-5
  --keep-fam /work1/analysis_ukb/ukb_qc/sampleqc_id.txt
  --maf 0.01
  --make-bed
  --memory 300000
  --out /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC
  --threads 32

Hostname: uranus
Working directory: /work1/yzy/analysis_ukb/ukb_qc
Sun Dec 31 14:59:49 2023

Random number seed: 1704005989
2060058 MB RAM detected; reserving 300000 MB for main workspace.
3270217 variants loaded from .bim file.
487409 people (222939 males, 264224 females, 246 ambiguous) loaded from .fam.
Ambiguous sex IDs written to /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.nosex .
--extract: 1019508 variants remaining.
Warning: At least 1597 duplicate IDs in --extract file.
337056 people remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 337056 founders and 0 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate in remaining samples is 0.99479.
20308 variants removed due to missing genotype data (--geno).
--hwe: 2019 variants removed due to Hardy-Weinberg exact test.
676874 variants removed due to minor allele threshold(s) (--maf/--max-maf/--mac/--max-mac).
320307 variants and 337056 people pass filters and QC.
Note: No phenotypes present.
--make-bed to /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.bed + /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.bim + /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.fam ... done.
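As a quick consistency check on this QC log, the three variant filters (--geno, --hwe, --maf) should account exactly for the drop from the post---extract count to the final count. A minimal Python sketch that uses only the numbers printed in the log:

```python
# Variant counts copied from the PLINK 1.9 log above
after_extract = 1019508  # "--extract: 1019508 variants remaining."
removed = {
    "--geno": 20308,   # missing genotype data
    "--hwe": 2019,     # Hardy-Weinberg exact test
    "--maf": 676874,   # minor allele threshold
}

# Variants left after subtracting every filter's removals
remaining = after_extract - sum(removed.values())
print(remaining)  # 320307, matching "320307 variants ... pass filters and QC"
```

The counts add up, so the filters removed disjoint sets of variants as reported, and 320307 SNPs on chromosome 13 went into the GRM.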

We then used the QC'd bed files to compute the GRM; the log is as follows:

Options:

--bfile /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC --make-grm --sparse-cutoff 0.05 --thread-num 32 --out /work1/analysis_ukb/ukb_grm/chr13_grm/chr13_geno_grm

The program will be running with up to 32 threads.
Note: GRM is computed using the SNPs on the autosomes.
Reading PLINK FAM file from [/work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.fam]...
337056 individuals to be included from FAM file.
337056 individuals to be included. 156073 males, 180983 females, 0 unknown.
Reading PLINK BIM file from [/work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.bim]...
320307 SNPs to be included from BIM file(s).
Computing the genetic relationship matrix (GRM) v2 ...
Subset 1/1, no. subject 1-337056
  337056 samples, 320307 markers, 56803542096 GRM elements
IDs for the GRM file have been saved in the file [/work1/analysis_ukb/ukb_grm/chr13_grm/chr13_geno_grm.grm.id]
Computing GRM...
  5.1% Estimated time remaining 812.0 min
  10.2% Estimated time remaining 758.1 min
  15.3% Estimated time remaining 713.8 min
  20.5% Estimated time remaining 667.8 min
  25.6% Estimated time remaining 616.5 min
  30.7% Estimated time remaining 574.6 min
  35.8% Estimated time remaining 531.9 min
  40.9% Estimated time remaining 487.4 min
  46.0% Estimated time remaining 445.0 min
  51.2% Estimated time remaining 401.6 min
  56.3% Estimated time remaining 358.8 min
  61.4% Estimated time remaining 316.5 min
  66.5% Estimated time remaining 274.3 min
  71.6% Estimated time remaining 232.1 min
  76.7% Estimated time remaining 190.3 min
  81.8% Estimated time remaining 148.4 min
  87.0% Estimated time remaining 106.5 min
  92.1% Estimated time remaining 64.7 min
  97.2% Estimated time remaining 23.0 min
  100% finished in 49093.6 sec
320307 SNPs have been processed.
  Used 320307 valid SNPs.
The GRM computation is completed.
Saving sparse GRM with a cutoff 0.050000...
GRM has been saved in the file [/work1/analysis_ukb/ukb_grm/chr13_grm/chr13_geno_grm.grm.sp]
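The "56803542096 GRM elements" figure in this log appears to be the size of the lower triangle of an N x N matrix including the diagonal, N(N+1)/2, with N = 337056 samples. A short Python check of that arithmetic:

```python
n = 337056  # samples in the GRM, from the log

# Lower triangle of an n x n symmetric matrix, diagonal included
n_elements = n * (n + 1) // 2
print(n_elements)  # 56803542096
```

This is the size of the dense matrix before the 0.05 sparse cutoff is applied; only entries that survive the cutoff are written to the .grm.sp file.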

Finally, we used this sparse GRM for the fastGWA analysis; the log is as follows:


Options:

--bfile /work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC --grm-sparse /work1/analysis_ukb/ukb_grm/chr13_grm/chr13_geno_grm --fastGWA-mlm --pheno /work1/analysis_ukb/gwas/calcium/calcium.txt --thread-num 20 --out chr13_cal

The program will be running with up to 20 threads.
Reading PLINK FAM file from [/work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.fam]...
337056 individuals to be included from FAM file.
Reading phenotype data from [/work1/analysis_ukb/gwas/calcium/calcium.txt]...
337056 overlapping individuals with non-missing data to be included from the phenotype file.
337056 individuals to be included. 156073 males, 180983 females, 0 unknown.
Reading PLINK BIM file from [/work1/analysis_ukb/ukb_qc/chr13_QC/chr13_QC.bim]...
320307 SNPs to be included from BIM file(s).
Reading the sparse GRM file from [/work1/analysis_ukb/ukb_grm/chr13_grm/chr13_geno_grm]...
After matching all the files, 337056 individuals to be included in the analysis.
Estimating the genetic variance (Vg) by fastGWA-REML (grid search)...

It has been running for two to three hours without producing any results and is using almost 1 TB of memory, which is far out of line with what the paper describes. I'm not sure what I'm doing wrong; I hope you can shed some light on this. Additional info: chr13_geno_grm.grm.sp has 745108110 lines, and chr13_geno_grm.grm.id has 337056 lines.
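The two line counts above are enough to estimate how dense the "sparse" GRM actually is. The sketch below assumes (an assumption, not confirmed in this thread) that the .grm.sp file holds one diagonal entry per individual plus one line per retained off-diagonal pair, each pair stored once. If the diagonal is not stored, the estimates change by only 337056 lines, which is negligible here.

```python
n = 337056            # lines in chr13_geno_grm.grm.id (one per individual)
sp_lines = 745108110  # lines in chr13_geno_grm.grm.sp

# Assumption: n diagonal lines + one line per off-diagonal pair above the cutoff
off_diag_pairs = sp_lines - n
total_pairs = n * (n - 1) // 2        # all distinct pairs of individuals
fraction_kept = off_diag_pairs / total_pairs
avg_relatives = 2 * off_diag_pairs / n  # each pair counts for both members

print(off_diag_pairs)          # 744771054
print(f"{fraction_kept:.2%}")  # 1.31%
print(round(avg_relatives))    # 4419
```

Under this assumption, every individual has on average roughly 4400 "relatives" with a GRM value of at least 0.05. That is what makes the REML step so slow and memory-hungry, and it is the signal the maintainer points to in the reply below.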

longmanz commented 6 months ago

Hi, the difference in line counts between your .sp and .id files indicates that you have a huge number of related pairs in your data, which is not expected for UK Biobank. Please make sure your individuals come from the same genetic ancestry background, because the genetic relationships of individuals from diverse ancestry backgrounds cannot be properly estimated by GCTA or by other commonly used tools. You may refer to UKB data field 22006 for individuals of Caucasian ancestry. Alternatively, you can infer genetic ancestry using a reference panel such as the 1000 Genomes Project.
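One way to act on the field 22006 suggestion is to build a keep-list of individuals in that grouping and pass it to the QC step. The Python below is a minimal sketch under several assumptions: a hypothetical CSV export with columns "eid" and "22006-0.0" (real column names depend on how the data were extracted); value 1 in field 22006 marking the Caucasian grouping, with blanks for everyone else; and FID = IID = eid in the .fam files. The function name caucasian_keep_lines is hypothetical.

```python
import csv
import io


def caucasian_keep_lines(csv_text: str) -> list[str]:
    """Return 'FID IID' lines for participants coded 1 in field 22006.

    Assumes a CSV with hypothetical columns 'eid' and '22006-0.0'
    and that FID and IID are both the UKB eid.
    """
    keep = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Blank or any other value: not in the grouping, so skip the row
        if row.get("22006-0.0", "").strip() == "1":
            eid = row["eid"].strip()
            keep.append(f"{eid} {eid}")
    return keep


# Usage with made-up eids
demo = "eid,22006-0.0\n1000001,1\n1000002,\n1000003,1\n"
print(caucasian_keep_lines(demo))  # ['1000001 1000001', '1000003 1000003']
```

Written to a file with one line per individual, this list could be supplied to PLINK's --keep (or GCTA's --keep) before recomputing the sparse GRM, so that only individuals of a single ancestry group enter the relatedness estimates.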