jianyangqt / gcta

GCTA software
GNU General Public License v3.0

input error: invalid option for "--bgen" or "--mbgen" #38

Closed: zqHealth closed this issue 6 months ago.

zqHealth commented 1 year ago

I was testing the GCTA GREML-LDMS function. An error occurred while calculating segment-based LD scores.

1) The input BGEN v1.2 file was successfully created and indexed with these commands:

    qctool_v2.2.0/qctool \
        -g t11.11.vcf.gz \
        -ofiletype bgen_v1.2 \
        -bgen-bits 8 \
        -bgen-compression zstd \
        -vcf-genotype-field GP \
        -og test/BGEN/t11.11.bgen \
        -os test/BGEN/t11.11.sample

    BGEN/bgenix \
        -g test/BGEN/t11.11.bgen \
        -index

2) GCTA GREML-LDMS command:

    gcta-1.94.1/gcta \
        --bgen test/BGEN/t11.11.bgen \
        --sample test/BGEN/t11.11.sample \
        --ld-score-region 200 \
        --threads 1 \
        --out ttt

The error message:

    Accepted options:
    Error: invalid option "--bgen".
    An error occurs, please check the options or data.

3) Here are some variants in the BGEN file, listed with the command:

    BGEN-703a453117/bin/bgenix -g test/BGEN/t11.11.bgen -list

    # bgenix: started 2023-03-22 20:43:24
    alternate_ids    rsid             chromosome  position  number_of_alleles  first_allele  alternative_alleles
    11:25000161:C:T  11:25000161:C:T  chr11       25000161  2                  C             T
    11:25000340:T:C  11:25000340:T:C  chr11       25000340  2                  T             C
    11:25000581:T:C  11:25000581:T:C  chr11       25000581  2                  T             C
    11:25000626:T:C  11:25000626:T:C  chr11       25000626  2                  T             C
    11:25000803:G:A  11:25000803:G:A  chr11       25000803  2                  G             A
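When checking whether the BGEN itself is sane, a per-chromosome summary of such a listing can help. A minimal sketch, assuming the `bgenix -list` output is whitespace-separated with the column order shown above and that status lines start with `#`; the function name `bgen_list_summary` is hypothetical and not part of bgenix:

```shell
# Hypothetical helper: summarise a "bgenix -list" listing read from stdin.
# Prints: chromosome, variant count, min position, max position.
# Assumes the column order shown above and #-prefixed status lines.
bgen_list_summary() {
  awk '
    /^#/ { next }                    # bgenix status lines
    $1 == "alternate_ids" { next }   # column header row
    NF >= 7 {
      c = $3; p = $4 + 0
      n[c]++
      if (n[c] == 1 || p < lo[c]) lo[c] = p
      if (n[c] == 1 || p > hi[c]) hi[c] = p
    }
    END {
      for (c in n) printf "%s\t%d\t%d\t%d\n", c, n[c], lo[c], hi[c]
    }' | sort -k1,1
}
```

Usage: `bgenix -g test/BGEN/t11.11.bgen -list | bgen_list_summary`. For the five variants shown, this would report `chr11` with 5 variants spanning 25000161 to 25000803.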

A similar error occurs with "--mbgen". Please help, thanks :)

longmanz commented 1 year ago

Hi, I'm not sure whether this is a problem with "--ld-score-region" or with "--bgen". Could you try using genotypes in PLINK BED format and see whether the issue persists?

zqHealth commented 1 year ago

Thank you for your reply, Longda. The PLINK BED format does work. It seems the BGEN format is not supported in release v1.94.1.

1) Convert VCF to BED format:

    PLINK v2.00a3.3LM 64-bit Intel (3 Jun 2022)    www.cog-genomics.org/plink/2.0/
    (C) 2005-2022 Shaun Purcell, Christopher Chang   GNU General Public License v3
    Logging to t11.11.log.
    Options in effect:
      --double-id
      --make-bed
      --memory 8000
      --new-id-max-allele-len 50 truncate
      --out t11.11
      --rm-dup force-first
      --set-all-var-ids @:#:$r:$a
      --threads 1
      --vcf test/test_input_vcf/test.3M.chr11.vcf.gz dosage=HDS
      --vcf-half-call missing

    Start time: Fri Mar 24 10:17:36 2023
    515375 MiB RAM detected; reserving 8000 MiB for main workspace.
    Using 1 compute thread.
    Warning: No FORMAT/HDS key found in --vcf file header. Dosages will be imported (from FORMAT/DS), but phase information will be limited or absent.
    --vcf: 62999 variants scanned.
    --vcf: t11.11-temporary.pgen + t11.11-temporary.pvar.zst + t11.11-temporary.psam written.
    1201 samples (0 females, 0 males, 1201 ambiguous; 1201 founders) loaded from t11.11-temporary.psam.
    62999 variants loaded from t11.11-temporary.pvar.zst.
    Note: No phenotype data present.
    Note: Skipping --rm-dup since no duplicate IDs are present.
    Writing t11.11.fam ... done.
    Writing t11.11.bim ... done.
    Writing t11.11.bed ... done.
    End time: Fri Mar 24 10:17:40 2023
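The `--set-all-var-ids @:#:$r:$a` option in this log names each variant chromosome:position:ref:alt, the same pattern as the IDs in the BGEN listing. A rough illustration of what that template produces, assuming the chromosome code is passed in already formatted the way plink2 would print it (plink2 may normalise codes such as `chr11`), and reading `--new-id-max-allele-len 50 truncate` as shortening longer allele codes to 50 characters. `make_var_id` is a hypothetical helper, not part of plink2:

```shell
# Hypothetical illustration of the plink2 ID template "@:#:$r:$a":
#   @ = chromosome code, # = base-pair position, $r = REF, $a = ALT.
# Alleles longer than max_len are cut short, mimicking the "truncate" mode.
make_var_id() {
  local chrom=$1 pos=$2 ref=$3 alt=$4 max_len=${5:-50}
  printf '%s:%s:%s:%s\n' "$chrom" "$pos" "${ref:0:$max_len}" "${alt:0:$max_len}"
}
```

For example, `make_var_id 11 25000161 C T` prints `11:25000161:C:T`.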

2) GCTA GREML-LDMS:


    Accepted options:
    --bfile t11.11
    --ld-score-region
    --threads 1
    --out ttt

    Note: This is a multi-thread program. You could specify the number of threads by the --thread-num option to speed up the computation if there are multiple processors in your machine.

    Reading PLINK FAM file from [t11.11.fam].
    1201 individuals to be included from [t11.11.fam].
    Reading PLINK BIM file from [t11.11.bim].
    62999 SNPs to be included from [t11.11.bim].
    Reading PLINK BED file from [t11.11.bed] in SNP-major format ...
    Genotype data for 1201 individuals and 62999 SNPs to be included from [t11.11.bed].

    Calculating LD score between SNPs (block size of 10000Kb with an overlap of 5000Kb between blocks); LD rsq threshold = 0) ...
    Calculating allele frequencies ...
    Calculating regional mean LD score (region width = 200Kb with an overlap of 100Kb between regions) ...
    Writing the regional LD score to file [ttt.score.ld] ...

    Analysis finished at 10:19:14 CST on Fri Mar 24 2023
    Overall computational time: 1 minute 34 sec.
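The log describes the regional step as a mean LD score over 200Kb regions that overlap by 100Kb, so consecutive regions start 100Kb apart. A minimal awk sketch of that windowing, assuming regions are anchored at position 0, 200Kb is taken as 200,000 bp, and per-SNP LD scores are already available as `position score` pairs. Under these assumptions every SNP past the first 100Kb lands in two regions; GCTA's actual region boundaries and its assignment of SNPs to regions may differ. `regional_mean_ld` is a hypothetical name:

```shell
# Hypothetical helper: mean per-SNP LD score in overlapping regions.
# stdin: one "position ld_score" pair per line.
# Assumption: regions are [k*step, k*step + width), anchored at position 0.
regional_mean_ld() {
  local width=${1:-200000} step=${2:-100000}
  awk -v w="$width" -v s="$step" '
    {
      pos = $1 + 0; ld = $2 + 0
      # walk back from the last region starting at or before pos
      # while that region still covers pos
      for (k = int(pos / s); k >= 0 && k * s + w > pos; k--) {
        sum[k] += ld; n[k]++
      }
    }
    END {
      for (k in sum) printf "%d\t%d\t%.4f\n", k * s, k * s + w, sum[k] / n[k]
    }' | sort -n -k1,1
}
```

For example, piping the pairs `150000 1.0` and `250000 3.0` through `regional_mean_ld` reports means of 1.0, 2.0 and 3.0 for the regions starting at 0, 100000 and 200000, because the middle region covers both SNPs.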

longmanz commented 1 year ago

Hi @zqHealth, thank you for confirming this! For some reason, LDMS is not compatible with the BGEN format at the moment. Please use BED for your LDMS analyses for now.
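Following this advice, the two working steps from the thread can be wrapped in one function. A minimal sketch: the plink2 options are copied from the plink2 log in this thread, while the binary names default to `plink2` and `gcta` and can be overridden with the hypothetical `PLINK2` and `GCTA` environment variables (for example `GCTA=gcta-1.94.1/gcta`). The function name is illustrative and not part of either tool:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the BED-based workaround discussed above.
# Binary names are assumptions; override via the PLINK2 / GCTA variables.
ldms_ld_scores_from_vcf() {
  local vcf=$1 prefix=$2 out=$3 region_kb=${4:-200}
  local plink2=${PLINK2:-plink2} gcta=${GCTA:-gcta}

  # Step 1: VCF -> PLINK 1 binary fileset (.bed/.bim/.fam),
  # using the options shown in the plink2 log of this thread.
  "$plink2" --vcf "$vcf" dosage=HDS --vcf-half-call missing \
    --double-id --set-all-var-ids '@:#:$r:$a' \
    --new-id-max-allele-len 50 truncate --rm-dup force-first \
    --make-bed --threads 1 --out "$prefix" || return 1

  # Step 2: segment-based LD scores from the BED fileset instead of --bgen.
  "$gcta" --bfile "$prefix" --ld-score-region "$region_kb" \
    --thread-num 1 --out "$out"
}
```

Usage: `GCTA=gcta-1.94.1/gcta ldms_ld_scores_from_vcf t11.11.vcf.gz t11.11 ttt` would write `t11.11.bed/.bim/.fam` and then the regional LD scores to `ttt.score.ld`.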