Closed EstelLec closed 2 years ago
Hi,
What does the output mask snplist file look like? (do you only have masks listed for a single gene?)
Cheers, Joelle
Hi Joelle, It is empty when output is empty. When there is one gene in the output, it looks like :
GPATCH4(ENSG00000160818).M1.0.001 1:156595257:A:AAC
GPATCH4(ENSG00000160818).M1.0.01 1:156595257:A:AAC
GPATCH4(ENSG00000160818).M1.0.05 1:156595257:A:AAC
GPATCH4(ENSG00000160818).M1.0.1 1:156595257:A:AAC
but the log shows 1970 sets in total.
Can you use --check-burden-files
to check if there is any name format issue between the input files with the genetic data?
Hello Joelle, it seems there is no trouble between snp name format, as missing reported snp are from other chromosome or doesn’t pass QC (not in --extract WES_c${chr}_snps_qc_pass.snplist). I join an example of log, but I didn’t notice anything weird in it.
Options in effect:
--step 2 \
--bed ukb23149_c22_b0_v1 \
--out assoc_gene.c22 \
--ref-first \
--check-burden-files \
--phenoFile All_SBP_DBP.phe \
--covarFile All_SBP_DBP.phe \
--extract WES_c22_snps_qc_pass.snplist \
--phenoCol SBPZvalues \
--phenoCol DBPZvalues \
--covarCol Age \
--catCovarList Eth,Sex \
--covarCol PC{1:10} \
--pred All_500k_results_pred.list \
--bsize 200 \
--minMAC 3 \
--threads 16 \
--gz \
--set-list ukb23149_450k_OQFE.sets.txt.gz \
--anno-file ukb23149_450k_OQFE.annotations.txt.gz \
--mask-def ukb23149_450k_OQFE.masks \
--aaf-bins 0.1,0.05,0.01,0.001 \
--write-mask-snplist
Association testing mode (joint tests) with fast multithreading using OpenMP
* bim : [ukb23149_c22_b0_v1.bim] n_snps = 605098
-keeping variants specified by --extract
-number of variants remaining in the analysis = 590705
* fam : [ukb23149_c22_b0_v1.fam] n_samples = 454756
* bed : [ukb23149_c22_b0_v1.bed]
* phenotypes : [All_SBP_DBP.phe] n_pheno = 2
-number of phenotyped individuals = 452754
* covariates : [All_SBP_DBP.phe] n_cov = 13
-number of individuals with covariate data = 453860
* number of individuals used in analysis = 452385
* LOCO predictions : [All_500k_results_pred.list] n_files = 2
-file [/home/dnanexus/out/out/All_500k_results_1.loco.gz] for phenotype 'SBPZvalues'
-file [/home/dnanexus/out/out/All_500k_results_2.loco.gz] for phenotype 'DBPZvalues'
-residualizing and scaling phenotypes...done (16ms)
* annotations : [ukb23149_450k_OQFE.annotations.txt.gz]
+number of annotations categories = 6
* masks : [ukb23149_450k_OQFE.masks] n_masks = 1
* aaf cutoffs : [4 : 0.001 0.01 0.05 0.1 ] + singletons
* set file : [ukb23149_450k_OQFE.sets.txt.gz] n_sets = 423
WARNING: Detected 384 sets with variants not in genetic data or annotation files.
WARNING: Detected 18562 sets with only unknown variants (these are ignored).
+report on burden input files written to [assoc_gene.c22_masks_report.txt]
* # threads : [16]
* # tested sets : [423]
* max block size : [200]
* rule used to build masks : max
* approximate memory usage : 900MB
* writing list of variants for each mask in file [assoc_gene.c22_masks.snplist]
* using minimum MAC of 3 (masks with lower MAC are ignored)
Chromosome 22 [423 sets in total]
-reading loco predictions for the chromosome...done (2437ms)
set [1/423] : 14 variants...
-reading in genotypes and building masks...done (21ms)
set [2/423] : 16 variants...
-reading in genotypes and building masks...done (18ms)
set [3/423] : 13 variants...
-reading in genotypes and building masks...done (17ms)
set [4/423] : 15 variants...
-reading in genotypes and building masks...done (17ms)
set [5/423] : 16 variants...
-reading in genotypes and building masks...done (26ms)
.......
Association results stored separately for each trait in files :
* [assoc_gene.c22_SBPZvalues.regenie.gz]
* [assoc_gene.c22_DBPZvalues.regenie.gz]
Number of ignored tests due to low MAC : 0
the mask.snplist is empty, and there is only the header in the output.
Cheers, Estelle
Hi Joelle, I figured it out. The option --ref-first wasn't relevant for me. My bad ! Thank you for your help. Estelle
@EstelLec how did you fixed the problem? I have the same issue
Hello, I am running Regenie step 2 to do burden testing using the UK biobank data WES 450k on UKB RAP (regenie v2.2.4) and I would need your help. I run this command :
And the log shows 307 sets running :
But there is only one gene in my output. It seems it runs on 307 genes, am I wrong ? I have the same problem for all chromosomes, sometimes there is no gene at all in the output. How could I correct this ? Thank you