weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0
187 stars 72 forks source link

RHS 'b' has wrong number of rows: gene based tests #361

Closed complexgenome closed 3 years ago

complexgenome commented 3 years ago

Hello there,

I use command to run group based gene association using saige

step2_SPAtests.R --vcfFile=../VCF_RSQ80/CHR22_MHAS_rsq80_MAC10.recode.vcf.gz --vcfFileIndex=../VCF_RSQ80/CHR22_MHAS_rsq80_MAC10.recode.vcf.gz.tbi \
--vcfField=DS --chrom=chr22 --minMAF=0 --minMAC=0.5 --maxMAFforGroupTest=0.01 --sampleFile=./again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt --GMMATmodelFile=./again_imputedRSQ80_OUTPUT_GENEBASED/MHAS_null.rda \
--varianceRatioFile=again_imputedRSQ80_OUTPUT_GENEBASED/variance_MHAS.varianceRatio.txt \
--SAIGEOutputFile=CHR22_saige_genebased --numLinesOutput=1 --groupFile=SAIGE_group_files/CHR22_saige_annovar_group_rsq80.txt --sparseSigmaFile=again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \
--IsSingleVarinGroupTest=FALSE --LOCO=FALSE --IsAccountforCasecontrolImbalanceinGroupTest=TRUE

Messages:


 R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /mnt/mfs/hgrcgrid/homes/ss5505/conda/my-envs/saige/lib/libopenblasp-r0.3.15.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SAIGE_0.44.5

loaded via a namespace (and not attached):
[1] compiler_4.0.3     Matrix_1.3-2       Rcpp_1.0.6         grid_4.0.3
[5] RcppParallel_5.1.4 lattice_0.20-44
$vcfFile
[1] "../VCF_RSQ80/CHR22_MHAS_rsq80_MAC10.recode.vcf.gz"

$vcfFileIndex
[1] "../VCF_RSQ80/CHR22_MHAS_rsq80_MAC10.recode.vcf.gz.tbi"

$vcfField
[1] "DS"

$bgenFile
[1] ""

$bgenFileIndex
[1] ""

$savFile
[1] ""

$savFileIndex
[1] ""

$idstoExcludeFile
[1] ""

$idstoIncludeFile
[1] ""

$rangestoExcludeFile
[1] ""

$rangestoIncludeFile
[1] ""

$chrom
[1] "chr22"

$start
[1] 1

$end
[1] 2.5e+08

$IsDropMissingDosages
[1] FALSE

$minMAF
[1] 0

$minMAC
[1] 0.5

$maxMAFforGroupTest
[1] 0.01

$minInfo
[1] 0

$sampleFile
[1] "./again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt"

$GMMATmodelFile
[1] "./again_imputedRSQ80_OUTPUT_GENEBASED/MHAS_null.rda"

$varianceRatioFile
[1] "again_imputedRSQ80_OUTPUT_GENEBASED/variance_MHAS.varianceRatio.txt"

$SAIGEOutputFile
[1] "CHR22_saige_genebased"

$numLinesOutput
[1] 1

$IsSparse
[1] TRUE

$SPAcutoff
[1] 2

$IsOutputAFinCaseCtrl
[1] FALSE

$IsOutputNinCaseCtrl
[1] FALSE

$IsOutputHetHomCountsinCaseCtrl
[1] FALSE

$LOCO
[1] FALSE

$condition
[1] ""

$sparseSigmaFile
[1] "again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx"

$groupFile
[1] "SAIGE_group_files/CHR22_saige_annovar_group_rsq80.txt"

$kernel
[1] "linear.weighted"

$method
[1] "optimal.adj"

$weights.beta.rare
[1] "1,25"

$weights.beta.common
[1] "1,25"

$weightMAFcutoff
[1] 0.01

$r.corr
[1] "0"

$IsSingleVarinGroupTest
[1] FALSE

$IsOutputMAFinCaseCtrlinGroupTest
[1] FALSE

$cateVarRatioMinMACVecExclude
[1] "0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$cateVarRatioMaxMACVecInclude
[1] "1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$dosageZerodCutoff
[1] 0.2

$IsOutputPvalueNAinGroupTestforBinary
[1] FALSE

$IsAccountforCasecontrolImbalanceinGroupTest
[1] TRUE

$weightsIncludeinGroupFile
[1] FALSE

$IsOutputBETASEinBurdenTest
[1] FALSE

$sampleFile_male
[1] ""

$X_PARregion
[1] ""

$is_rewrite_XnonPAR_forMales
[1] FALSE

$method_to_CollapseUltraRare
[1] ""

$MACCutoff_to_CollapseUltraRare
[1] 10

$DosageCutoff_for_UltraRarePresence
[1] 0.5

$help
[1] FALSE

weights.beta.rare  is  1 25
weights.beta.common  is  1 25
cateVarRatioMinMACVecExclude  is  0.5 1.5 2.5 3.5 4.5 5.5 10.5 20.5
cateVarRatioMaxMACVecInclude  is  1.5 2.5 3.5 4.5 5.5 10.5 20.5
group-based test will be performed
Any dosages <=  0.2  for genetic variants with MAC <= 10 are set to be 0 in group tests
Garbage collection 14 = 8+2+4 (level 2) ...
82.1 Mbytes of cons cells used (54%)
21.9 Mbytes of vectors used (34%)
7127  samples have been used to fit the glmm null model
[1] "Leave-one-chromosome-out is not applied"
variance Ratio is  1 1 1 1 1 1 1 1
7234  sample IDs are found in sample file
isCondition is  FALSE
Open VCF done
To read the field DS
Number of meta lines in the vcf file (lines starting with ##): 18
Number of samples in the vcf file: 7234
[1] 7234    4
[1] "IID"          "IndexInModel" "IndexDose.x"  "IndexDose.y"
7127  samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is going to be used
sparseSigmaFile:  imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx
Missing dosages will be mean imputed for the analysis
Analysis started at  1627491213 Seconds
minMAC:  0.5
minMAF:  0
Minimum MAF of markers to be tested is  3.507787e-05
It is a binary trait
Analyzing  444  cases and  6683  controls
isCondition is  FALSE
Analysis started at  1627491213 Seconds
genetic variants with  3.507787e-05 <= MAF <=  0.01 are included for gene-based tests
It is a binary trait
Case-control imbalance is adjusted for binary traits.
Ultra rare variants won't be collpased for set-based association tests
isCondition is  FALSE
geneID:  DUXAP8
std::size_t sample_size = marker_file.samples().size();7234
cntMarker:  66
isCondition is  FALSE
weights:  24.44184 24.3325 24.33226 24.22515 24.2191 24.38925 24.43027 24.43348 24.22053 24.2202 24.43533 24.43027 24.49825 24.41602 24.2202 24.40294 24.49813 24.51874 24.49838 24.40294 24.26258 24.26054 24.26009 24.22118 22.17921 23.95023 23.82177 22.77512 22.81812 24.19449 23.4672 22.37496 23.9729 24.62664 24.55926 23.47306 24.52361 22.3466 24.62664 24.29365 21.93966 20.29242 24.52151 23.80739 24.1856 23.99184 23.97743 23.43904 24.54664 22.85984 24.21567 20.50478 22.85896 24.55959 24.55297 24.55367 24.40981 20.88923 23.7827 23.80671 24.46801 23.80731 24.53916 22.36524 23.95653 23.53297
[1] "DEBUG1"
[1] "DEBUG2"
[1] "DEBUG3"
Error in .solve.dgC.lu(as(a, "dgCMatrix"), b = b, tol = tol) :
  RHS 'b' has wrong number of rows:7127
Calls: SPAGMMATtest ... solve -> solve -> .local -> solve.dsC.mat -> .solve.dgC.lu
In addition: Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
Timing stopped at: 1.532 2.348 1.036
Execution halted

Error

Error in .solve.dgC.lu(as(a, "dgCMatrix"), b = b, tol = tol) :
  RHS 'b' has wrong number of rows:7127
Calls: SPAGMMATtest ... solve -> solve -> .local -> solve.dsC.mat -> .solve.dgC.lu
In addition: Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
Timing stopped at: 1.532 2.348 1.036
Execution halted

Before running this, I create kinship and fit null model


createSparseGRM.R --plinkFile=/mnt/mfs/hgrcgrid/shared/MHAS/data/GENOTYPED/uplift_genotyped/imputated_plink/TOPMED_merged_allCHRs_RSQ80_MAF1_MAC1_updated_ids_merged_genotyped_LDpruned --nThreads=4  --outputPrefix=again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS --relatednessCutoff=0.125 --memoryChunk=2

step1_fitNULLGLMM.R --plinkFile=/mnt/mfs/hgrcgrid/shared/MHAS/data/GENOTYPED/uplift_genotyped/imputated_plink/TOPMED_merged_allCHRs_RSQ80_MAF1_MAC1_updated_ids_merged_genotyped_LDpruned  \
--phenoFile=/mnt/mfs/hgrcgrid/shared/MHAS/PHENO/phenotype_MHAS_PCcleaned_04.28.2021.txt \
--phenoCol=AD_PHENO_FINAL_TRUE_MEX_noDUP_PCcleaned \
--sexCol=SEX --covarColList=SEX,PC1,PC2,PC3,AGE \
--sampleIDColinphenoFile=TOPMED_IID --traitType=binary \
--invNormalize=FALSE \
--outputPrefix=again_imputedRSQ80_OUTPUT_GENEBASED/MHAS_null \
--outputPrefix_varRatio=again_imputedRSQ80_OUTPUT_GENEBASED/variance_MHAS \
--sparseGRMFile=again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \
--sparseGRMSampleIDFile=again_imputedRSQ80_sparse_matrix/sparseGRM_MHAS_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \
--nThreads=4 --LOCO=FALSE --skipModelFitting=FALSE --IsSparseKin=TRUE --IsOverwriteVarianceRatioFile=TRUE --isCateVarianceRatio=TRUE --MaleCode=0 --FemaleCode=1

The phenotype has missing value for some individuals. Out of 7234, 7127 has phenotype. That makes sense when the number doesn't match.

I was able to run the single-marker test without any issues using same code, same phenotype except group-based file.

@weizhouUMICH can you please assist?

complexgenome commented 3 years ago

Dear @weizhouUMICH - can you please assist?

complexgenome commented 3 years ago

Found the solution. I was using GRM file (step0) instead of the sigma-sparse file output generated while fitting the null model.

ykjiang123 commented 2 years ago

I have the same question with you,can you please tell me more details to solve this error? thanks a lot