weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0
187 stars 71 forks source link

Non-conformable argument issue #115

Closed randomNumber298347 closed 4 years ago

randomNumber298347 commented 4 years ago

Hello,

Using the latest version of SAIGE-GENE (0.35.8.8) I am getting the following error when running Step 2:

Error in G2_adj_n %*% diag(rep(1/r, dim(G2_adj_n)[2])) : non-conformable arguments Calls: SPAGMMATtest ... SAIGE_SKAT_withRatioVec -> Related_ER -> \<Anonymous> -> nrow -> as.matrix

Any thoughts on what may be causing this?

BW.

weizhouUMICH commented 4 years ago

Hello,

Happy to help! Would you mind sharing more information? How many variants are there in the tested gene? What did the log file output for the gene?

Thanks, Wei

randomNumber298347 commented 4 years ago

Hi,

I think it might have something to do with missing phenotypes in my file. Can you say how/if SAIGE-GENE handles missing phenotypes as I can not seem to find any information on the main page re. this.

For example, if I have 10000 individuals in the genetic data, but only 8000 have a binary phenotype, are the remaining 2000 ignored? What if carriers of rare alleles are only in the 2000? How does SAIGE-GENE deal with this? If I try creating a kinship matrix in 10000 individuals then SAIGE-GENE seems to fall over if rare-allele carriers do not have a phenotype. If I try creating kinship matrix within the 8000 it seem to run through.

Are there some guidelines I should be following to ensure consistency between what is included when creating the GRM, phenotype files, running the burden testing?

weizhouUMICH commented 4 years ago

Hi,

The phenotype file is written using fread in the data.table library. Then any sample with missing data in any covariate or phenotype is removed using the following code, so the log file, you will find "XX sample have non-missing phenotypes"

formula.null = as.formula(formula) mmat = model.frame(formula.null, data, na.action=NULL) mmat$IID = data[,which(sampleIDColinphenoFile == colnames(data))] mmat_nomissing = mmat[complete.cases(mmat),] mmat_nomissing$IndexPheno = seq(1,nrow(mmat_nomissing), by=1) cat(nrow(mmat_nomissing), " samples have non-missing phenotypes\n")

Thanks, Wei

ruthchia commented 4 years ago

Hi Wei, I have the same error message when running SAIGE-GENE using the latest build. I ran step2 for all chromosomes on > 500 genes/chromosome in parallel and some was successful and some failed.

The code I ran is: Rscript ../scripts/SAIGE/extdata/step2_SPAtests.R \ --vcfFile=../vcf/FILTERED.LBD.controls.noDuplicates.keepRelated.hwe1e-10_chr20.vcf.gz \ --vcfFileIndex=../vcf/FILTERED.LBD.controls.noDuplicates.keepRelated.hwe1e-10_chr20.vcf.gz.tbi \ --vcfField=GT \ --chrom=20\ --minMAF=0 \ --minMAC=0.5 \ --sampleFile=../sampleIID_VCF.txt \ --maxMAFforGroupTest=0.01 \ --GMMATmodelFile=./output/LBD.controls.noDuplicates.keepRelated.rda \ --varianceRatioFile=./output/LBD.controls.noDuplicates.keepRelated_cate.varianceRatio.txt \ --SAIGEOutputFile=./output/nonSYN.CADD12/LBD.controls.noDuplicates.keepRelated.SAIGE.gene.CODING.nonSYN.chr20.txt \ --numLinesOutput=1 \ --groupFile=./GeneSNPs/nonSYN.CADD12/chr20.nonSYN.CADD12.SAIGEgeneInputSetv0.txt \ --sparseSigmaFile=./output/LBD.controls.noDuplicates.keepRelated_cate.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx \ --IsOutputAFinCaseCtrl=TRUE \ --IsSingleVarinGroupTest=TRUE

The error is: Error in G2_adj_n %*% diag(rep(1/r, dim(G2_adj_n)[2])) : non-conformable arguments Calls: SPAGMMATtest ... SAIGE_SKAT_withRatioVec -> Related_ER -> -> nrow -> as.matrix Timing stopped at: 0.155 0 0.155 Execution halted

For example, the one that failed when execution halted was at this gene set: PRPF6 20:63999619_C/T 20:64027738_T/G

What do you think is the problem? If useful, I can email you the geneInput set and results from Step0 and Step1. Please let me know which email to send it to.

Thanks! Ruth

weizhouUMICH commented 4 years ago

Hi Ruth,

Thank you very much for reporting this issue. Would you mind sharing more details from the log file for this gene? It would be also extremely helpful if you email us the input for step 2 and results from step 0 and step 1 (Zhangchen Zhao zczhao@umich.edu and myself zhowei@umich.edu).

Thank you! Wei

nkgx commented 4 years ago

I encountered the same problem a while ago. I don't have any example input handy but it only seemed to occur for variants with very low allele frequency (eg. MAC 1) and also only for binary phenotypes. So maybe something about unexpected zeros or a data frame becoming one-dimensional?

ruthchia commented 4 years ago

Hey Nils, thanks for sharing your experience. What is your workaround to get it to work? Did you remove SNPs that had MAC < 2 prior to running SAIGE-GENE? Thanks! Ruth

nkgx commented 4 years ago

Yes, my workaround was to remove SNPs with very low MAC (I used < 3) from the input. That's not a very useful workaround though since the whole point of running SAIGE-GENE was to test these kinds of ultra-rare SNPs.

ruthchia commented 4 years ago

Hi Wei, Sorry it took me long to get back to you. I would really want to send you the files, but the input vcf files for step2 are huge. What would you suggest? Would results from step 0 and step 1 be sufficient for now for you to figure out what went wrong? Thanks Ruth

From: weizhouUMICH notifications@github.com Reply-To: weizhouUMICH/SAIGE reply@reply.github.com Date: Thursday, October 24, 2019 at 11:40 AM To: weizhouUMICH/SAIGE SAIGE@noreply.github.com Cc: "Chia, Ruth (NIH/NIA/IRP) [E]" ruth.chia@nih.gov, Comment comment@noreply.github.com Subject: Re: [weizhouUMICH/SAIGE] Non-conformable argument issue (#115)

Hi Ruth,

Thank you very much for reporting this issue. Would you mind sharing more details from the log file for this gene? It would be also extremely helpful if you email us the input for step 2 and results from step 0 and step 1 (Zhangchen Zhao zczhao@umich.edumailto:zczhao@umich.edu and myself zhowei@umich.edumailto:zhowei@umich.edu).

Thank you! Wei

— You are receiving this because you commented. Reply to this email directly, view it on GitHubhttps://github.com/weizhouUMICH/SAIGE/issues/115?email_source=notifications&email_token=AE6FBYMWQB7OMZUBKHNWY33QQG4NJA5CNFSM4I4LHRM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOECFNFPI#issuecomment-545968829, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AE6FBYIGZL56566MEKQYKFTQQG4NJANCNFSM4I4LHRMQ.

randomNumber298347 commented 4 years ago

Hi,

Just thought I would follow up given the conversation above.

I still have not been able to get SAIGE-GENE to work across all transcripts I am testing having tried so many things (re-installs, library updates, checking phenotype/genotype files etc.).

Note that I had a similar observation to Nills in that if I set a min MAC to 2 then everything worked fine - but this is far from ideal as pointed out above.

Would like to use this program so let me know if I can provide any additional information that may be of assistance.

Thanks.

randomNumber298347 commented 4 years ago

Hi - just wondered if there was any update on this - or if we can expect an update? Thanks,

weizhouUMICH commented 4 years ago

Hi- The bug should have been fixed in the version 0.36.1. Please have a try to see if it works. Note that the version 0.36.1 will be updated to 0.36.2 later today with a few other issues fixed.

Thanks, Wei

On Fri, Nov 22, 2019 at 3:40 AM randomNumber298347 notifications@github.com wrote:

Hi - just wondered if there was any update on this - or if we can expect an update? Thanks,

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/weizhouUMICH/SAIGE/issues/115?email_source=notifications&email_token=ACL52L2QP4JSMAVTOUT76XLQU6LGNA5CNFSM4I4LHRM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEE46TUA#issuecomment-557443536, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACL52L37YXAMZ3GEF2TAMK3QU6LGNANCNFSM4I4LHRMQ .