weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0
188 stars 73 forks source link

What would be the reason of getting empty results files after step 2 for SAIGE-GENE? #330

Closed mferreiradasilva closed 2 years ago

mferreiradasilva commented 3 years ago

Sorry for opening this, when it's not really an issue, more something I do not understand.

After running the SPA test for SAIGE-GENE, it runs through all the genes in the group file and returns "no markers are left" for all of them, which, at the end, results in an empty file.

For the record, I'm running the single-variant analysis as well in the function, and this file also returns empty, when it does give results if just running normal SAIGE (without a sparse matrix and without a group file).

What would be the reason for this?

weizhouUMICH commented 3 years ago

Hi @mferreiradasilva, Have the markers been sucessfully read from the dosage files? Would you mind sharing the first several lines from the log file (terminal output)? Are you using bgen or vcf file? If using vcf, can you check if you have specified the --chr?

Thanks, Wei

mferreiradasilva commented 3 years ago

Hi @weizhouUMICH,

This is a VCF.

Here's the command I used:

SPAGMMATtest(vcfFile="final_vcf.recode.vcf.gz", vcfFileIndex="final_vcf.recode.vcf.gz.csi", vcfField="GT", chrom="1", GMMATmodelFile="sparse_model_3.rda", varianceRatioFile="sparse_model_3.varianceRatio.txt", SAIGEOutputFile="results_group", IsSparse=T, sparseSigmaFile="sparse_model_3.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx", groupFile="groupFile2.txt", IsSingleVarinGroupTest=T, IsAccountforCasecontrolImbalanceinGroupTest=T)

And the first few lines of the output:

group-based test will be performed
Any dosages <=  0.2  for genetic variants with MAC <= 10 are set to be 0 in group tests
Garbage collection 91 = 62+10+19 (level 2) ...
85.5 Mbytes of cons cells used (34%)
72.6 Mbytes of vectors used (31%)
753  samples have been used to fit the glmm null model
Leave chromosome  1  out will be applied
variance Ratio is  1.01213 25.67299 1 1.034423 1.000051 1 1 0.9241374
isCondition is  FALSE
Open VCF done
To read the field GT
Number of meta lines in the vcf file (lines starting with ##): 3494
Number of samples in the vcf file: 753
753  sample IDs are found in the vcf file
[1] 753   4
[1] "IID"          "IndexInModel" "IndexDose.x"  "IndexDose.y"
753  samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is going to be used
sparseSigmaFile:  sparse_model_3.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx
Missing dosages will be mean imputed for the analysis
Analysis started at  1618423001 Seconds
minMAC:  0.5
minMAF:  0
Minimum MAF of markers to be tested is  0.0003320053
It is a binary trait
Analyzing  672  cases and  81  controls
isCondition is  FALSE
Analysis started at  1618423001 Seconds
genetic variants with  0.0003320053 <= MAF <=  0.5 are included for gene-based tests
It is a binary trait
Case-control imbalance is adjusted for binary traits.
Ultra rare variants won't be collpased for set-based association tests
isCondition is  FALSE
geneID:  MFSD14A
std::size_t sample_size = marker_file.samples().size();753
cntMarker:  0
[1] "No markers are left!"

When step 2 is done, I also get this warning:

In .Internal(gc(verbose, reset, full)) :
  closing unused connection 3 (groupFile2.txt)
weizhouUMICH commented 3 years ago

Hi,

Could you please check if the chrome column in the VCF file is "1" instead of "chr1"?
Is the format of variant IDs in the group file correct? https://github.com/weizhouUMICH/SAIGE/blob/master/extdata/input/groupFile_geneBasedtest.txt

Thanks, Wei

mferreiradasilva commented 3 years ago

Hi, @weizhouUMICH

I re-checked both files. Yeah the VCF has chromosome notation as "1", not "chr1", and the group file is correctly formatted.

mferreiradasilva commented 3 years ago

Hi @weizhouUMICH

Do you have any idea why this could be happening?

Can it be having trouble reading the SNPs from the VCF (which is weird because it works on single-variant analysis)? Or most likely the group file?

weizhouUMICH commented 3 years ago

Hi @mferreiradasilva,

One possibility is field GT. Does VCF file have the filed GT? Sometimes, only DS for dosages is included.

Thanks, Wei

mferreiradasilva commented 3 years ago

Hi @weizhouUMICH

Yes, it's definitely GT.

complexgenome commented 3 years ago

@weizhouUMICH

Sorry for bumping this issue. I have a concern with group file using a VCF format:

In the gene group file, data are: seedNum_126822 chr1:32373_A/C chr1:32374_A/C

Should the chromosome in the VCF also be chr1?

Or, if the chromosome VCF are chr1, the SNP names can have 1:32374_A/C in group file?

weizhouUMICH commented 2 years ago

Sorry fo the late reply! We have just released a new version 1.0.0. It has substantial computational efficiency improvements for both Step 1 and Step 2 for single-variant and set-based tests and clearer log output. We have created a new program github page https://github.com/saigegit/SAIGE with the documentation provided https://saigegit.github.io/SAIGE-doc/ The program will be maintained by multiple SAIGE developers there. The docker image has been updated. Please feel free to try the version 1.0.0 and report issues if any.

Thanks! Wei