Open SheaCheng2000 opened 2 years ago
Hi, It means that some samples used in step 1 are not in the VCF file. Can you have a check?
Thanks, Wei
That's exactly the problem! Thank you so much! Although no more errors are reported, there is a new problem--the log of step2 shows lines of:
indexChunk is 0 indexChunk 0 nRegions 18426 Read in 100 region(s) from the group file. The chunk is empty Read in 100 region(s) from the group file. The chunk is empty ... Read in 100 region(s) from the group file. The chunk is empty [1] "Analysis done! ...
and the output files are empty.
My group file does not contain all variants in the VCF file, and I am wondering if that's the reason?
Thanks again!
Hi, Are the marker IDs in the group file in the format chr:pos:ref:alt? for VCF files, has --chrom been specified to be exactly the same as the chromosome string in the files. Thanks, Wei
Hi, My marker IDs in the previous group file are in the format of chr:pos_ref/alt, which is the same as example file "group_new_chrposa1a2_withWeights.txt". I have changed it into the format of chr:pos:ref:alt. groupfile.txt
For --chrom, I use strings like "chr1,chr2...,chr22", which is consistent with the format of my VCF file.
However, it shows the same warning ("The chunk is empty").
Here are my command lines:
Rscript step2_SPAtests.R \ --vcfFile=./input_mrkh/mrkh_2022Mar.split.hard.AB.VQSR.QDSOR.samFlt.renew.vep.rmRepeat.recode.vcf.GT.gz \ --vcfFileIndex=./input_mrkh/mrkh_2022Mar.split.hard.AB.VQSR.QDSOR.samFlt.renew.vep.rmRepeat.recode.vcf.GT.gz.csi \ --vcfField=GT \ --SAIGEOutputFile=./output_mrkh/mrkh_2022Mar.split.hard.AB.VQSR.QDSOR.samFlt.renew.vep.rmRepeat.recode.vcf_groupTest_out.txt \ --LOCO=FALSE \ --chrom=chr1,chr2,chr3,chr4,chr5,chr6,chr7,chr8,chr9,chr10,chr11,chr12,chr13,chr14,chr15,chr16,chr17,chr18,chr19,chr20,chr21,chr22 \ --minMAF=0 \ --minMAC=0.5 \ --sampleFile=./input_mrkh/mrkh_step2_sample_list.txt\ --GMMATmodelFile=./output_mrkh/mrkh_2022Mar_binary_NULLGLMM.rda \ --sparseGRMFile=output_mrkh/mrkh_2022Mar.sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx \ --sparseGRMSampleIDFile=output_mrkh/mrkh_2022Mar.sparseGRM_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \ --groupFile=./input_mrkh/mrkh_weightfile_colon.txt \ --annotation_in_groupTest=lof,missense:lof,missense:lof:synonymous \ --maxMAF_in_groupTest=0.0001,0.001,0.01
Thanks! Shea
Hi Wei,
While I am still struggling with "The chunk is empty", I found if I do not use "--chrom", other errors will be reported. For example, my VEP annotation in the INFO field is invalid ( I don't know if long INFO annotation will have some bad influence on file reading??). And I am wondering why it does not report the issue with "--chrom" absense?
Thanks! Shea
Hi Shea,
--chrom is only needed for single-variant assoc tests with VCF files are used and --chrom can only take one chromosome string for each job. For set-based tests, the program takes the chromosome information directly from the markers. That is to say, currently, the program assumes all markers in each set are from the same chromosome, which makes the leave-one-chromosome-out feature easier. I'm not quite sure about what causes the error for the INFO field. Since you are using GT in VCF file, can you convert the file to Plink files for jobs? Please use alt-first in the PLINK program.
Thanks, Wei
Hi Shea,
You are right. I'm sorry for the confusion. The format in the file group_new_chrposa1a2_withWeights.txt has not been updated.
Thanks, Wei
Hi Wei,
Thank you so much for your quick response. It really helps a lot!
Now I change vcf file into plink file, and no more errors are reported. Now I think the only problem for me is about the WEIGHT in the group file (also this is what I am most concerned about in my burden analysis)-- When I use group file without weight, the analysis can be done successfully, BUT when I add weight in the group file, it reports:
Start extracting marker-level information from 'groupFile' of ./input_mrkh/mrkh_weightfile_colon_singlesymbol_rmdup_nochr.txt .... [1] "weights are included for markers" indexChunk is 0 indexChunk 0 nRegions 17982 36 markers in 'RegionFile' are not in 'GenoFile'. Read in 100 region(s) from the group file. [1] "Analyzing Region A1BG (1/17982)." Error in mainRegionInCPP(genoType, region$genoIndex_prev, region$genoIndex, : Not compatible with requested type: [type=character; target=double]. Calls: SPAGMMATtest -> SAIGE.Region -> mainRegionInCPP
I am confused about why it happens.
Hope you have a nice weekend!
Thanks, Shea
Hi Wei,
One Addition to my last question :)
I tried to run the example code with and without weight, and found the same error as mine.
When I ran this code without weighted group file (exactly the same as is provided in the tutorial: https://saigegit.github.io//SAIGE-doc/docs/set_step2.html), no error was reported.
Rscript step2_SPAtests.R \ --bedFile=./input/genotype_100markers.bed \ --bimFile=./input/genotype_100markers.bim \ --famFile=./input/genotype_100markers.fam \ --SAIGEOutputFile=./output/genotype_100markers_plink_groupTest_out.txt \ --chrom=1 \ --LOCO=TRUE \ --AlleleOrder=alt-first \ --minMAF=0 \ --minMAC=0.5 \ --sampleFile=./input/samplelist.txt \ --GMMATmodelFile=./output/example_binary_fullGRM.rda \ --varianceRatioFile=./output/example_binary_cate.varianceRatio.txt \ --sparseGRMFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx \ --sparseGRMSampleIDFile=output/sparseGRM_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseGRM.mtx.sampleIDs.txt \ --groupFile=./input/group_new_chrposa1a2.txt \ --annotation_in_groupTest=lof,missense:lof,missense:lof:synonymous \ --maxMAF_in_groupTest=0.0001,0.001,0.01
But when I changed the group file into "group_new_chrposa1a2_withWeights.txt", error was reported! group_new_chrposa1a2_withWeights.txt
Error in mainRegionInCPP(genoType, region$genoIndex_prev, region$genoIndex, : Not compatible with requested type: [type=character; target=double]. Calls: SPAGMMATtest -> SAIGE.Region -> mainRegionInCPP
Full log infomation: log_SetbasedTest_Step2_withWeight.log
Looking forward to your reply :)
Thanks, Shea
Hi Shea,
Thanks for trying the examples. I've fixed the issue in the new version 1.0.7. And the new example script can be found https://saigegit.github.io//SAIGE-doc/docs/set_step2.html (#7).
Btw, please check the additional message about using the sparse GRM in Step 2(WARNING before example #1) at the same doc page https://saigegit.github.io//SAIGE-doc/docs/set_step2.html.
Thanks, Wei
Hi Wei,
Thank you so much for the update!!! This week I have been trying to download your new version, but when I executed "R CMD INSTALL SAIGE", error happened (mainly the URL open error or fork error). I think it is because my Linux server is not stable enough these days. So I want to know if I can download the SAIGE R-package somewhere in my PC and then upload it to Linux server? (ps. unfortunately I cannot use docker because of some authority limit...)
Thanks, Shea
Hi Shea,
I just attached a compressed file as a release v1.0.9. You may directly download the file from https://github.com/saigegit/SAIGE/releases/
using the command wget https://github.com/saigegit/SAIGE/releases/download/v1.0.9/SAIGE_1.0.9.tar.gz
Then run the command to install from the source code R CMD INSTALL SAIGE_1.0.9.tar.gz
Thanks, Wei
Hi Wei,
Thank you! I have downloaded SAIGE v1.0.9 successfully.
When I run step2 test with weighted groupfile using my own data, it reports such error:
...
[1] "Analyzing Region ACOX1 (248/17982)."
[1] "Analyzing Region ACOX2 (249/17982)."
[1] "Analyzing Region ACOX3 (250/17982)."
[1] "Analyzing Region ACOXL (251/17982)."
[1] "Analyzing Region ACP1 (252/17982)."
Error in if (abs(q - m1)/sqrt(var1) < Cutoff) { :
missing value where TRUE/FALSE needed
Calls: SPAGMMATtest ... SAIGE.Region -> get_newPhi_scaleFactor -> <Anonymous>
Execution halted
I found a similar issue here https://github.com/saigegit/SAIGE/issues/12. Also, the places that error arises depend on --annotation_in_groupTest
. When I changed -annotation_in_groupTest=lof,missense:lof,missense:lof:synonymous
into -annotation_in_groupTest=lof
, no error happens.
My group file is attached here: mrkh_weighted_groupfile.txt
Another question is that when I add --r.corr=1
to perform only burden test, error arises:
Error in getopt_options(object, args) : Error in getopt(spec = spec, opt = args) : "–-r.corr=1" is not a valid option, or does not support an argument Calls: parse_args -> parse_options -> getopt_options
Thanks again for your kind attention! Shea
Hi Shea,
Thanks for reporting the issue! I will take a look.
For the burden only test, can you please have a check to see the the Step 2 wrapper file step2_SPAtests.R is the most updated version? https://github.com/saigegit/SAIGE/blob/main/extdata/step2_SPAtests.R
and try the example 4
can you have a check to see if the v1.0.9 is used? The first few lines in the log file contain the information/
Loading required package: RhpcBLASctl
R version 4.1.3 (2022-03-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)
Matrix products: default
BLAS/LAPACK: /stanley/genetics/users/wzhou/.conda/envs/RSAIGE/lib/libopenblasp-r0.3.20.so
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.2 optparse_1.7.1 RhpcBLASctl_0.21-247.1
[4] SAIGE_1.0.9
Thanks, Wei
Hi Wei,
I have followed your suggestion and --r.corr
works normally!
I think maybe something went wrong in my previous command or environment and thus caused the error. Sorry about that!
Moreover, after adding --r.corr=1
, the first error ( Error in if (abs(q - m1)/sqrt(var1) < Cutoff) )did NOT happen anymore when I used --annotation_in_groupTest=lof,missense:lof,missense:lof:synonymous
! So I am wondering if some issue still exist in SKAT-O, just as you mentioned before in https://github.com/saigegit/SAIGE/issues/12?
Thanks, Shea
Hi, has the "chunk is empty" issue been solved? I'm facing the same issue wherein an empty output file is being generated.
The following is displayed in the log repeatedly as each chunk is analyzed, so ultimately a blank file with only the header is getting generated as an output:
x markers in 'RegionFile' are not in 'GenoFile'.
Read in 100 region(s) from the group file.
The chunk is empty
The groupfile is in the right format (chr:pos:ref:alt), so I'm unsure as to where I'm going wrong. It's probably worth mentioning that I'm using the same plink files used as input in step 1 for step 2 as well. Is this where I'm making a mistake?
Thanks, Suraksha
Hi Suraksha, @surakshavinod
I solved this problem after using a newer version of SAIGE. 😂 It seems that the plink files in step 1 are just 1000 random markers for variance ratio estimation, as described here https://saigegit.github.io/SAIGE-doc/docs/set_step1.html. But the plink files in step 2 are the complete files to run burden.
Hope it helps!
Shea
Hi @SheaCheng2000,
Thank you so much for your quick reply! The problem is I'm getting a blank output file both times, i.e., when I'm using the complete plink file for that particular chromosome in both step 1 and step 2, and when I use random markers plink file in step 1 and the complete plink file for that chromosome (say 21) in step 2, so I'm not sure where exactly I'm going wrong.
Thanks, Suraksha
Hi @surakshavinod,
Can you please make sure the IDs in the group file matches the ID column in the PLINK file in step 2? Pleas try the most recent version.
Thanks, Wei
Hi,
I am using set-based test for burden analysis (and I also use the group file with weights).
At step2 this error happened:
It seems that my VCF is not in correct format? But I have checked the VCF file and it seems to be identical with the example. I really don't know what the problem is.
The log file and my VCF file example are provided here.
step2_input.vcf.txt step2.log
Looking forward for your reply!