weizhouUMICH / SAIGE

GNU Lesser General Public License v3.0
188 stars 73 forks source link

failed Step2 run: address 0x194cc9d8, cause 'memory not mapped' #56

Closed ruthchia closed 4 years ago

ruthchia commented 5 years ago

Hi, I ran step2 for ukbb sample set but it failed with the error below: Can you please help me figure out what is wrong and why it failed? I allocated 64g memory to the run. Thanks - ruth

caught segfault address 0x194cc9d8, cause 'memory not mapped'

Traceback: 1: getDosage_bgen_noquery() 2: SPAGMMATtest(dosageFile = opt$dosageFile, dosageFileNrowSkip = opt$dosageFileNrowSkip, dosageFileNcolSkip = opt$dosageFileNcolSkip, dosageFilecolnamesSkip = dosageFilecolnames, vcfFile = opt$vcfFile, vcfFileIndex = opt$vcfFileIndex, vcfField = opt$vcfField, bgenFile = opt$bgenFile, bgenFileIndex = opt$bgenFileIndex, savFile = opt$savFile, savFileIndex = opt$savFileIndex, chrom = opt$chrom, start = opt$start, end = opt$end, sampleFile = opt$sampleFile, GMMATmodelFile = opt$GMMATmodelFile, varianceRatioFile = opt$varianceRatioFile, SAIGEOutputFile = opt$SAIGEOutputFile, minMAF = opt$minMAF, minMAC = opt$minMAC, numLinesOutput = opt$numLinesOutput, IsOutputAFinCaseCtrl = opt$IsOutputAFinCaseCtrl, LOCO = opt$LOCO) An irrecoverable exception occurred. R is aborting now ...

FYI - Here's a log of what was run: ---- COMMAND EXECUTED: --------------------------------------------------------- Rscript /data/chiarp/MG_GWAS/SAIGE/extdata/step2_SPAtests.R --bgenFile=/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen --bgenFileIndex=/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen.bgi --minMAF=0.0001 --minMAC=4 --chrom=1 --sampleFile=sampleIID_dosageBGEN.txt --GMMATmodelFile=./output1B/UKBB_mg_keepRelated.rda --varianceRatioFile=./output1B/UKBB_mg_keepRelated.varianceRatio.txt --SAIGEOutputFile=./output1B/UKBB_mg_keepRelated_chr1.SAIGE.bgen.txt --numLinesOutput=2 --IsOutputAFinCaseCtrl=TRUE

$dosageFile [1] ""

$dosageFileNrowSkip [1] 0

$dosageFileNcolSkip [1] 5

$dosageFilecolnamesSkip [1] ""

$vcfFile [1] ""

$vcfFileIndex [1] ""

$vcfField [1] "DS"

$bgenFile [1] "/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen"

$bgenFileIndex [1] "/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen.bgi"

$savFile [1] ""

$savFileIndex [1] ""

$chrom [1] "1"

$start [1] 1

$end [1] 2.5e+08

$minMAF [1] 1e-04

$minMAC [1] 4

$sampleFile [1] "sampleIID_dosageBGEN.txt"

$GMMATmodelFile [1] "./output1B/UKBB_mg_keepRelated.rda"

$varianceRatioFile [1] "./output1B/UKBB_mg_keepRelated.varianceRatio.txt"

$SAIGEOutputFile [1] "./output1B/UKBB_mg_keepRelated_chr1.SAIGE.bgen.txt"

$numLinesOutput [1] 2

$IsOutputAFinCaseCtrl [1] TRUE

$LOCO [1] FALSE

$help [1] FALSE

408961 samples have been used to fit the glmm null model obj.glmm.null$LOCO: FALSE Leave-one-chromosome-out option is not applied variance Ratio is 0.992 408962 sample IDs are found in sample file [1] 408962 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 408961 samples were used in fitting the NULL glmm model and are found in sample file minMAC: 4 minMAF: 1e-04 Minimum MAF of markers to be testd is 1e-04 Analysis started at 1.54e+09 Seconds no query list is provided 487409 samples are found in the bgen file 7402791 markers are found in the bgen file isVariant: TRUE It is a binary trait Analyzing 240 cases and 408721 controls

sgagliano commented 5 years ago

Hi Ruth,

Please check that all the individuals in your step 2 bgen files are also found in your plink files from step 1. If there are individuals in step 2 that are not present in step 1, SAIGE will throw an error. (Note that it is fine if not everyone is listed in the phenotype file.)

Let us know if that solves the issue. Thanks.

Sarah

khatch88 commented 5 years ago

Hi sorry I know this thread hasn't been used for a while, but I also had a segmentation fault, but not quite the same. I tried the fix here just in case, but it did not help.

Here is the command I used:

SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_sgroup-based test will be performedVOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.t300 samples have been used to fit the glmm null modelile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_obj.glmm.null$LOCO: FALSE d.sparseSigma.mtx")

And the error I got:

Leave-one-chromosome-out option is not applied variance Ratio is 1 1 1 1 1 1 1 1 1416 sample IDs are found in sample file [1] 1416 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 300 samples were used in fitting the NULL glmm model and are found in sample file sparse kinship matrix is going to be used sparseSigmaFile: VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx Missing dosages will be mean imputed for the analysis Analysis started at 1561579006 Seconds It is a quantitative trait minMAC: 0.5 minMAF: 0 Minimum MAF of markers to be tested is 0.0008333333 Analysis started at 1561579006 Seconds genetic variants with 0.0008333333 <= MAF <= 0.5 are included for gene-based tests Open VCF done To read the field DS Number of meta lines in the vcf file (lines starting with ##): 14 Number of samples in the vcf file: 1416 It is a quantitative trait Gene Pvalue Nmarker_MACCate_1 Nmarker_MACCate_2 Nmarker_MACCate_3 Nmarker_MACCate_4 Nmarker_MACCate_5 Nmarker_MACCate_6 Nmarker_MACCate_7 Nmarker_MACCate_8 markerIDs markerAFs Pvalue_Burden Pvalue_SKAT geneID: SNP

caught segfault address 0x4c, cause 'memory not mapped'

Traceback: 1: getGenoOfGene_vcf(marker_group_line, minInfo) 2: SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_samplelist.txt", GMMATmodelFile = "VOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.txt", groupFile = "groupfile_ACPchr1.vcf", sparseSigmaFile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx")

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

Thanks for your help! Katie

weizhouUMICH commented 5 years ago

Hi Katie,

Are the markers in the group file ordered by position?

Thanks, Wei

On Wed, Jun 26, 2019 at 4:04 PM khatch88 notifications@github.com wrote:

Hi sorry I know this thread hasn't been used for a while, but I also had a segmentation fault, but not quite the same. I tried the fix here just in case, but it did not help.

Here is the command I used:

SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_sgroup-based test will be performedVOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.t300 samples have been used to fit the glmm null modelile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_obj.glmm.null$LOCO: FALSE d.sparseSigma.mtx")

And the error I got:

Leave-one-chromosome-out option is not applied variance Ratio is 1 1 1 1 1 1 1 1 1416 sample IDs are found in sample file [1] 1416 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 300 samples were used in fitting the NULL glmm model and are found in sample file sparse kinship matrix is going to be used sparseSigmaFile: VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx Missing dosages will be mean imputed for the analysis Analysis started at 1561579006 Seconds It is a quantitative trait minMAC: 0.5 minMAF: 0 Minimum MAF of markers to be tested is 0.0008333333 Analysis started at 1561579006 Seconds genetic variants with 0.0008333333 <= MAF <= 0.5 are included for gene-based tests Open VCF done To read the field DS Number of meta lines in the vcf file (lines starting with ##): 14 Number of samples in the vcf file: 1416 It is a quantitative trait Gene Pvalue Nmarker_MACCate_1 Nmarker_MACCate_2 Nmarker_MACCate_3 Nmarker_MACCate_4 Nmarker_MACCate_5 Nmarker_MACCate_6 Nmarker_MACCate_7 Nmarker_MACCate_8 markerIDs markerAFs Pvalue_Burden Pvalue_SKAT geneID: SNP

caught segfault address 0x4c, cause 'memory not mapped'

Traceback: 1: getGenoOfGene_vcf(marker_group_line, minInfo) 2: SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_samplelist.txt", GMMATmodelFile = "VOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.txt", groupFile = "groupfile_ACPchr1.vcf", sparseSigmaFile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx")

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

Thanks for your help! Katie

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/weizhouUMICH/SAIGE/issues/56?email_source=notifications&email_token=ACL52L7UFEWBZICDBS2ILK3P4PDUNA5CNFSM4FZCYYSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUVEJY#issuecomment-506024487, or mute the thread https://github.com/notifications/unsubscribe-auth/ACL52L5GSEFHU7CQMPZSVBLP4PDUNANCNFSM4FZCYYSA .

khatch88 commented 5 years ago

Hi Wei,

I think you are right that the problem is with the group file! Thank you! I have been having issues with that file when I was trying to figure out how to format it and have a couple questions pertaining to it. I know from the wiki page that each line is a gene or variant set, but is groupFile_geneBasedtest.txt the example group file for sav files and groupFile_geneBasedtest_simulation.txt the example file for vcf files? Also, I thought that I had formatted the group file correctly for this, but apparently not, so is there a good way to create a group file from files I already have (I have been using vcfs of imputed data for my input files).

Thanks so much; I'm still very new to this! Katie

On Wed, Jun 26, 2019 at 4:10 PM weizhouUMICH notifications@github.com wrote:

Hi Katie,

Are the markers in the group file ordered by position?

Thanks, Wei

On Wed, Jun 26, 2019 at 4:04 PM khatch88 notifications@github.com wrote:

Hi sorry I know this thread hasn't been used for a while, but I also had a segmentation fault, but not quite the same. I tried the fix here just in case, but it did not help.

Here is the command I used:

SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_sgroup-based test will be performedVOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.t300 samples have been used to fit the glmm null modelile =

"VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_obj.glmm.null$LOCO: FALSE d.sparseSigma.mtx")

And the error I got:

Leave-one-chromosome-out option is not applied variance Ratio is 1 1 1 1 1 1 1 1 1416 sample IDs are found in sample file [1] 1416 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 300 samples were used in fitting the NULL glmm model and are found in sample file sparse kinship matrix is going to be used sparseSigmaFile:

VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx Missing dosages will be mean imputed for the analysis Analysis started at 1561579006 Seconds It is a quantitative trait minMAC: 0.5 minMAF: 0 Minimum MAF of markers to be tested is 0.0008333333 Analysis started at 1561579006 Seconds genetic variants with 0.0008333333 <= MAF <= 0.5 are included for gene-based tests Open VCF done To read the field DS Number of meta lines in the vcf file (lines starting with ##): 14 Number of samples in the vcf file: 1416 It is a quantitative trait Gene Pvalue Nmarker_MACCate_1 Nmarker_MACCate_2 Nmarker_MACCate_3 Nmarker_MACCate_4 Nmarker_MACCate_5 Nmarker_MACCate_6 Nmarker_MACCate_7 Nmarker_MACCate_8 markerIDs markerAFs Pvalue_Burden Pvalue_SKAT geneID: SNP

caught segfault address 0x4c, cause 'memory not mapped'

Traceback: 1: getGenoOfGene_vcf(marker_group_line, minInfo) 2: SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_samplelist.txt", GMMATmodelFile = "VOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.txt", groupFile = "groupfile_ACPchr1.vcf", sparseSigmaFile =

"VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx")

Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

Thanks for your help! Katie

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub < https://github.com/weizhouUMICH/SAIGE/issues/56?email_source=notifications&email_token=ACL52L7UFEWBZICDBS2ILK3P4PDUNA5CNFSM4FZCYYSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUVEJY#issuecomment-506024487 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ACL52L5GSEFHU7CQMPZSVBLP4PDUNANCNFSM4FZCYYSA

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/weizhouUMICH/SAIGE/issues/56?email_source=notifications&email_token=ALOOPCLWI4JSSYTSYJP3HRLP4PELDA5CNFSM4FZCYYSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUVTIQ#issuecomment-506026402, or mute the thread https://github.com/notifications/unsubscribe-auth/ALOOPCLYOHUGASMZQ4CSTVDP4PELDANCNFSM4FZCYYSA .

khatch88 commented 5 years ago

Hello, I am still getting that caught "segfault address 0x4c, cause 'memory not mapped'" error, but I do think that the problem is my group file. I have been having trouble with this file and the closest I have come to creating one using the set-test option with plink but it lacks the reference/alternate allele. Does anyone have any tips on creating a correct group file to use with a vcf for the SPAGMMATtest using SAIGE-GENE? Thanks! Katie

weizhouUMICH commented 4 years ago

Hi @khatch88, Have you sorted out this problem? Sorry for my really late reply.

Here are more demonstration for the formats for group files https://github.com/weizhouUMICH/SAIGE/blob/master/extdata/cmd.sh

Perform gene-based/region-based tests according to the group file specified in groupFile

use --sparseSigmaFile to specify sparse Sigma file generated in step 1. Note: not sparse GRM

IsSingleVarinGroupTest=TRUE is to perform single-variant assoc tests as well for markers included in the gene-based tests

only vcf, sav, and bgen dosage file formats can be used for gene-based tests

to perform gene-based tests, --groupFile is used to specify a group file

Each line is for one gene/set of variants. The first element is for gene/set name. The rest of the line is for variant ids included in this gene/set. For vcf/sav, the genetic marker ids are in the format chr:pos_ref/alt. For bgen, the genetic marker ids should match the rs ids in the bgen file. Each element in the line is separated by tab.

I'm closing this issue and please reopen if you still encounter this problem. Thanks! Wei

jameskozubek commented 4 years ago

I had the same error. I corrected it by reset the tab on my group file. awk '{$1=$1}1' OFS="\t" and make sure snps in order, also snps must be assigned to correct genes.

benjacobs123456 commented 3 years ago

Hi @weizhouUMICH thanks for this amazing package - I've run into this same problem. Might be missing something very obvious!

I'm trying to run SAIGE-Gene (step 2):

Rscript /data/Wolfson-UKBB-Dobson/SAIGE/extdata/step2_SPAtests.R \
        --vcfFile=plof_rvtests_vcf_input_ms_chr${SGE_TASK_ID}\.vcf.gz \
        --vcfFileIndex=plof_rvtests_vcf_input_ms_chr${SGE_TASK_ID}\.vcf.gz.tbi \
        --vcfField=GT \
        --chrom=${SGE_TASK_ID} \
        --minMAF=0 \
        --minMAC=0.5 \
        --GMMATmodelFile=saige_gene.rda \
        --varianceRatioFile=saige_gene_var_ratio.varianceRatio.txt \
        --SAIGEOutputFile=SAIGE_gene_output_chrom${SGE_TASK_ID}\.vcf.genotype.txt \
        --numLinesOutput=2 \
        --IsOutputAFinCaseCtrl=TRUE \
        --sparseSigmaFile=saige_gene_var_ratio.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx \
        --groupFile=gene_set_snps_chr${SGE_TASK_ID}

The header of my vcf input file looks like this:


(RSAIGE) [hmy117@frontend8 hmy117]$ zcat plof_rvtests_vcf_input_ms_chr21.vcf.gz | cut -f1-10 | head -n20
##fileformat=VCFv4.2
##fileDate=20201126
##source=PLINKv1.90
##contig=<ID=21,length=46664475>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  3434888
21      10413704        21:10413704:D:20        GGGCTGGAGCAGTAAGATGGC   G       .       .       PR      GT      0/0
21      10413708        21:10413708:D:20        TGGAGCAGTAAGATGGCGGCC   T       .       .       PR      GT      0/0
21      10552667        21:10552667:C:T C       T       .       .       PR      GT      0/0
21      10561162        21:10561162:G:A G       A       .       .       PR      GT      0/0
21      10561175        21:10561175:C:T C       T       .       .       PR      GT      0/0
21      10567752        21:10567752:I:1 A       AT      .       .       PR      GT      0/0
21      10569462        21:10569462:C:T C       T       .       .       PR      GT      0/0
21      10569739        21:10569739:C:G C       G       .       .       PR      GT      0/0
21      10577490        21:10577490:C:T C       T       .       .       PR      GT      0/0
21      10578341        21:10578341:D:2 AAG     A       .       .       PR      GT      0/0
21      10592320        21:10592320:C:T C       T       .       .       PR      GT      0/0
21      10598019        21:10598019:T:G T       G       .       .       PR      GT      0/0
21      10602061        21:10602061:D:1 CT      C       .       .       PR      GT      0/0

And the had of my group file looks like this

(RSAIGE) [hmy117@frontend8 hmy117]$ head test_chr22 | cut -f1-5
THOC5   chr22:29536722_G/A      chr22:29539334_T/A      chr22:29549069_G/A      chr22:29549146_A/C
ASCC2   chr22:29789081_TC/T     chr22:29804782_C/T      chr22:29806886_C/T      chr22:29814666_C/T
PANX2   chr22:50170812_GGC/G    chr22:50177318_GA/G     chr22:50177343_G/T      chr22:50177715_CA/C
HPS4    chr22:26453271_T/A      chr22:26453352_AAT/A    chr22:26453383_G/T      chr22:26458554_C/CA
TANGO2  chr22:20036800_T/A      chr22:20036800_T/C      chr22:20036810_CTTCT/C  chr22:20036928_CTGCTGTG/C
PLXNB2  chr22:50275890_TC/T     chr22:50275895_A/T      chr22:50276678_ACCATCTG/A       chr22:50278471_GC/G
DEPDC5  chr22:31754969_TG/T     chr22:31783940_ATATT/A  chr22:31815005_CGA/C    chr22:31819054_C/T
SGSM1   chr22:24850321_T/TGCCG  chr22:24855356_G/T      chr22:24855370_GCA/G    chr22:24855641_C/CA
MLC1    chr22:50064137_TTG/T    chr22:50070543_C/CA     chr22:50074290_TC/T     chr22:50080005_CT/C
MLC1    chr22:50064137_TTG/T    chr22:50070543_C/CA     chr22:50074290_TC/T     chr22:50080005_CT/C

But I keep getting an error message saying that there are no samples left after removing missing samples. This isn't true, so I'm sure it must be a problem with the formatting of the group file

[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  XKR3
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CBX7
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  PISD
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  L3MBTL2-AS1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  FAM118A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  DEPDC5
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RAB36
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  TOP3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  APOBEC3G
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  APOBEC3H
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  TRMT2A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SYNGR1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RGL4
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  TMPRSS6
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  TAB1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RPS19BP1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CDC45
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  BID
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  BID
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  TRIOBP
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  DERL3
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  GGTLC2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  PNPLA5
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  IL17REL
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  IL17REL
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  BID
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  YDJC
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NFAM1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CDC45
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CHCHD10
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CHCHD10
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SLC2A11
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  C22orf39
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  C22orf39
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RAB36
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  DDT
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SPECC1L
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SPECC1L
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RBFOX2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RBFOX2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  RBFOX2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  NCF4
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MPST
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  FAM227A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  FAM227A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  MCHR1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CCDC134
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CCDC134
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  CCDC134
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  ARHGAP8
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  KIAA0930
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  KIAA0930
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  TTC38
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  ODF3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  ODF3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID:  ODF3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker:  0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
closed the genofile!
Analysis ended at  1607092298 Seconds
Analysis took  157.3983 Seconds

Here is the start of the output (where everything seems fine). NB I've tried with and out without --IsDropMissingDosages and no sucess:

**
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /data/home/hmy117/.conda/envs/RSAIGE/lib/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] SAIGE_0.39.3

loaded via a namespace (and not attached):
[1] compiler_3.6.1     Matrix_1.2-17      Rcpp_1.0.1         grid_3.6.1
[5] RcppParallel_4.4.4 lattice_0.20-38
$vcfFile
[1] "plof_rvtests_vcf_input_ms_chr22.vcf.gz"

$vcfFileIndex
[1] "plof_rvtests_vcf_input_ms_chr22.vcf.gz.tbi"

$vcfField
[1] "GT"

$bgenFile
[1] ""

$bgenFileIndex
[1] ""

$savFile
[1] ""

$savFileIndex
[1] ""

$idstoExcludeFile
[1] ""

$idstoIncludeFile
[1] ""

$rangestoExcludeFile
[1] ""

$rangestoIncludeFile
[1] ""

$chrom
[1] "22"

$start
[1] 1

$end
[1] 2.5e+08

$IsDropMissingDosages
[1] FALSE

$minMAF
[1] 0

$minMAC
[1] 0.5

$maxMAFforGroupTest
[1] 0.5

$minInfo
[1] 0

$sampleFile
[1] ""

$GMMATmodelFile
[1] "saige_gene.rda"

$varianceRatioFile
[1] "saige_gene_var_ratio.varianceRatio.txt"

$SAIGEOutputFile
[1] "SAIGE_gene_output_chrom22.vcf.genotype.txt"

$numLinesOutput
[1] 2

$IsSparse
[1] TRUE

$SPAcutoff
[1] 2

$IsOutputAFinCaseCtrl
[1] TRUE

$IsOutputNinCaseCtrl
[1] FALSE

$IsOutputHetHomCountsinCaseCtrl
[1] FALSE

$LOCO
[1] FALSE

$condition
[1] ""

$sparseSigmaFile
[1] "saige_gene_var_ratio.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx"

$groupFile
[1] "test_chr22"

$kernel
[1] "linear.weighted"

$method
[1] "optimal.adj"

$weights.beta.rare
[1] "1,25"

$weights.beta.common
[1] "1,25"

$weightMAFcutoff
[1] 0.01

$r.corr
[1] "0"

$IsSingleVarinGroupTest
[1] FALSE

$cateVarRatioMinMACVecExclude
[1] "0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$cateVarRatioMaxMACVecInclude
[1] "1.5,2.5,3.5,4.5,5.5,10.5,20.5"

$dosageZerodCutoff
[1] 0.2

$IsOutputPvalueNAinGroupTestforBinary
[1] FALSE

$IsAccountforCasecontrolImbalanceinGroupTest
[1] TRUE

$weightsIncludeinGroupFile
[1] FALSE

$IsOutputBETASEinBurdenTest
[1] FALSE

$sampleFile_male
[1] ""

$X_PARregion
[1] ""

$is_rewrite_XnonPAR_forMales
[1] FALSE

$help
[1] FALSE

weights.beta.rare  is  1 25
weights.beta.common  is  1 25
cateVarRatioMinMACVecExclude  is  0.5 1.5 2.5 3.5 4.5 5.5 10.5 20.5
cateVarRatioMaxMACVecInclude  is  1.5 2.5 3.5 4.5 5.5 10.5 20.5
group-based test will be performed
Any dosages <=  0.2  for genetic variants with MAC <= 10 are set to be 0 in group tests
126477  samples have been used to fit the glmm null model
obj.glmm.null$LOCO:  FALSE
Leave-one-chromosome-out option is not applied
variance Ratio is  1.000006 1.000007 1.000009 1.000014 1.000009 0.9999915 0.999993 0.9973885
isCondition is  FALSE
Open VCF done
To read the field GT
Number of meta lines in the vcf file (lines starting with ##): 7
Number of samples in the vcf file: 126477
126477  sample IDs are found in the vcf file
[1] 126477      4
[1] "IID"          "IndexInModel" "IndexDose.x"  "IndexDose.y"
126477  samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is going to be used
sparseSigmaFile:  saige_gene_var_ratio.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx
Missing dosages will be mean imputed for the analysis
Analysis started at  1607092389 Seconds
minMAC:  0.5
minMAF:  0
Minimum MAF of markers to be tested is  1.976644e-06
It is a binary trait
Analyzing  623  cases and  125854  controls
isCondition is  FALSE
Analysis started at  1607092389 Seconds
genetic variants with  1.976644e-06 <= MAF <=  0.5 are included for gene-based tests
It is a binary trait
Case-control imbalance is adjusted for binary traits.

I did consider whether the IDs need to reflect the PLINK IDs (VCFs made via PLINK), but that also didn't work.

Can you see where I'm going wrong? If possible would be amazing to be able to download a reference 'groupfile' containing all variants if that's not too unwieldly.

Thanks Ben

jameskozubek commented 3 years ago

I suspect the issue is with the group file. I believe that it might work if you remove the "chr" and change chr22:29536722_G/A to 22:29536722_G/A Also make sure the tabs are set correctly.

benjacobs123456 commented 3 years ago

Great catch @jameskozubek, thanks so much! You were right it was the 'chr' prefix. Got rid of it and it's working a treat.

Cheers Ben