Closed ruthchia closed 4 years ago
Hi Ruth,
Please check that all the individuals in your step 2 bgen files are also found in your plink files from step 1. If there are individuals in step 2 that are not present in step 1, SAIGE will throw an error. (Note that it is fine if not everyone is listed in the phenotype file.)
Let us know if that solves the issue. Thanks.
Sarah
Hi sorry I know this thread hasn't been used for a while, but I also had a segmentation fault, but not quite the same. I tried the fix here just in case, but it did not help.
Here is the command I used:
SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_sgroup-based test will be performedVOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.t300 samples have been used to fit the glmm null modelile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_obj.glmm.null$LOCO: FALSE d.sparseSigma.mtx")
And the error I got:
Leave-one-chromosome-out option is not applied variance Ratio is 1 1 1 1 1 1 1 1 1416 sample IDs are found in sample file [1] 1416 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 300 samples were used in fitting the NULL glmm model and are found in sample file sparse kinship matrix is going to be used sparseSigmaFile: VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx Missing dosages will be mean imputed for the analysis Analysis started at 1561579006 Seconds It is a quantitative trait minMAC: 0.5 minMAF: 0 Minimum MAF of markers to be tested is 0.0008333333 Analysis started at 1561579006 Seconds genetic variants with 0.0008333333 <= MAF <= 0.5 are included for gene-based tests Open VCF done To read the field DS Number of meta lines in the vcf file (lines starting with ##): 14 Number of samples in the vcf file: 1416 It is a quantitative trait Gene Pvalue Nmarker_MACCate_1 Nmarker_MACCate_2 Nmarker_MACCate_3 Nmarker_MACCate_4 Nmarker_MACCate_5 Nmarker_MACCate_6 Nmarker_MACCate_7 Nmarker_MACCate_8 markerIDs markerAFs Pvalue_Burden Pvalue_SKAT geneID: SNP
caught segfault address 0x4c, cause 'memory not mapped'
Traceback: 1: getGenoOfGene_vcf(marker_group_line, minInfo) 2: SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_samplelist.txt", GMMATmodelFile = "VOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.txt", groupFile = "groupfile_ACPchr1.vcf", sparseSigmaFile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx")
Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace
Thanks for your help! Katie
Hi Katie,
Are the markers in the group file ordered by position?
Thanks, Wei
On Wed, Jun 26, 2019 at 4:04 PM khatch88 notifications@github.com wrote:
Hi sorry I know this thread hasn't been used for a while, but I also had a segmentation fault, but not quite the same. I tried the fix here just in case, but it did not help.
Here is the command I used:
SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_sgroup-based test will be performedVOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.t300 samples have been used to fit the glmm null modelile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_obj.glmm.null$LOCO: FALSE d.sparseSigma.mtx")
And the error I got:
Leave-one-chromosome-out option is not applied variance Ratio is 1 1 1 1 1 1 1 1 1416 sample IDs are found in sample file [1] 1416 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 300 samples were used in fitting the NULL glmm model and are found in sample file sparse kinship matrix is going to be used sparseSigmaFile: VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx Missing dosages will be mean imputed for the analysis Analysis started at 1561579006 Seconds It is a quantitative trait minMAC: 0.5 minMAF: 0 Minimum MAF of markers to be tested is 0.0008333333 Analysis started at 1561579006 Seconds genetic variants with 0.0008333333 <= MAF <= 0.5 are included for gene-based tests Open VCF done To read the field DS Number of meta lines in the vcf file (lines starting with ##): 14 Number of samples in the vcf file: 1416 It is a quantitative trait Gene Pvalue Nmarker_MACCate_1 Nmarker_MACCate_2 Nmarker_MACCate_3 Nmarker_MACCate_4 Nmarker_MACCate_5 Nmarker_MACCate_6 Nmarker_MACCate_7 Nmarker_MACCate_8 markerIDs markerAFs Pvalue_Burden Pvalue_SKAT geneID: SNP
caught segfault address 0x4c, cause 'memory not mapped'
Traceback: 1: getGenoOfGene_vcf(marker_group_line, minInfo) 2: SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_samplelist.txt", GMMATmodelFile = "VOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.txt", groupFile = "groupfile_ACPchr1.vcf", sparseSigmaFile = "VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx")
Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace
Thanks for your help! Katie
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/weizhouUMICH/SAIGE/issues/56?email_source=notifications&email_token=ACL52L7UFEWBZICDBS2ILK3P4PDUNA5CNFSM4FZCYYSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUVEJY#issuecomment-506024487, or mute the thread https://github.com/notifications/unsubscribe-auth/ACL52L5GSEFHU7CQMPZSVBLP4PDUNANCNFSM4FZCYYSA .
Hi Wei,
I think you are right that the problem is with the group file! Thank you! I have been having issues with that file when I was trying to figure out how to format it and have a couple questions pertaining to it. I know from the wiki page that each line is a gene or variant set, but is groupFile_geneBasedtest.txt the example group file for sav files and groupFile_geneBasedtest_simulation.txt the example file for vcf files? Also, I thought that I had formatted the group file correctly for this, but apparently not, so is there a good way to create a group file from files I already have (I have been using vcfs of imputed data for my input files).
Thanks so much; I'm still very new to this! Katie
On Wed, Jun 26, 2019 at 4:10 PM weizhouUMICH notifications@github.com wrote:
Hi Katie,
Are the markers in the group file ordered by position?
Thanks, Wei
On Wed, Jun 26, 2019 at 4:04 PM khatch88 notifications@github.com wrote:
Hi sorry I know this thread hasn't been used for a while, but I also had a segmentation fault, but not quite the same. I tried the fix here just in case, but it did not help.
Here is the command I used:
SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_sgroup-based test will be performedVOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.t300 samples have been used to fit the glmm null modelile =
"VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_obj.glmm.null$LOCO: FALSE d.sparseSigma.mtx")
And the error I got:
Leave-one-chromosome-out option is not applied variance Ratio is 1 1 1 1 1 1 1 1 1416 sample IDs are found in sample file [1] 1416 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 300 samples were used in fitting the NULL glmm model and are found in sample file sparse kinship matrix is going to be used sparseSigmaFile:
VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx Missing dosages will be mean imputed for the analysis Analysis started at 1561579006 Seconds It is a quantitative trait minMAC: 0.5 minMAF: 0 Minimum MAF of markers to be tested is 0.0008333333 Analysis started at 1561579006 Seconds genetic variants with 0.0008333333 <= MAF <= 0.5 are included for gene-based tests Open VCF done To read the field DS Number of meta lines in the vcf file (lines starting with ##): 14 Number of samples in the vcf file: 1416 It is a quantitative trait Gene Pvalue Nmarker_MACCate_1 Nmarker_MACCate_2 Nmarker_MACCate_3 Nmarker_MACCate_4 Nmarker_MACCate_5 Nmarker_MACCate_6 Nmarker_MACCate_7 Nmarker_MACCate_8 markerIDs markerAFs Pvalue_Burden Pvalue_SKAT geneID: SNP
caught segfault address 0x4c, cause 'memory not mapped'
Traceback: 1: getGenoOfGene_vcf(marker_group_line, minInfo) 2: SPAGMMATtest(vcfFile = "chr1.dose.vcf.gz", vcfFileIndex = "chr1.dose.vcf.gz.tbi", chrom = 1, sampleFile = "chr1_samplelist.txt", GMMATmodelFile = "VOXEL_31_106_79.cate.rda", varianceRatioFile = "VOXEL_31_106_79.cate.varianceRatio.txt", groupFile = "groupfile_ACPchr1.vcf", sparseSigmaFile =
"VOXEL_31_106_79.cate.varianceRatio.txt_relatednessCutoff_0.125_1000_randomMarkersUsed.sparseSigma.mtx")
Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace
Thanks for your help! Katie
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub < https://github.com/weizhouUMICH/SAIGE/issues/56?email_source=notifications&email_token=ACL52L7UFEWBZICDBS2ILK3P4PDUNA5CNFSM4FZCYYSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUVEJY#issuecomment-506024487 , or mute the thread < https://github.com/notifications/unsubscribe-auth/ACL52L5GSEFHU7CQMPZSVBLP4PDUNANCNFSM4FZCYYSA
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/weizhouUMICH/SAIGE/issues/56?email_source=notifications&email_token=ALOOPCLWI4JSSYTSYJP3HRLP4PELDA5CNFSM4FZCYYSKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODYUVTIQ#issuecomment-506026402, or mute the thread https://github.com/notifications/unsubscribe-auth/ALOOPCLYOHUGASMZQ4CSTVDP4PELDANCNFSM4FZCYYSA .
Hello, I am still getting that caught "segfault address 0x4c, cause 'memory not mapped'" error, but I do think that the problem is my group file. I have been having trouble with this file and the closest I have come to creating one using the set-test option with plink but it lacks the reference/alternate allele. Does anyone have any tips on creating a correct group file to use with a vcf for the SPAGMMATtest using SAIGE-GENE? Thanks! Katie
Hi @khatch88, Have you sorted out this problem? Sorry for my really late reply.
Here are more demonstration for the formats for group files https://github.com/weizhouUMICH/SAIGE/blob/master/extdata/cmd.sh
Each line is for one gene/set of variants. The first element is for gene/set name. The rest of the line is for variant ids included in this gene/set. For vcf/sav, the genetic marker ids are in the format chr:pos_ref/alt. For bgen, the genetic marker ids should match the rs ids in the bgen file. Each element in the line is separated by tab.
I'm closing this issue and please reopen if you still encounter this problem. Thanks! Wei
I had the same error. I corrected it by reset the tab on my group file. awk '{$1=$1}1' OFS="\t" and make sure snps in order, also snps must be assigned to correct genes.
Hi @weizhouUMICH thanks for this amazing package - I've run into this same problem. Might be missing something very obvious!
I'm trying to run SAIGE-Gene (step 2):
Rscript /data/Wolfson-UKBB-Dobson/SAIGE/extdata/step2_SPAtests.R \
--vcfFile=plof_rvtests_vcf_input_ms_chr${SGE_TASK_ID}\.vcf.gz \
--vcfFileIndex=plof_rvtests_vcf_input_ms_chr${SGE_TASK_ID}\.vcf.gz.tbi \
--vcfField=GT \
--chrom=${SGE_TASK_ID} \
--minMAF=0 \
--minMAC=0.5 \
--GMMATmodelFile=saige_gene.rda \
--varianceRatioFile=saige_gene_var_ratio.varianceRatio.txt \
--SAIGEOutputFile=SAIGE_gene_output_chrom${SGE_TASK_ID}\.vcf.genotype.txt \
--numLinesOutput=2 \
--IsOutputAFinCaseCtrl=TRUE \
--sparseSigmaFile=saige_gene_var_ratio.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx \
--groupFile=gene_set_snps_chr${SGE_TASK_ID}
The header of my vcf input file looks like this:
(RSAIGE) [hmy117@frontend8 hmy117]$ zcat plof_rvtests_vcf_input_ms_chr21.vcf.gz | cut -f1-10 | head -n20
##fileformat=VCFv4.2
##fileDate=20201126
##source=PLINKv1.90
##contig=<ID=21,length=46664475>
##INFO=<ID=PR,Number=0,Type=Flag,Description="Provisional reference allele, may not be based on real reference genome">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 3434888
21 10413704 21:10413704:D:20 GGGCTGGAGCAGTAAGATGGC G . . PR GT 0/0
21 10413708 21:10413708:D:20 TGGAGCAGTAAGATGGCGGCC T . . PR GT 0/0
21 10552667 21:10552667:C:T C T . . PR GT 0/0
21 10561162 21:10561162:G:A G A . . PR GT 0/0
21 10561175 21:10561175:C:T C T . . PR GT 0/0
21 10567752 21:10567752:I:1 A AT . . PR GT 0/0
21 10569462 21:10569462:C:T C T . . PR GT 0/0
21 10569739 21:10569739:C:G C G . . PR GT 0/0
21 10577490 21:10577490:C:T C T . . PR GT 0/0
21 10578341 21:10578341:D:2 AAG A . . PR GT 0/0
21 10592320 21:10592320:C:T C T . . PR GT 0/0
21 10598019 21:10598019:T:G T G . . PR GT 0/0
21 10602061 21:10602061:D:1 CT C . . PR GT 0/0
And the had of my group file looks like this
(RSAIGE) [hmy117@frontend8 hmy117]$ head test_chr22 | cut -f1-5
THOC5 chr22:29536722_G/A chr22:29539334_T/A chr22:29549069_G/A chr22:29549146_A/C
ASCC2 chr22:29789081_TC/T chr22:29804782_C/T chr22:29806886_C/T chr22:29814666_C/T
PANX2 chr22:50170812_GGC/G chr22:50177318_GA/G chr22:50177343_G/T chr22:50177715_CA/C
HPS4 chr22:26453271_T/A chr22:26453352_AAT/A chr22:26453383_G/T chr22:26458554_C/CA
TANGO2 chr22:20036800_T/A chr22:20036800_T/C chr22:20036810_CTTCT/C chr22:20036928_CTGCTGTG/C
PLXNB2 chr22:50275890_TC/T chr22:50275895_A/T chr22:50276678_ACCATCTG/A chr22:50278471_GC/G
DEPDC5 chr22:31754969_TG/T chr22:31783940_ATATT/A chr22:31815005_CGA/C chr22:31819054_C/T
SGSM1 chr22:24850321_T/TGCCG chr22:24855356_G/T chr22:24855370_GCA/G chr22:24855641_C/CA
MLC1 chr22:50064137_TTG/T chr22:50070543_C/CA chr22:50074290_TC/T chr22:50080005_CT/C
MLC1 chr22:50064137_TTG/T chr22:50070543_C/CA chr22:50074290_TC/T chr22:50080005_CT/C
But I keep getting an error message saying that there are no samples left after removing missing samples. This isn't true, so I'm sure it must be a problem with the formatting of the group file
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: XKR3
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CBX7
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: PISD
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: L3MBTL2-AS1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: FAM118A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: DEPDC5
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RAB36
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: TOP3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: APOBEC3G
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NF2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: APOBEC3H
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: TRMT2A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SYNGR1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RGL4
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: TMPRSS6
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: TAB1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RPS19BP1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CDC45
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: PRAME
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: BID
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: BID
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: TRIOBP
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: DERL3
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: GGTLC2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: PNPLA5
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: IL17REL
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: IL17REL
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: BID
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: YDJC
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NFAM1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CDC45
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CHCHD10
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CHCHD10
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SLC2A11
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: C22orf39
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: C22orf39
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RAB36
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: DDT
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SPECC1L
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SPECC1L
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: SMTN
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MB
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RBFOX2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RBFOX2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: RBFOX2
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: NCF4
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MPST
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: FAM227A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: FAM227A
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: MCHR1
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CCDC134
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CCDC134
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: CCDC134
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: ARHGAP8
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: KIAA0930
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: KIAA0930
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: TTC38
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: ODF3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: ODF3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
geneID: ODF3B
std::size_t sample_size = marker_file.samples().size();126477
cntMarker: 0
[1] "No samples are left after removing samples with missing dosages/genotypes of variants in the gene"
closed the genofile!
Analysis ended at 1607092298 Seconds
Analysis took 157.3983 Seconds
Here is the start of the output (where everything seems fine). NB I've tried with and out without --IsDropMissingDosages and no sucess:
**
R version 3.6.1 (2019-07-05)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /data/home/hmy117/.conda/envs/RSAIGE/lib/R/lib/libRblas.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] SAIGE_0.39.3
loaded via a namespace (and not attached):
[1] compiler_3.6.1 Matrix_1.2-17 Rcpp_1.0.1 grid_3.6.1
[5] RcppParallel_4.4.4 lattice_0.20-38
$vcfFile
[1] "plof_rvtests_vcf_input_ms_chr22.vcf.gz"
$vcfFileIndex
[1] "plof_rvtests_vcf_input_ms_chr22.vcf.gz.tbi"
$vcfField
[1] "GT"
$bgenFile
[1] ""
$bgenFileIndex
[1] ""
$savFile
[1] ""
$savFileIndex
[1] ""
$idstoExcludeFile
[1] ""
$idstoIncludeFile
[1] ""
$rangestoExcludeFile
[1] ""
$rangestoIncludeFile
[1] ""
$chrom
[1] "22"
$start
[1] 1
$end
[1] 2.5e+08
$IsDropMissingDosages
[1] FALSE
$minMAF
[1] 0
$minMAC
[1] 0.5
$maxMAFforGroupTest
[1] 0.5
$minInfo
[1] 0
$sampleFile
[1] ""
$GMMATmodelFile
[1] "saige_gene.rda"
$varianceRatioFile
[1] "saige_gene_var_ratio.varianceRatio.txt"
$SAIGEOutputFile
[1] "SAIGE_gene_output_chrom22.vcf.genotype.txt"
$numLinesOutput
[1] 2
$IsSparse
[1] TRUE
$SPAcutoff
[1] 2
$IsOutputAFinCaseCtrl
[1] TRUE
$IsOutputNinCaseCtrl
[1] FALSE
$IsOutputHetHomCountsinCaseCtrl
[1] FALSE
$LOCO
[1] FALSE
$condition
[1] ""
$sparseSigmaFile
[1] "saige_gene_var_ratio.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx"
$groupFile
[1] "test_chr22"
$kernel
[1] "linear.weighted"
$method
[1] "optimal.adj"
$weights.beta.rare
[1] "1,25"
$weights.beta.common
[1] "1,25"
$weightMAFcutoff
[1] 0.01
$r.corr
[1] "0"
$IsSingleVarinGroupTest
[1] FALSE
$cateVarRatioMinMACVecExclude
[1] "0.5,1.5,2.5,3.5,4.5,5.5,10.5,20.5"
$cateVarRatioMaxMACVecInclude
[1] "1.5,2.5,3.5,4.5,5.5,10.5,20.5"
$dosageZerodCutoff
[1] 0.2
$IsOutputPvalueNAinGroupTestforBinary
[1] FALSE
$IsAccountforCasecontrolImbalanceinGroupTest
[1] TRUE
$weightsIncludeinGroupFile
[1] FALSE
$IsOutputBETASEinBurdenTest
[1] FALSE
$sampleFile_male
[1] ""
$X_PARregion
[1] ""
$is_rewrite_XnonPAR_forMales
[1] FALSE
$help
[1] FALSE
weights.beta.rare is 1 25
weights.beta.common is 1 25
cateVarRatioMinMACVecExclude is 0.5 1.5 2.5 3.5 4.5 5.5 10.5 20.5
cateVarRatioMaxMACVecInclude is 1.5 2.5 3.5 4.5 5.5 10.5 20.5
group-based test will be performed
Any dosages <= 0.2 for genetic variants with MAC <= 10 are set to be 0 in group tests
126477 samples have been used to fit the glmm null model
obj.glmm.null$LOCO: FALSE
Leave-one-chromosome-out option is not applied
variance Ratio is 1.000006 1.000007 1.000009 1.000014 1.000009 0.9999915 0.999993 0.9973885
isCondition is FALSE
Open VCF done
To read the field GT
Number of meta lines in the vcf file (lines starting with ##): 7
Number of samples in the vcf file: 126477
126477 sample IDs are found in the vcf file
[1] 126477 4
[1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y"
126477 samples were used in fitting the NULL glmm model and are found in sample file
sparse kinship matrix is going to be used
sparseSigmaFile: saige_gene_var_ratio.varianceRatio.txt_relatednessCutoff_0.125_2000_randomMarkersUsed.sparseSigma.mtx
Missing dosages will be mean imputed for the analysis
Analysis started at 1607092389 Seconds
minMAC: 0.5
minMAF: 0
Minimum MAF of markers to be tested is 1.976644e-06
It is a binary trait
Analyzing 623 cases and 125854 controls
isCondition is FALSE
Analysis started at 1607092389 Seconds
genetic variants with 1.976644e-06 <= MAF <= 0.5 are included for gene-based tests
It is a binary trait
Case-control imbalance is adjusted for binary traits.
I did consider whether the IDs need to reflect the PLINK IDs (VCFs made via PLINK), but that also didn't work.
Can you see where I'm going wrong? If possible would be amazing to be able to download a reference 'groupfile' containing all variants if that's not too unwieldly.
Thanks Ben
I suspect the issue is with the group file. I believe that it might work if you remove the "chr" and change chr22:29536722_G/A to 22:29536722_G/A Also make sure the tabs are set correctly.
Great catch @jameskozubek, thanks so much! You were right it was the 'chr' prefix. Got rid of it and it's working a treat.
Cheers Ben
Hi, I ran step2 for ukbb sample set but it failed with the error below: Can you please help me figure out what is wrong and why it failed? I allocated 64g memory to the run. Thanks - ruth
caught segfault address 0x194cc9d8, cause 'memory not mapped'
Traceback: 1: getDosage_bgen_noquery() 2: SPAGMMATtest(dosageFile = opt$dosageFile, dosageFileNrowSkip = opt$dosageFileNrowSkip, dosageFileNcolSkip = opt$dosageFileNcolSkip, dosageFilecolnamesSkip = dosageFilecolnames, vcfFile = opt$vcfFile, vcfFileIndex = opt$vcfFileIndex, vcfField = opt$vcfField, bgenFile = opt$bgenFile, bgenFileIndex = opt$bgenFileIndex, savFile = opt$savFile, savFileIndex = opt$savFileIndex, chrom = opt$chrom, start = opt$start, end = opt$end, sampleFile = opt$sampleFile, GMMATmodelFile = opt$GMMATmodelFile, varianceRatioFile = opt$varianceRatioFile, SAIGEOutputFile = opt$SAIGEOutputFile, minMAF = opt$minMAF, minMAC = opt$minMAC, numLinesOutput = opt$numLinesOutput, IsOutputAFinCaseCtrl = opt$IsOutputAFinCaseCtrl, LOCO = opt$LOCO) An irrecoverable exception occurred. R is aborting now ...
FYI - Here's a log of what was run: ---- COMMAND EXECUTED: --------------------------------------------------------- Rscript /data/chiarp/MG_GWAS/SAIGE/extdata/step2_SPAtests.R --bgenFile=/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen --bgenFileIndex=/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen.bgi --minMAF=0.0001 --minMAC=4 --chrom=1 --sampleFile=sampleIID_dosageBGEN.txt --GMMATmodelFile=./output1B/UKBB_mg_keepRelated.rda --varianceRatioFile=./output1B/UKBB_mg_keepRelated.varianceRatio.txt --SAIGEOutputFile=./output1B/UKBB_mg_keepRelated_chr1.SAIGE.bgen.txt --numLinesOutput=2 --IsOutputAFinCaseCtrl=TRUE
$dosageFile [1] ""
$dosageFileNrowSkip [1] 0
$dosageFileNcolSkip [1] 5
$dosageFilecolnamesSkip [1] ""
$vcfFile [1] ""
$vcfFileIndex [1] ""
$vcfField [1] "DS"
$bgenFile [1] "/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen"
$bgenFileIndex [1] "/data/LNG/UKBIOBANK/EGAD00010001474/ukb_imp_chr1_v3.bgen.bgi"
$savFile [1] ""
$savFileIndex [1] ""
$chrom [1] "1"
$start [1] 1
$end [1] 2.5e+08
$minMAF [1] 1e-04
$minMAC [1] 4
$sampleFile [1] "sampleIID_dosageBGEN.txt"
$GMMATmodelFile [1] "./output1B/UKBB_mg_keepRelated.rda"
$varianceRatioFile [1] "./output1B/UKBB_mg_keepRelated.varianceRatio.txt"
$SAIGEOutputFile [1] "./output1B/UKBB_mg_keepRelated_chr1.SAIGE.bgen.txt"
$numLinesOutput [1] 2
$IsOutputAFinCaseCtrl [1] TRUE
$LOCO [1] FALSE
$help [1] FALSE
408961 samples have been used to fit the glmm null model obj.glmm.null$LOCO: FALSE Leave-one-chromosome-out option is not applied variance Ratio is 0.992 408962 sample IDs are found in sample file [1] 408962 4 [1] "IID" "IndexInModel" "IndexDose.x" "IndexDose.y" 408961 samples were used in fitting the NULL glmm model and are found in sample file minMAC: 4 minMAF: 1e-04 Minimum MAF of markers to be testd is 1e-04 Analysis started at 1.54e+09 Seconds no query list is provided 487409 samples are found in the bgen file 7402791 markers are found in the bgen file isVariant: TRUE It is a binary trait Analyzing 240 cases and 408721 controls