qtrinh / qpipeline

An annotation and util pipeline for Next Gen Sequencing data

seq fault #1

Closed batis2ta closed 5 years ago

batis2ta commented 6 years ago

Dear madam/sir,

I have recently tried to use ISOWN, and I get a segmentation fault during the first annotations. I found out that this happens when the software calls qpipeline.

I downloaded qpipeline from the website. While building the bundled tabix, I get these errors:

gcc -g -Wall -O2 -fPIC -o tabix main.o -lm -lz -L. -ltabix
./libtabix.a(bgzf.o): In function `deflate_block':
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:311: undefined reference to `deflate'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:313: undefined reference to `deflateEnd'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:305: undefined reference to `deflateInit2_'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:329: undefined reference to `deflateEnd'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:345: undefined reference to `crc32'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:346: undefined reference to `crc32'
./libtabix.a(bgzf.o): In function `inflate_block':
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:380: undefined reference to `inflateInit2_'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:385: undefined reference to `inflate'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:391: undefined reference to `inflateEnd'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bgzf.c:387: undefined reference to `inflateEnd'
./libtabix.a(bedidx.o): In function `ks_getuntil':
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bedidx.c:11: undefined reference to `gzread'
./libtabix.a(bedidx.o): In function `bed_read':
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bedidx.c:103: undefined reference to `gzdopen'
./libtabix.a(bedidx.o): In function `ks_getc':
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bedidx.c:11: undefined reference to `gzread'
./libtabix.a(bedidx.o): In function `bed_read':
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bedidx.c:138: undefined reference to `gzclose'
/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5/bedidx.c:103: undefined reference to `gzopen64'
collect2: error: ld returned 1 exit status
make[1]: *** [tabix] Error 1
make[1]: Leaving directory `/home/mohammer/qpipeline/external_tools/tabix/tabix-0.2.5'
make: *** [all-recur] Error 1

I have a newer version of tabix (1.5) installed system-wide. Using qpipeline with the broken tabix build gives the same error. Could you please help me solve this issue?

Best regards

qtrinh commented 6 years ago

Hi, Please use qpipeline that came with ISOWN. When you get the seg fault, please copy the commands on the screen and send them to us. We will be able to tell you why or what the issue is.

Thanks

Q

batis2ta commented 6 years ago

Dear Q, thanks for the answer. This is what happens when I use ISOWN's qpipeline. The ANNOVAR annotation runs for a while, but dbSNP and every other annotation fail instantly. I should add that this happens with any two VCF files.

Best regards

Hundreds of lines like this one: Use of uninitialized value in concatenation (.) or string at /home/mohammer/ISOWN/bin/addAnnovarToVCF.pl line 75, line 275459.

annotating input file with dbSNP ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/dbSNP142_All_20141124.vcf.gz.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.annovar.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.dbSNP.vcf
Segmentation fault

annotating input file with COSMIC ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/COSMIC_v69.vcf.gz -A -E -p COSMIC_69 -i /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.dbSNP.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.cosmic.vcf
Segmentation fault

annotating input file with ExAC ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/ExAC.r0.3.sites.vep.vcf.20150421.vcf.gz -A -E -p ExAC.r0.3_20150421 -i /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.cosmic.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.exac.vcf
Segmentation fault

annotating input file with MutationAccessor ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/MA.release2.vcf.gz -A -E -p 2013_12_11_MA -i /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.exac.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.ma.vcf
Segmentation fault

annotating input file with PolyPhen ...
open: No such file or directory

[my_tabix.c:46] fail to open tabix data file '/home/mohammer/ISOWN/bin/../external_databases/WHESS_20150403.txt.gz'.

annotating input file with sequence context ...
[fai_load] build FASTA index.

calculating flanking region ...
Segmentation fault

final reformatting ...
Segmentation fault

cleanup: deleting temporary files ( /home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp. ) ...

qtrinh commented 6 years ago

Sorry for the delay! Can you check to see if Annovar ran to completion? Also, can you share the first 200 lines of your input file (/home/mohammer/Desktop/share/share1/009918/WES/AS-223506-LR-34770_R1.fastq_44683_annotated.vcf.gz.temp.annovar.vcf ) ?

Thanks

Q

batis2ta commented 6 years ago

Hi there, I don't have access to my files right now, but I will do it.

But I am very skeptical that it has something to do with ANNOVAR, because using qpipeline for any annotation (e.g. VCF to VCF) gives me a seg fault. I will add what you asked for in a few days.

batis2ta commented 6 years ago

Here are the first 200 lines. Best regards, Mo

fileformat=VCFv4.0

fileDate=20180522

source=lofreq call -f /home/mohammer/Desktop/share/share1/human_genome37_gatk.fa -l /home/mohammer/Desktop/share/share1/sureselectV5_Covered.bed --no-default-filter -r chr1:1-249250621 -o /tmp/lofreq2_call_parallelo2RBp7/0.vcf.gz /home/mohammer/Desktop/share/share1/009918/WES/bamfiles/AS-220146-LR-34242_R1.fastq_49732.bam

reference=/home/mohammer/Desktop/share/share1/human_genome37_gatk.fa

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

FILTER=

FILTER=

FILTER=

FILTER= 0.001000">

FILTER=

FILTER=

CHROM POS ID REF ALT QUAL FILTER INFO

chr1 69511 . A G 676 PASS DP=119;AF=1.000000;SB=0;DP4=0,0,64,55;ANNOVAR=exonic,OR4F5;ANNOVAR_EXONIC=nonsynonymous SNV,OR4F5:NM_001005484:exon1:c.A421G:p.T141A,
chr1 762273 . G A 1351 PASS DP=215;AF=0.995349;SB=0;DP4=0,1,74,140;ANNOVAR=ncRNA_exonic,LINC00115
chr1 808922 . G A 1294 PASS DP=105;AF=0.990476;SB=0;DP4=1,0,58,46;ANNOVAR=ncRNA_intronic,FAM41C
chr1 808928 . C T 310 PASS DP=109;AF=0.532110;SB=0;DP4=27,24,32,26;ANNOVAR=ncRNA_intronic,FAM41C
chr1 876499 . A G 1792 PASS DP=53;AF=1.000000;SB=0;DP4=0,0,38,15;ANNOVAR=intronic,SAMD11
chr1 877715 . C G 266 PASS DP=10;AF=1.000000;SB=0;DP4=0,0,2,8;ANNOVAR=intronic,SAMD11
chr1 877831 . T C 840 PASS DP=23;AF=1.000000;SB=0;DP4=0,0,11,12;ANNOVAR=exonic,SAMD11;ANNOVAR_EXONIC=nonsynonymous SNV,SAMD11:NM_152486:exon10:c.T1027C:p.W343R,
chr1 878314 . G C 3962 PASS DP=298;AF=0.442953;SB=0;DP4=85,80,66,66;ANNOVAR=exonic,SAMD11;ANNOVAR_EXONIC=synonymous SNV,SAMD11:NM_152486:exon11:c.G1440C:p.G480G,
chr1 880238 . A G 6079 PASS DP=162;AF=1.000000;SB=0;DP4=0,0,84,78;ANNOVAR=intronic,NOC2L
chr1 881627 . G A 1647 PASS DP=97;AF=0.567010;SB=2;DP4=27,15,31,24;ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=synonymous SNV,NOC2L:NM_015658:exon16:c.C1843T:p.L615L,
chr1 883625 . A G 4809 PASS DP=131;AF=1.000000;SB=0;DP4=0,0,69,62;ANNOVAR=intronic,NOC2L
chr1 887560 . A C 8193 PASS DP=221;AF=1.000000;SB=0;DP4=0,0,97,124;ANNOVAR=intronic,NOC2L
chr1 887801 . A G 3374 PASS DP=102;AF=1.000000;SB=0;DP4=0,0,61,41;ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=synonymous SNV,NOC2L:NM_015658:exon10:c.T1182C:p.T394T,
chr1 888639 . T C 10614 PASS DP=331;AF=1.000000;SB=0;DP4=0,0,158,173;ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=synonymous SNV,NOC2L:NM_015658:exon9:c.A918G:p.E306E,
chr1 888659 . T C 10481 PASS DP=315;AF=1.000000;SB=0;DP4=0,0,148,167;ANNOVAR=exonic,NOC2L;ANNOVAR_EXONIC=nonsynonymous SNV,NOC2L:NM_015658:exon9:c.A898G:p.I300V,
chr1 894573 . G A 9556 PASS DP=259;AF=1.000000;SB=0;DP4=0,0,93,166;ANNOVAR=intronic,NOC2L
chr1 897325 . G C 14405 PASS DP=389;AF=0.997429;SB=0;DP4=0,1,168,220;ANNOVAR=exonic,KLHL17;ANNOVAR_EXONIC=synonymous SNV,KLHL17:NM_198317:exon4:c.G609C:p.A203A,
chr1 897564 . T C 13436 PASS DP=371;AF=1.000000;SB=0;DP4=0,0,214,157;ANNOVAR=intronic,KLHL17
chr1 898323 . T C 9690 PASS DP=270;AF=0.988889;SB=3;DP4=1,0,110,157;ANNOVAR=intronic,KLHL17
chr1 898467 . C T 112 PASS DP=384;AF=0.018229;SB=0;DP4=195,182,4,3;ANNOVAR=intronic,KLHL17
chr1 909309 . T C 3112 PASS DP=228;AF=0.464912;SB=3;DP4=71,51,67,39;ANNOVAR=exonic,PLEKHN1;ANNOVAR_EXONIC=nonsynonymous SNV,PLEKHN1:NM_001160184:exon13:c.T1426C:p.S476P,PLEKHN1:NM_032129:exon14:c.T1531C:p.S511P,
chr1 909768 . A G 7088 PASS DP=191;AF=1.000000;SB=0;DP4=0,0,86,105;ANNOVAR=intronic,PLEKHN1
chr1 911916 . C T 3238 PASS DP=239;AF=0.451883;SB=5;DP4=65,66,46,62;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=synonymous SNV,PERM1:NM_001291366:exon3:c.G2238A:p.R746R,PERM1:NM_001291367:exon4:c.G1956A:p.R652R,
chr1 914192 . G C 632 PASS DP=17;AF=1.000000;SB=0;DP4=0,0,13,4;ANNOVAR=intronic,PERM1
chr1 914333 . C G 1153 PASS DP=80;AF=0.487500;SB=0;DP4=25,16,25,14;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=nonsynonymous SNV,PERM1:NM_001291366:exon2:c.G2077C:p.E693Q,PERM1:NM_001291367:exon3:c.G1795C:p.E599Q,
chr1 914839 . C T 2230 PASS DP=154;AF=0.487013;SB=0;DP4=49,30,47,28;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=nonsynonymous SNV,PERM1:NM_001291366:exon2:c.G1571A:p.R524Q,PERM1:NM_001291367:exon3:c.G1289A:p.R430Q,
chr1 914852 . G C 2709 PASS DP=177;AF=0.508475;SB=0;DP4=54,33,57,33;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=nonsynonymous SNV,PERM1:NM_001291366:exon2:c.C1558G:p.Q520E,PERM1:NM_001291367:exon3:c.C1276G:p.Q426E,
chr1 914876 . T C 8394 PASS DP=238;AF=1.000000;SB=0;DP4=0,0,135,103;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=nonsynonymous SNV,PERM1:NM_001291366:exon2:c.A1534G:p.S512G,PERM1:NM_001291367:exon3:c.A1252G:p.S418G,
chr1 914940 . T C 5454 PASS DP=325;AF=0.538462;SB=0;DP4=81,68,94,81;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=synonymous SNV,PERM1:NM_001291366:exon2:c.A1470G:p.A490A,PERM1:NM_001291367:exon3:c.A1188G:p.A396A,
chr1 915227 . A G 5439 PASS DP=154;AF=1.000000;SB=0;DP4=0,0,77,77;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=synonymous SNV,PERM1:NM_001291366:exon2:c.T1183C:p.L395L,PERM1:NM_001291367:exon3:c.T901C:p.L301L,
chr1 916549 . A G 11206 PASS DP=299;AF=1.000000;SB=0;DP4=0,0,159,140;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=nonsynonymous SNV,PERM1:NM_001291367:exon2:c.T58C:p.W20R,
chr1 917492 . C T 948 PASS DP=62;AF=0.500000;SB=0;DP4=16,15,18,13;ANNOVAR=exonic,PERM1;ANNOVAR_EXONIC=synonymous SNV,PERM1:NM_001291367:exon1:c.G6A:p.P2P,

qtrinh commented 6 years ago

Hi It looks like there is no FORMAT column in your file! Can you regenerate it with the FORMAT column?

Thanks

Q
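The missing FORMAT column can be checked mechanically before rerunning the pipeline. The following is a minimal Python sketch, not part of qpipeline or ISOWN: `has_format_column` is a hypothetical helper name, and it assumes an uncompressed, tab-delimited VCF whose column header starts with `#CHROM`.

```python
def has_format_column(vcf_lines):
    """Return True if the #CHROM header line lists FORMAT as its ninth column."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # The eight fixed VCF columns come first; FORMAT, if present, is ninth.
            return len(columns) > 8 and columns[8] == "FORMAT"
    return False  # no #CHROM header line was found at all


# Illustrative headers: one sites-only file and one with a sample column.
sites_only = ["##fileformat=VCFv4.0\n",
              "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"]
with_sample = ["##fileformat=VCFv4.1\n",
               "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\n"]

print(has_format_column(sites_only))   # False
print(has_format_column(with_sample))  # True
```

For a real file, the same function can be called on an open file handle, for example `has_format_column(open(path))`.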

batis2ta commented 6 years ago

Thanks a lot for the answer. I have now used a file with a FORMAT column; the results follow. No temp files are left. The VCF head is at the bottom.

annotating input file with ANNOVAR ...
NOTICE: Output files were written to /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.annovar.vcf.temp.convert2annovar.variant_function, /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.annovar.vcf.temp.convert2annovar.exonic_variant_function
NOTICE: Reading gene annotation from /home/mohammer/ISOWN/bin/../external_tools/annovar_2012-03-08/humandb/hg19_refGene.txt ... Done with 63481 transcripts (including 15216 without coding sequence annotation) for 27720 unique genes
NOTICE: Processing next batch with 815 unique variants in 815 input lines

annotating input file with dbSNP ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/All_20180423.modified.vcf.gz -A -E -p dbSNP142_All_20141124 -i /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.annovar.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.dbSNP.vcf
Segmentation fault

annotating input file with COSMIC ... cwd=/home/mohammer/ISOWN/bin

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/CosmicAllVariants.vcf.gz -A -E -p COSMIC_69 -i /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.dbSNP.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.cosmic.vcf
Segmentation fault

annotating input file with ExAC ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/ExAC.r0.3.1.database.vcf.gz -A -E -p ExAC.r0.3_20150421 -i /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.cosmic.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.exac.vcf
Segmentation fault

annotating input file with MutationAccessor ...

/home/mohammer/ISOWN/bin/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/bin/../external_databases/2013_12_11_MA.vcf.gz -A -E -p 2013_12_11_MA -i /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.exac.vcf -f /home/mohammer/ISOWN/bin/../external_databases/hg19_random.fa > /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp.ma.vcf
Segmentation fault

annotating input file with PolyPhen ...

annotating input file with sequence context ...

calculating flanking region ...
Segmentation fault

final reformatting ...
Segmentation fault

cleanup: deleting temporary files ( /home/mohammer/Desktop/share/share/share/sortduprem/VCF_files/var_AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam.annot.gz.temp. ) ...

VCF head:

fileformat=VCFv4.1

samtoolsVersion=0.1.18 (r982:295)

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

FORMAT=

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AS-160531-LR-23417_R1.fastq_46894.sorted.mdup.bam

chr1 564460 . C CTTGCCG 23.5 . INDEL;DP=12;VDB=0.0006;AF1=0.5025;AC1=1;DP4=1,0,4,0;MQ=35;FQ=-14.7;PV4=1,1,0.2,0.00078 GT:PL:GQ 0/1:61,0,20:23
chr1 564542 . C T 222 . DP=77;VDB=0.0399;AF1=1;AC1=2;DP4=0,1,67,5;MQ=39;FQ=-206;PV4=0.082,0.078,0.33,0.22 GT:PL:GQ 1/1:255,179,0:99
chr1 564574 . A G 222 . DP=84;VDB=0.0392;AF1=1;AC1=2;DP4=0,1,65,15;MQ=40;FQ=-230;PV4=0.2,0.35,0.38,0.25 GT:PL:GQ 1/1:255,203,0:99

qtrinh commented 6 years ago

Hi Sorry for the delay again as I am away on vacation. You will need DP and AD in the FORMAT string -see https://github.com/ikalatskaya/ISOWN

Thanks

Q
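The DP and AD requirement applies to the FORMAT column (the ninth field of each record), not to the INFO column, so a record with `DP=...` in INFO can still fail it. As an illustration only, here is a small Python sketch that reports which required keys a record lacks; `missing_format_keys` is a hypothetical helper, it assumes tab-delimited records, and the INFO fields in the examples are shortened versions of records posted in this thread.

```python
def missing_format_keys(record, required=("DP", "AD")):
    """Return the required FORMAT keys that a single VCF record does not carry."""
    fields = record.rstrip("\n").split("\t")
    if len(fields) < 9:
        # No FORMAT column at all, so every required key is missing.
        return list(required)
    present = fields[8].split(":")
    return [key for key in required if key not in present]


# FORMAT is GT:PL:GQ here; the DP in the INFO column does not count.
samtools_record = "chr1\t564542\t.\tC\tT\t222\t.\tDP=77;VDB=0.0399\tGT:PL:GQ\t1/1:255,179,0:99"
# FORMAT is AD:GT:DP:DP4 here, so both required keys are present.
dkfz_record = "chr1\t876499\trs4372192_876499\tA\tG\t.\tPASS\tGERMLINE;SNP\tAD:GT:DP:DP4\t0,73:1/1:73:0,0,43,30"

print(missing_format_keys(samtools_record))  # ['DP', 'AD']
print(missing_format_keys(dkfz_record))      # []
```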

batis2ta commented 6 years ago

Dear Q, thanks again for your answer. I was wondering whether the two sample VCF files that come with ISOWN in test_data can be annotated against each other. Both have AD and DP in their FORMAT column, but I get a seg fault even when doing that.

best regards Mo

qtrinh commented 6 years ago

Hi, can you send the command you are running when you get the seg fault?

Thanks

batis2ta commented 6 years ago

Sure:

qpipeline tabix -m 2020 -d 3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf -i 3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf -p test > testcosmicqpipeline.vcf

Thanks a lot

qtrinh commented 6 years ago

I get it now ... you will need to bgzip 3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf first then index with tabix to create a .tbi and then run qpipeline with 3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf.gz.

Thanks
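The preparation steps described above (compress with bgzip, then index with tabix to get a `.tbi`) can be sketched in Python as a list of commands. This is only an illustration: `index_commands` and `database.vcf` are hypothetical names, and the sketch assumes the standard `bgzip` and `tabix` executables are on the PATH.

```python
import shlex


def index_commands(vcf_path):
    """Build the two commands that turn a plain VCF into a tabix-indexed database."""
    gz_path = vcf_path + ".gz"
    return [
        # bgzip replaces vcf_path with the block-compressed vcf_path + ".gz".
        ["bgzip", vcf_path],
        # tabix with the vcf preset writes the index to gz_path + ".tbi".
        ["tabix", "-p", "vcf", gz_path],
    ]


for argv in index_commands("database.vcf"):
    print(shlex.join(argv))
# bgzip database.vcf
# tabix -p vcf database.vcf.gz
```

To actually run the steps, each list can be passed to `subprocess.run(argv, check=True)`. The resulting `.vcf.gz` is then the file to give to qpipeline with `-d`.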

batis2ta commented 6 years ago

Hi Q, thanks a lot. I ran it with this command, one indexed file against a non-indexed one:

qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf.gz -i /home/mohammer/ISOWN/test_data/519b8381-95d5-4fce-a90c-7576cce2110c.dkfz-snvCalling_1-0-132-1.20160126.vcf -q test 
##fileformat=VCFv4.1
##fileDate=20160128
##pancancerversion=1.0
##reference=<ID=hs37d5,Source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz>;
##center="DKFZ"
##workflowName=DKFZ_SNV_workflow
##workflowVersion=1.0.0
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=GERMLINE,Number=0,Type=Flag,Description="Indicates if record is a germline mutation">
##INFO=<ID=UNCLEAR,Number=0,Type=Flag,Description="Indicates if the somatic status of a mutation is unclear">
##INFO=<ID=VT,Number=1,Type=String,Description="Variant type, can be SNP, INS or DEL">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency in primary data, for each ALT allele, in the same order as listed">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="RMS Mapping Quality">
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Indicates membership in 1000Genomes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position in the sample">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases">
##FILTER=<ID=RE,Description="variant in UCSC_27Sept2013_RepeatMasker.bed.gz region and/or SimpleTandemRepeats_chr.bed.gz region, downloaded from UCSC genome browser and/or variant in segmental duplication region, annotated by annovar">
##FILTER=<ID=BL,Description="variant in DAC-Blacklist from ENCODE or in DUKE_EXCLUDED list, both downloaded from UCSC genome browser">
##FILTER=<ID=DP,Description="<= 5 reads total at position in tumor">
##FILTER=<ID=SB,Description="Strand bias of reads with mutant allele = zero reads on one strand">
##FILTER=<ID=TAC,Description="less than 6 reads in Tumor at position">
##FILTER=<ID=dbSNP,Description="variant in dbSNP135">
##FILTER=<ID=DB,Description="variant in 1000Genomes, ALL.wgs.phase1_integrated_calls.20101123.snps_chr.vcf.gz or dbSNP">
##FILTER=<ID=HSDEPTH,Description="variant in HiSeqDepthTop10Pct_chr.bed.gz region, downloaded from UCSC genome browser">
##FILTER=<ID=MAP,Description="variant overlaps a region from wgEncodeCrgMapabilityAlign100mer.bedGraph.gz:::--breakPointMode --aEndOffset=1 with a value below 0.5, punishment increases with a decreasing mapability">
##FILTER=<ID=SBAF,Description="Strand bias of reads with mutant allele = zero reads on one strand and variant allele frequency below 0.1">
##FILTER=<ID=FRQ,Description="variant allele frequency below 0.05">
##FILTER=<ID=TAR,Description="Only one alternative read in Tumor at position">
##FILTER=<ID=UNCLEAR,Description="Classification is unclear">
##FILTER=<ID=DPHIGH,Description="Too many reads mapped in control at this region">
##FILTER=<ID=DPLOWC,Description="Only 5 or less reads in control">
##FILTER=<ID=1PS,Description="Only two alternative reads, one on each strand">
##FILTER=<ID=ALTC,Description="Alternative reads in control">
##FILTER=<ID=ALTCFR,Description="Alternative reads in control and tumor allele frequency below 0.3">
##FILTER=<ID=FRC,Description="Variant allele frequency below 0.3 in germline call">
##FILTER=<ID=YALT,Description="Variant on Y chromosome with low allele frequency">
##FILTER=<ID=VAF,Description="Variant allele frequency in tumor < 5 times allele frequency in control">
##FILTER=<ID=BI,Description="Bias towards a PCR strand or sequencing strand">
##SAMPLE=<ID=CONTROL,SampleName=control_NA,Individual=NA,Description="Control">
##SAMPLE=<ID=TUMOR,SampleName=tumor_NA,Individual=NA,Description="Tumor">
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz
##VCF_DATABASE_FILE:test=/home/mohammer/ISOWN/test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf.gz
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  TUMOR
zsh: segmentation fault  /home/mohammer/Desktop/share/share1/tabixtest/qpipeline/qpipeline tabix -m  -

I got the above-mentioned seg fault.

Then I ran it with two indexed files:
/home/mohammer/Desktop/share/share1/tabixtest/qpipeline/qpipeline tabix -m 2020 -d /home/mohammer/ISOWN/test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf.gz -i /home/mohammer/ISOWN/test_data/519b8381-95d5-4fce-a90c-7576cce2110c.dkfz-snvCalling_1-0-132-1.20160126.vcf.gz -q test 
zsh: segmentation fault  /home/mohammer/Desktop/share/share1/tabixtest/qpipeline/qpipeline tabix -m  -

still the seg fault.

Very curious case

qtrinh commented 6 years ago

The first command is correct. Can you set ulimit -s 65535 then run the first command with -v and send me the output?

Thanks
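For anyone scripting this step, `ulimit -s 65535` raises the shell's stack limit to 65535 KiB before the command starts. A rough Python equivalent is sketched below under some assumptions: it is Unix-only, `stack_limits` and `run_with_stack` are hypothetical helper names, and unlike the shell built-in it changes only the soft limit, capped at the existing hard limit.

```python
import resource
import subprocess


def stack_limits(kilobytes, hard):
    """Convert a `ulimit -s` style value in KiB to a (soft, hard) pair in bytes."""
    soft = kilobytes * 1024
    if hard != resource.RLIM_INFINITY:
        soft = min(soft, hard)  # the soft limit may not exceed the hard limit
    return soft, hard


def run_with_stack(argv, kilobytes=65535):
    """Run argv in a child process whose stack limit was raised first."""
    def raise_stack():
        # Runs in the child just before exec, so the parent keeps its own limit.
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        resource.setrlimit(resource.RLIMIT_STACK, stack_limits(kilobytes, hard))

    return subprocess.run(argv, preexec_fn=raise_stack,
                          capture_output=True, text=True)
```

The qpipeline command from the thread, with `-v` appended, would be passed as the `argv` list; the returned object then holds the verbose output in `stdout` and `stderr`.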

batis2ta commented 6 years ago

Dear Q, thanks for your answer. I cannot send you the whole files because they contain sensitive data. These are the first 160 lines:

##fileformat=VCFv4.1
##fileDate=20160128
##pancancerversion=1.0
##reference=<ID=hs37d5,Source=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz>;
##center="DKFZ"
##workflowName=DKFZ_SNV_workflow
##workflowVersion=1.0.0
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Indicates if record is a somatic mutation">
##INFO=<ID=GERMLINE,Number=0,Type=Flag,Description="Indicates if record is a germline mutation">
##INFO=<ID=UNCLEAR,Number=0,Type=Flag,Description="Indicates if the somatic status of a mutation is unclear">
##INFO=<ID=VT,Number=1,Type=String,Description="Variant type, can be SNP, INS or DEL">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency in primary data, for each ALT allele, in the same order as listed">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="RMS Mapping Quality">
##INFO=<ID=1000G,Number=0,Type=Flag,Description="Indicates membership in 1000Genomes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth at this position in the sample">
##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases">
##FILTER=<ID=RE,Description="variant in UCSC_27Sept2013_RepeatMasker.bed.gz region and/or SimpleTandemRepeats_chr.bed.gz region, downloaded from UCSC genome browser and/or variant in segmental duplication region, annotated by annovar">
##FILTER=<ID=BL,Description="variant in DAC-Blacklist from ENCODE or in DUKE_EXCLUDED list, both downloaded from UCSC genome browser">
##FILTER=<ID=DP,Description="<= 5 reads total at position in tumor">
##FILTER=<ID=SB,Description="Strand bias of reads with mutant allele = zero reads on one strand">
##FILTER=<ID=TAC,Description="less than 6 reads in Tumor at position">
##FILTER=<ID=dbSNP,Description="variant in dbSNP135">
##FILTER=<ID=DB,Description="variant in 1000Genomes, ALL.wgs.phase1_integrated_calls.20101123.snps_chr.vcf.gz or dbSNP">
##FILTER=<ID=HSDEPTH,Description="variant in HiSeqDepthTop10Pct_chr.bed.gz region, downloaded from UCSC genome browser">
##FILTER=<ID=MAP,Description="variant overlaps a region from wgEncodeCrgMapabilityAlign100mer.bedGraph.gz:::--breakPointMode --aEndOffset=1 with a value below 0.5, punishment increases with a decreasing mapability">
##FILTER=<ID=SBAF,Description="Strand bias of reads with mutant allele = zero reads on one strand and variant allele frequency below 0.1">
##FILTER=<ID=FRQ,Description="variant allele frequency below 0.05">
##FILTER=<ID=TAR,Description="Only one alternative read in Tumor at position">
##FILTER=<ID=UNCLEAR,Description="Classification is unclear">
##FILTER=<ID=DPHIGH,Description="Too many reads mapped in control at this region">
##FILTER=<ID=DPLOWC,Description="Only 5 or less reads in control">
##FILTER=<ID=1PS,Description="Only two alternative reads, one on each strand">
##FILTER=<ID=ALTC,Description="Alternative reads in control">
##FILTER=<ID=ALTCFR,Description="Alternative reads in control and tumor allele frequency below 0.3">
##FILTER=<ID=FRC,Description="Variant allele frequency below 0.3 in germline call">
##FILTER=<ID=YALT,Description="Variant on Y chromosome with low allele frequency">
##FILTER=<ID=VAF,Description="Variant allele frequency in tumor < 5 times allele frequency in control">
##FILTER=<ID=BI,Description="Bias towards a PCR strand or sequencing strand">
##SAMPLE=<ID=CONTROL,SampleName=control_NA,Individual=NA,Description="Control">
##SAMPLE=<ID=TUMOR,SampleName=tumor_NA,Individual=NA,Description="Tumor">
##TARGET_FILE:SureSelectHumanAllExonV4=file:///oicr/data/genomes/homo_sapiens_mc/Agilent/SureSelectHumanAllExonV4/S03723314_Regions.merged.sorted.bed.gz
##VCF_DATABASE_FILE:test=/home/mohammer/ISOWN/test_data/3d2edf87-6ec5-4c9f-9212-e8a751cc33e8.dkfz-snvCalling_1-0-132-1.20160126.vcf.gz
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  TUMOR

###########################################

[my_tabix.c:459]
'chr1   876499  rs4372192_876499    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]    AD:GT:DP:DP4    0,73:1/1:73:0,0,43,30'
'chr1:876499-876499'

======================
[my_tabix.c:480] - FOUND entry 1 in tabix databse:  'chr1   876499  rs4372192_876499    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]    AD:GT:DP:DP4    0,48:1/1:48:0,0,32,16'

[input_data.c:103] - number of columns 10
column   1  'chr1'
column   2  '876499'
column   3  'rs4372192_876499'
column   4  'A'
column   5  'G'
column   6  '.'
column   7  'PASS'
column   8  'GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]'
column   9  'AD:GT:DP:DP4'
column  10  '0,48:1/1:48:0,0,32,16'

[vcf.c:268] - comparing VCF entries 

[my_string.c:97] - 
chr1    876499  rs4372192_876499    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]    AD:GT:DP:DP4    

[my_string.c:97] - 
chr1    876499  rs4372192_876499    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]    AD:GT:DP:DP4    

[vcf.c:288] - variant class matched

a REF A

b REF A
[vcf.c:325] - REF matched

a ALT G

b ALT G
[vcf.c:372] - ALT matched

[vcf.c:398] - matched 

[my_tabix.c:523] - 1 matches
[my_tabix.c:526] - string to add to VCF INFO column ( length 189 ) :    test_VARIANT_MATCHED×chr1×876499×rs4372192_876499×A×G×.×PASS×GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]×AD:GT:DP:DP4×0,48:1/1:48:0,0,32,16Ø

chr1    876499  rs4372192_876499    A   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]];test=1,1,test_VARIANT_MATCHED×chr1×876499×rs4372192_876499×A×G×.×PASS×GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,876429,876641]]×AD:GT:DP:DP4×0,48:1/1:48:0,0,32,16Ø AD:GT:DP:DP4    0,73:1/1:73:0,0,43,30

###########################################

[my_tabix.c:459]
'chr1   877715  rs6605066_877715    C   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]    AD:GT:DP:DP4    0,40:1/1:40:0,0,13,27'
'chr1:877715-877715'

======================
[my_tabix.c:480] - FOUND entry 1 in tabix databse:  'chr1   877715  rs6605066_877715    C   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]    AD:GT:DP:DP4    0,34:1/1:34:0,0,13,21'

[input_data.c:103] - number of columns 10
column   1  'chr1'
column   2  '877715'
column   3  'rs6605066_877715'
column   4  'C'
column   5  'G'
column   6  '.'
column   7  'PASS'
column   8  'GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]'
column   9  'AD:GT:DP:DP4'
column  10  '0,34:1/1:34:0,0,13,21'

[vcf.c:268] - comparing VCF entries 

[my_string.c:97] - 
chr1    877715  rs6605066_877715    C   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]    AD:GT:DP:DP4    

[my_string.c:97] - 
chr1    877715  rs6605066_877715    C   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]    AD:GT:DP:DP4    

[vcf.c:288] - variant class matched

a REF C

b REF C
[vcf.c:325] - REF matched

a ALT G

b ALT G
[vcf.c:372] - ALT matched

[vcf.c:398] - matched 

[my_tabix.c:523] - 1 matches
[my_tabix.c:526] - string to add to VCF INFO column ( length 189 ) :    test_VARIANT_MATCHED×chr1×877715×rs6605066_877715×C×G×.×PASS×GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]×AD:GT:DP:DP4×0,34:1/1:34:0,0,13,21Ø

chr1    877715  rs6605066_877715    C   G   .   PASS    GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]];test=1,1,test_VARIANT_MATCHED×chr1×877715×rs6605066_877715×C×G×.×PASS×GERMLINE;SNP;AF=1.00,1.00;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chr1,877537,878481]]×AD:GT:DP:DP4×0,34:1/1:34:0,0,13,21Ø AD:GT:DP:DP4    0,40:1/1:40:0,0,13,27

And this is the tail

[my_tabix.c:459]
'chrX   154774707   rs2305518_154774707 C   T   .   PASS    GERMLINE;SNP;AF=0.46,0.36;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chrX,154774689,154774877]]  AD:GT:DP:DP4    38,21:0/1:59:22,16,9,12'
'chrX:154774707-154774707'

[my_tabix.c:516] - didn't find any entries in tabix database ... no data added

chrX    154774707   rs2305518_154774707 C   T   .   PASS    GERMLINE;SNP;AF=0.46,0.36;MQ=60;DB;1000G;[SureSelectHumanAllExonV4=1,1,[chrX,154774689,154774877]];test=0,0,.   AD:GT:DP:DP4    38,21:0/1:59:22,16,9,12

Please let me know if you need more lines. In that case I would have to somehow remove the chromosomal positions or randomize the chromosomes.

Best regards Mo

qtrinh commented 6 years ago

Since the data is sensitive, can you send me email at quang.trinh@gmail.com?

Thanks

Q