grailbio / bio

Bioinformatic infrastructure libraries
Apache License 2.0
74 stars 16 forks source link

Run error: panic: start must be less than end #2

Open TendoLiu opened 4 years ago

TendoLiu commented 4 years ago

bash-4.2$ bio-fusion \

-r1=$R1 \ -r2=$R2 \ -transcript=$transcript I1104 12:57:44.433680 11841 main.go:336] Start reading geneDB I1104 12:57:44.556241 11841 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo panic: start must be less than end

goroutine 90 [running]: github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc00049a5c0, 0xc0003d2a20, 0xc0004865a0, 0xc00024b530, 0xc0003ea050) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd created by github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb

eksyang commented 4 years ago

can you confirm where you have obtained $transcript file thanks

TendoLiu commented 4 years ago

Thank you so much. Please see my script and the log.

Script: `module load af4 hg19_UCSC=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/ucsc.hg19.fasta/ucsc.hg19.fasta anno=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz

bio-fusion -generate-transcriptome \ -output /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa \ -keep-mitochondrial-genes \ -keep-readthrough-transcripts \ -keep-pary-locus-transcripts \ -keep-versioned-genes \ $anno $hg19_UCSC

R1=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R1.fastq R2=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R2.fastq transcript=/bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa out=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion

bio-fusion \ -r1=$R1 \ -r2=$R2 \ -umi-in-name \ -fasta-output=$out/test.out.fa \ -filtered-output=$out/test.filtered.fa \ -transcript=$transcript \ -output $out ` Log:

`2019/11/04 15:48:41 Creating /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa I1104 15:48:41.464042 15119 parsegencode.go:246] GTF: /bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz I1104 15:48:50.646038 15119 parsegencode.go:248] Read 66738 genes, 247086 transcripts, 1478937 exons I1104 15:49:06.822649 15119 parsegencode.go:358] Successfully parsed 66738 genes, 247086 transcripts and 1478937 exons I1104 15:49:06.822692 15119 parsegencode.go:360] Retained 247086 transcripts and 1478937 exons E1104 15:49:33.213128 15119 parsegencode.go:490] invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520 panic: invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520

goroutine 1 [running]: github.com/grailbio/base/log.Panic(0xc0004c99d8, 0x1, 0x1) /ihome/crc/install/af4/src/github.com/grailbio/base/log/log.go:168 +0xc7 github.com/grailbio/bio/fusion/parsegencode.PrintParsedGTFRecords(0x1042300, 0xc00025ea40, 0x104a020, 0xc07dd83260, 0xc07c132000, 0x104b2, 0x104b2, 0x1010101000000) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/parsegencode/parsegencode.go:490 +0x576 github.com/grailbio/bio/fusion/cmd.GenerateTranscriptome(0x104bc20, 0xc00047d2c0, 0x7fff2027caa0, 0x8f, 0x7fff2027cb30, 0x6b, 0x0, 0x7fff2027c9d8, 0x5c, 0x0, ...) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/generate_transcriptome.go:114 +0x617 github.com/grailbio/bio/fusion/cmd.Run() /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/main.go:558 +0xaa8 main.main() /ihome/crc/install/af4/src/github.com/grailbio/bio/cmd/bio-fusion/main.go:9 +0x20

I1104 15:50:30.178381 15172 main.go:336] Start reading geneDB I1104 15:50:30.256422 15172 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo panic: start must be less than end

goroutine 90 [running]: github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc000496580, 0xc0003c8a20, 0xc000476120, 0xc0002e5350, 0xc0003e0030) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd created by github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb

`

TendoLiu commented 4 years ago

It seems that there was a "panic" when generating the transcriptome file, but the file was produced anyway. Note that I am using a newer version of annotation file. And the reference genome is hg19 instead of hg38 in the example code.

TendoLiu commented 4 years ago

gencode.v26.whole_genes.txt Just in case you want to look at the transcriptome file. I converted .fa to .txt to be able to upload.

eksyang commented 4 years ago

Thanks, will look into it

TendoLiu commented 4 years ago

Any follow-up?

eksyang commented 4 years ago

sorry, at the moment, I do not have linux machine to debug into it. But after relook into the issue, it appears that the sequence chr20 in the provided fasta file has length 63025520, however, there seems to be a gene in the gtf in the range of 63686370 - 63688050 in "chr20". The only issue I can think of is that the gtf file you used ($anno = gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz) and the reference fasta file ($hg19_UCSC = ucsc.hg19.fasta) are not compatible with each other. possible to confirm that ?

eksyang commented 4 years ago

hi TendoLiu, a colleague offerred to help after holidays. In order to debug into this, could you share or point to the link to obtain the two files gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz and ucsc.hg19.fasta (if you cannot yet confirm the annotation and the fasta file do not match). Sorry for the delay.