Open TendoLiu opened 4 years ago
can you confirm where you have obtained $transcript file thanks
Thank you so much. Please see my script and the log.
Script: `module load af4 hg19_UCSC=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/ucsc.hg19.fasta/ucsc.hg19.fasta anno=/bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz
bio-fusion -generate-transcriptome \ -output /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa \ -keep-mitochondrial-genes \ -keep-readthrough-transcripts \ -keep-pary-locus-transcripts \ -keep-versioned-genes \ $anno $hg19_UCSC
R1=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R1.fastq R2=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion/fastq_merged/Sample1/Sample1.R2.fastq transcript=/bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa out=/bgfs/soesterreich/pan_data/soesterreich/Tendo/Projects/Project2_cfDNA_fusion_capture/SecondPanel/7_SVcalling_noUMI/bio_fusion
bio-fusion \ -r1=$R1 \ -r2=$R2 \ -umi-in-name \ -fasta-output=$out/test.out.fa \ -filtered-output=$out/test.filtered.fa \ -transcript=$transcript \ -output $out ` Log:
`2019/11/04 15:48:41 Creating /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa I1104 15:48:41.464042 15119 parsegencode.go:246] GTF: /bgfs/soesterreich/pan_data/soesterreich/Tendo/ReferenceFiles/hg19_Versions/Other/annotation/gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz I1104 15:48:50.646038 15119 parsegencode.go:248] Read 66738 genes, 247086 transcripts, 1478937 exons I1104 15:49:06.822649 15119 parsegencode.go:358] Successfully parsed 66738 genes, 247086 transcripts and 1478937 exons I1104 15:49:06.822692 15119 parsegencode.go:360] Retained 247086 transcripts and 1478937 exons E1104 15:49:33.213128 15119 parsegencode.go:490] invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520 panic: invalid query range 63686370 - 63688050 for sequence chr20 with length 63025520
goroutine 1 [running]: github.com/grailbio/base/log.Panic(0xc0004c99d8, 0x1, 0x1) /ihome/crc/install/af4/src/github.com/grailbio/base/log/log.go:168 +0xc7 github.com/grailbio/bio/fusion/parsegencode.PrintParsedGTFRecords(0x1042300, 0xc00025ea40, 0x104a020, 0xc07dd83260, 0xc07c132000, 0x104b2, 0x104b2, 0x1010101000000) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/parsegencode/parsegencode.go:490 +0x576 github.com/grailbio/bio/fusion/cmd.GenerateTranscriptome(0x104bc20, 0xc00047d2c0, 0x7fff2027caa0, 0x8f, 0x7fff2027cb30, 0x6b, 0x0, 0x7fff2027c9d8, 0x5c, 0x0, ...) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/generate_transcriptome.go:114 +0x617 github.com/grailbio/bio/fusion/cmd.Run() /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/cmd/main.go:558 +0xaa8 main.main() /ihome/crc/install/af4/src/github.com/grailbio/bio/cmd/bio-fusion/main.go:9 +0x20
I1104 15:50:30.178381 15172 main.go:336] Start reading geneDB I1104 15:50:30.256422 15172 gene_db.go:310] Reading transcriptome /bgfs/soesterreich/pan_data/soesterreich/Tendo/InstalledTools/af4/gencode.v26.whole_genes.fa denovo panic: start must be less than end
goroutine 90 [running]: github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc000496580, 0xc0003c8a20, 0xc000476120, 0xc0002e5350, 0xc0003e0030) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd created by github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb
`
It seems that there was a "panic" when generating the transcriptome file, but the file was produced anyway. Note that I am using a newer version of annotation file. And the reference genome is hg19 instead of hg38 in the example code.
gencode.v26.whole_genes.txt Just in case you want to look at the transcriptome file. I converted .fa to .txt to be able to upload.
Thanks, will look into it
Any follow-up?
sorry, at the moment, I do not have linux machine to debug into it. But after relook into the issue, it appears that the sequence chr20 in the provided fasta file has length 63025520, however, there seems to be a gene in the gtf in the range of 63686370 - 63688050 in "chr20". The only issue I can think of is that the gtf file you used ($anno = gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz) and the reference fasta file ($hg19_UCSC = ucsc.hg19.fasta) are not compatible with each other. possible to confirm that ?
hi TendoLiu, a colleague offerred to help after holidays. In order to debug into this, could you share or point to the link to obtain the two files gencode.v31.chr_patch_hapl_scaff.annotation.gtf.gz and ucsc.hg19.fasta (if you cannot yet confirm the annotation and the fasta file do not match). Sorry for the delay.
bash-4.2$ bio-fusion \
goroutine 90 [running]: github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome.func5(0xc000050f00, 0xc00049a5c0, 0xc0003d2a20, 0xc0004865a0, 0xc00024b530, 0xc0003ea050) /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:356 +0x2bd created by github.com/grailbio/bio/fusion.(GeneDB).ReadTranscriptome /ihome/crc/install/af4/src/github.com/grailbio/bio/fusion/gene_db.go:345 +0x3bb