zhpn1024 / ribotish

Ribo-seq TIS Hunter, predicting translation initiation sites and ORFs using riboseq data
http://dx.doi.org/10.1038/s41467-017-01981-8
GNU General Public License v3.0
24 stars 7 forks source link

Not working with stringtie custom GTF #37

Closed ericmalekos closed 2 months ago

ericmalekos commented 3 months ago

Hi, I created a custom GTF by assembling with stringtie but am getting the following error. The command works fine with gencodev46 GTF. Is there some reason why this wouldn't work?

ribotish predict \
-b ${BAM} \
-g ${GTF} \
-f ${GENOME} \
-o ${RIBOTISHDIR}ORF.tsv \
--ribopara ${RIBOTISHDIR}para.py \
--seq \
--aaseq \
--blocks \
--altcodons CTG,GTG,ACG \
--inframecount \
--maxNH 1000 \
--minMapQ 0 \
--secondary \
-p ${N} \
--verbose \
--transprofile ${RIBOTISHDIR}transprofile

"""
Traceback (most recent call last):
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/run/predict.py", line 394, in _pred_gene
    ttis = ribo.Ribo(t, bamload = tismbl, compatible = compatible, mis = compatiblemis)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/zbio/ribo.py", line 62, in __init__
    self.cnts = bamload.transCounts(trans, compatible = compatible, mis = mis)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/zbio/bam.py", line 613, in transCounts
    d = self.data[trans.strand]
KeyError: '.'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gstore/home/malekose/micromamba/envs/test/bin/ribotish", line 56, in <module>
    main()
  File "/gstore/home/malekose/micromamba/envs/test/bin/ribotish", line 34, in main
    commands[cmd].run(args)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/site-packages/ribotish/run/predict.py", line 236, in run
    for result in pred_iter:
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 451, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/gstore/home/malekose/micromamba/envs/test/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
KeyError: '.'
zhpn1024 commented 3 months ago

As the error info say, the strand is unknown '.', you can remove such transcripts or choose a specific trand.

ericmalekos commented 2 months ago

Thank you that fixed that issue. I am now experiencing some strange output. I have a appended contigs to the Hg38 genome and created a new gtf for them. The predict function runs without issue but there is no output, including in the transprofile file. Is there some reason for this? I can see that reads/signal are mapping to these contigs and ribocode does predict many of them as translating at high confidence.

head -6 fusion.gtf
ENSG00000250917.1_fusioncontig1 FASTA2GTF   gene    1   753 .   +   .   gene_id "ENSG00000250917.1_fusioncontig1"; gene_name "ENSG00000250917.1_fusioncontig1";
ENSG00000250917.1_fusioncontig1 FASTA2GTF   transcript  1   753 .   +   .   gene_id "ENSG00000250917.1_fusioncontig1"; transcript_id "ENSG00000250917.1_fusioncontig1_t1"; gene_name "ENSG00000250917.1_fusioncontig1"; transcript_name "ENSG00000250917.1_fusioncontig1_t1";
ENSG00000250917.1_fusioncontig1 FASTA2GTF   exon    1   753 .   +   .   gene_id "ENSG00000250917.1_fusioncontig1"; transcript_id "ENSG00000250917.1_fusioncontig1_t1"; gene_name "ENSG00000250917.1_fusioncontig1"; transcript_name "ENSG00000250917.1_fusioncontig1_t1"; exon_number "1"; exon_id "ENSG00000250917.1_fusioncontig1_t1.1";
AHCY_fusioncontig2  FASTA2GTF   gene    1   969 .   +   .   gene_id "AHCY_fusioncontig2"; gene_name "AHCY_fusioncontig2";
AHCY_fusioncontig2  FASTA2GTF   transcript  1   969 .   +   .   gene_id "AHCY_fusioncontig2"; transcript_id "AHCY_fusioncontig2_t1"; gene_name "AHCY_fusioncontig2"; transcript_name "AHCY_fusioncontig2_t1";
AHCY_fusioncontig2  FASTA2GTF   exon    1   969 .   +   .   gene_id "AHCY_fusioncontig2"; transcript_id "AHCY_fusioncontig2_t1"; gene_name "AHCY_fusioncontig2"; transcript_name "AHCY_fusioncontig2_t1"; exon_number "1"; exon_id "AHCY_fusioncontig2_t1.1";
KMT2C_fusioncontig3 FASTA2GTF   gene    1   4558    .   +   .   gene_id "KMT2C_fusioncontig3"; gene_name "KMT2C_fusioncontig3";
zhpn1024 commented 2 months ago

Does the bam file contain these contigs and reads? Setting the verbose option twice "-v -v" to see if the genes and transcripts are processed.

zhpn1024 commented 2 months ago

Another possibility is that the genome fasta index file does not contain these contigs.

ericmalekos commented 2 months ago

Ah yes I forgot to reindex the fasta! Thank you for the help!