zhpn1024 / ribotish

Ribo-seq TIS Hunter, predicting translation initiation sites and ORFs using riboseq data
http://dx.doi.org/10.1038/s41467-017-01981-8
GNU General Public License v3.0
24 stars 7 forks source link

Ribotish on prokaryotic ribosome profiling #4

Closed dgelsin closed 1 year ago

dgelsin commented 4 years ago

Hello,

I carried out ribosome profiling TI-seq on a prokaryotic organism and I am interested in seeing if I could use ribotish. I have first tried executing this command: ribotish quality -l 10,40 -b /Volumes/Diego_2TB/ribosome_profiling_final_libs/libraries/H98_WT_ctrl/reads/density/filtering_bowtie/alignments/chr/H98_WT_ctrl.bam -g /Users/DRG/Downloads/GCF_000025685.1_ASM2568v1_genomic_update_080119.gff --geneformat gff

but then I get an error which is: Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.

I am having a hard time troubleshooting what could be the problem. From my own analysis, the peak footprint length is 27 nt, but there is a range of footprint lengths since this is MNase-treated (for a metagene plot of density periodicity and heatmap of length distribution see here. Is the problem the footprint length or is the gff annotation the problem? Or something else entirely?

Some extra details:

  1. The annotation is from NCBI
  2. The bam file was aligned with bowtie, it is single-end, and it has been coordinate sorted.

Any help/insight on what the problem could be is greatly appreciated.

Thank you.

zhpn1024 commented 4 years ago

It may be a problem of gff annotation support. I have some updates in github recently. It is not officially released yet. Get the latest code by git clone and try again.

alanlorenzetti commented 4 years ago

I was also trying to analyze prokaryotic Ribo-Seq data:

ribotish quality -b bamfile.bam -g genomefile.gtf -p 20

Output was exactly the same: Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.

It turns out it was failing because of the annotation file—

Converting the NCBI RefSeq gff3 file to gtf (gtf v3) using AGAT solved the issue.

I am using ribotish v0.2.5

zhpn1024 commented 4 years ago

I was also trying to analyze prokaryotic Ribo-Seq data:

ribotish quality -b bamfile.bam -g genomefile.gtf -p 20

Output was exactly the same: Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.

It turns out it was failing because of the annotation file—

  • Failed using NCBI RefSeq gff3
  • Failed using NCBI RefSeq gff3 converted to gtf (gff2) using Bioconductor rtracklayer::convert function

Converting the NCBI RefSeq gff3 file to gtf (gtf v3) using AGAT solved the issue.

I am using ribotish v0.2.5

It may be because ribotish fail to extract coding information from your gff. Could you provide a piece of example for coding transcripts in your gff annotation file?

alanlorenzetti commented 4 years ago

Yes. I am sending the files.

i. annot.gff is the original file gff3 obtained from NCBI. Refseq ftp. It fails. ii. annot-gff2.gtf is file i. converted to gtf using rtracklayer. It fails. iii. annot.gtf is file i. converted to gtf using AGAT. It works.

annots.zip

dgelsin commented 4 years ago

I was also trying to analyze prokaryotic Ribo-Seq data:

ribotish quality -b bamfile.bam -g genomefile.gtf -p 20

Output was exactly the same: Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.

It turns out it was failing because of the annotation file—

  • Failed using NCBI RefSeq gff3
  • Failed using NCBI RefSeq gff3 converted to gtf (gff2) using Bioconductor rtracklayer::export function

Converting the NCBI RefSeq gff3 file to gtf (gtf v3) using AGAT solved the issue.

I am using ribotish v0.2.5

I wasn't able ever to get ribotish to work with a ribosome profiling data set on an Archaea, but it sounds like this may have been the problem. Thanks for the solution!

In case you are interested, I ended up making a program to analyze bacterial and archaeal data called mRibo, which can be found here: https://github.com/dgelsin/mRibo

alanlorenzetti commented 4 years ago

That is great! I will definitely try it. By the way, I have seen your preprint on Hvol Ribo-Seq and I am learning a lot from it. I am assuming you made this program to get those results. Hopefully it will be published soon. Cheers.

dgelsin commented 4 years ago

@alanlorenzetti Thanks! Yes this program is what I made to analyze the data in that paper. It actually was just accepted in NAR and will be out soon with a few more updates compared to the preprint.

alanlorenzetti commented 4 years ago

@dgelsin I am glad to hear that! Congratulations!

zhpn1024 commented 4 years ago

Yes. I am sending the files.

i. annot.gff is the original file gff3 obtained from NCBI. Refseq ftp. It fails. ii. annot-gff2.gtf is file i. converted to gtf using rtracklayer. It fails. iii. annot.gtf is file i. converted to gtf using AGAT. It works.

annots.zip

The problem is that your gff3 file is gene-CDS instead of gene-transcript-CDS structure. My code did not find any transcripts in a gene, so nothing is processed. I'll try to solve it.

zhpn1024 commented 4 years ago

Yes. I am sending the files.

i. annot.gff is the original file gff3 obtained from NCBI. Refseq ftp. It fails. ii. annot-gff2.gtf is file i. converted to gtf using rtracklayer. It fails. iii. annot.gtf is file i. converted to gtf using AGAT. It works.

annots.zip

The gff3 problem in file i is resolved by the latest commit. The file ii is just ignored.

alanlorenzetti commented 4 years ago

@zhpn1024 Great! @dgelsin Should we call it a closed issue?

zhpn1024 commented 4 years ago

@zhpn1024 Great! @dgelsin Should we call it a closed issue?

More tests are needed. I'll make a new release after the 3' end profile is added.

dgelsin commented 4 years ago

Yep, works for me. Thanks!