Closed dgelsin closed 1 year ago
It may be a problem of gff annotation support. I have some updates in github recently. It is not officially released yet. Get the latest code by git clone and try again.
I was also trying to analyze prokaryotic Ribo-Seq data:
ribotish quality -b bamfile.bam -g genomefile.gtf -p 20
Output was exactly the same:
Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.
It turns out it was failing because of the annotation file—
Converting the NCBI RefSeq gff3 file to gtf (gtf v3) using AGAT solved the issue.
I am using ribotish v0.2.5
I was also trying to analyze prokaryotic Ribo-Seq data:
ribotish quality -b bamfile.bam -g genomefile.gtf -p 20
Output was exactly the same:
Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.
It turns out it was failing because of the annotation file—
- Failed using NCBI RefSeq gff3
- Failed using NCBI RefSeq gff3 converted to gtf (gff2) using Bioconductor rtracklayer::convert function
Converting the NCBI RefSeq gff3 file to gtf (gtf v3) using AGAT solved the issue.
I am using ribotish v0.2.5
It may be because ribotish fail to extract coding information from your gff. Could you provide a piece of example for coding transcripts in your gff annotation file?
Yes. I am sending the files.
i. annot.gff is the original file gff3 obtained from NCBI. Refseq ftp. It fails. ii. annot-gff2.gtf is file i. converted to gtf using rtracklayer. It fails. iii. annot.gtf is file i. converted to gtf using AGAT. It works.
I was also trying to analyze prokaryotic Ribo-Seq data:
ribotish quality -b bamfile.bam -g genomefile.gtf -p 20
Output was exactly the same:
Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.
It turns out it was failing because of the annotation file—
- Failed using NCBI RefSeq gff3
- Failed using NCBI RefSeq gff3 converted to gtf (gff2) using Bioconductor rtracklayer::export function
Converting the NCBI RefSeq gff3 file to gtf (gtf v3) using AGAT solved the issue.
I am using ribotish v0.2.5
I wasn't able ever to get ribotish to work with a ribosome profiling data set on an Archaea, but it sounds like this may have been the problem. Thanks for the solution!
In case you are interested, I ended up making a program to analyze bacterial and archaeal data called mRibo, which can be found here: https://github.com/dgelsin/mRibo
That is great! I will definitely try it. By the way, I have seen your preprint on Hvol Ribo-Seq and I am learning a lot from it. I am assuming you made this program to get those results. Hopefully it will be published soon. Cheers.
@alanlorenzetti Thanks! Yes this program is what I made to analyze the data in that paper. It actually was just accepted in NAR and will be out soon with a few more updates compared to the preprint.
@dgelsin I am glad to hear that! Congratulations!
Yes. I am sending the files.
i. annot.gff is the original file gff3 obtained from NCBI. Refseq ftp. It fails. ii. annot-gff2.gtf is file i. converted to gtf using rtracklayer. It fails. iii. annot.gtf is file i. converted to gtf using AGAT. It works.
The problem is that your gff3 file is gene-CDS instead of gene-transcript-CDS structure. My code did not find any transcripts in a gene, so nothing is processed. I'll try to solve it.
Yes. I am sending the files.
i. annot.gff is the original file gff3 obtained from NCBI. Refseq ftp. It fails. ii. annot-gff2.gtf is file i. converted to gtf using rtracklayer. It fails. iii. annot.gtf is file i. converted to gtf using AGAT. It works.
The gff3 problem in file i is resolved by the latest commit. The file ii is just ignored.
@zhpn1024 Great! @dgelsin Should we call it a closed issue?
@zhpn1024 Great! @dgelsin Should we call it a closed issue?
More tests are needed. I'll make a new release after the 3' end profile is added.
Yep, works for me. Thanks!
Hello,
I carried out ribosome profiling TI-seq on a prokaryotic organism and I am interested in seeing if I could use ribotish. I have first tried executing this command:
ribotish quality -l 10,40 -b /Volumes/Diego_2TB/ribosome_profiling_final_libs/libraries/H98_WT_ctrl/reads/density/filtering_bowtie/alignments/chr/H98_WT_ctrl.bam -g /Users/DRG/Downloads/GCF_000025685.1_ASM2568v1_genomic_update_080119.gff --geneformat gff
but then I get an error which is:
Counted reads: 0 Error: no reads found! Check read length or protein coding annotation.
I am having a hard time troubleshooting what could be the problem. From my own analysis, the peak footprint length is 27 nt, but there is a range of footprint lengths since this is MNase-treated (for a metagene plot of density periodicity and heatmap of length distribution see here. Is the problem the footprint length or is the gff annotation the problem? Or something else entirely?
Some extra details:
Any help/insight on what the problem could be is greatly appreciated.
Thank you.