zhpn1024 / ribotish

Ribo-seq TIS Hunter, predicting translation initiation sites and ORFs using riboseq data
http://dx.doi.org/10.1038/s41467-017-01981-8
GNU General Public License v3.0

"wrong exon strand" and "stop codon error" #8

Closed sjannielefevre closed 4 years ago

sjannielefevre commented 4 years ago

Hi

I just started working on riboseq data, and came across ribo-tish which seems really useful for QC-ing. I have aligned my reads using STAR as per the guidelines and using the gtf annotation file.

Now I started running the command on one of my bam files, and I have a couple of questions.

  1. How long should it take? The BAM file is 300 MB, and ribotish has been running for almost an hour, so I wonder if something is wrong.

  2. I get multiple "Wrong exon strand" messages, as well as "stop codon error". I have looked for a description of these errors, but I cannot find any. What do they mean, and are they a problem?

Cheers Sjannie

zhpn1024 commented 4 years ago

Are you running ribotish quality or predict? You can use the '-p' option to speed it up. The '-v' option shows more information about the progress. The "Wrong exon strand" message means the GTF annotation has different strands within the same gene. The "stop codon error" message means the stop codon annotation is not consistent with the CDS annotation. These are warnings issued while reading the GTF file, and they do not affect the results much.
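To see which genes might trigger the "Wrong exon strand" warning, the annotation can be scanned for genes whose exon records are not all on one strand. The following is a minimal sketch, not ribotish's own check: the function name `genes_with_mixed_strands` is hypothetical, and it assumes a tab-separated GTF in which every exon line carries a `gene_id` attribute.

```python
import re
from collections import defaultdict

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def genes_with_mixed_strands(gtf_lines):
    """Return {gene_id: set of strands} for genes with exons on more than one strand."""
    strands = defaultdict(set)
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = GENE_ID.search(fields[8])
        if match:
            strands[match.group(1)].add(fields[6])
    return {gene: s for gene, s in strands.items() if len(s) > 1}
```

It can be called on an open file handle, for example `genes_with_mixed_strands(open("annotation.gtf"))`, where `annotation.gtf` stands in for the actual annotation path.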

Peng

sjannielefevre commented 4 years ago

Dear Peng

Thank you very much for clarifying. I am working with a non-model organism, so that is perhaps why these errors occur.

In any case: I am using ribotish quality, and I managed to get the job to finish, though not successfully. I am running the following script on an HPC cluster. Below the script I have inserted the resulting errors, followed by the head of the .gtf file. I hope you can help me figure out what is wrong.

Cheers Sjannie

    #!/bin/bash
    #SBATCH --account=nn9244k
    #SBATCH --time=06:00:00
    #SBATCH --cpus-per-task=5
    #SBATCH --mem-per-cpu=4G
    #SBATCH --job-name=ribotish
    #SBATCH --array=0-23

    set -o errexit  # Exit the script on any error
    set -o nounset  # Treat any unset variables as an error

    module --quiet purge  # Reset the modules to the system default
    module load Python/3.7.4-GCCcore-8.3.0

    NAMES=($(cat /cluster/work/users/sjannies/rfp/rfp_sample_names.list))

    echo running ribotish on sample ${NAMES[${SLURM_ARRAY_TASK_ID}]}

    ribotish quality --geneformat gtf -p 5 -b /cluster/work/users/sjannies/rfp/star/bams/${NAMES[${SLURM_ARRAY_TASK_ID}]}_rfp_goldfish_Aligned.sortedByCoord.out.bam -g /cluster/home/sjannies/blast_databases/GCF_003368295.1_ASM336829v1_genomic.gtf

    echo finished ribotish

The following errors came up:

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/cluster/software/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
        result = (True, func(*args, **kwds))
      File "/cluster/software/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 942, in _lendis_trans
        for r in bam.transReadsIter(bamfile, t, compatible=False, maxNH=maxNH, minMapQ=minMapQ, secondary=secondary, paired=paired, flank=flank):
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 496, in transReadsIter
        for read in rds: #yield read
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 45, in fetch_reads
        rds = self.fetch(reference=chr, start=start, end=stop) #, multiple_iterators=multiple_iterators)
      File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
      File "pysam/libchtslib.pyx", line 692, in pysam.libchtslib.HTSFile.parse_region
    ValueError: start out of range (-34)
    """

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/cluster/home/sjannies/.local/bin/ribotish", line 56, in <module>
        main()
      File "/cluster/home/sjannies/.local/bin/ribotish", line 34, in main
        commands[cmd].run(args)
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/run/quality.py", line 85, in run
        cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.geneformat)
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 980, in lendis
        for result in len_iter:
      File "/cluster/software/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/multiprocessing/pool.py", line 354, in <genexpr>
        return (item for chunk in result for item in chunk)
      File "/cluster/software/Python/3.7.4-GCCcore-8.3.0/lib/python3.7/multiprocessing/pool.py", line 748, in next
        raise value
    ValueError: start out of range (-34)

Head of .gtf file:

    #gtf-version 2.2
    #!genome-build ASM336829v1
    #!genome-build-accession NCBI_Assembly:GCF_003368295.1
    #!annotation-source NCBI Carassius auratus Annotation Release 100
    NC_039243.1 Gnomon gene 5705 16574 . - . gene_id "LOC113109012"; db_xref "GeneID:113109012"; gbkey "Gene"; gene "LOC113109012"; gene_biotype "protein_coding";
    NC_039243.1 Gnomon exon 16363 16574 . - . gene_id "LOC113109012"; transcript_id "XM_026272525.1"; db_xref "GeneID:113109012"; gbkey "mRNA"; gene "LOC113109012"; model_evidence "Supporting evidence includes similarity to: 8 Proteins, and 100% coverage of the annotated genomic feature by RNAseq alignments, including 74 samples with support for all annotated introns"; product "OX-2 membrane glycoprotein-like, transcript variant X1"; exon_number "1";

zhpn1024 commented 4 years ago

The error occurs because the tool tries to fetch the 5' UTR (upstream) region of a transcript whose CDS start is too close to the chromosome start, so it ends up requesting a negative position. I'll fix it soon. Alternatively, you can find the gene and remove it for the quality step.
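The failure mode can be illustrated without pysam: subtracting a flank from a start coordinate close to zero gives a negative start, which is what the region parser rejects. Below is a minimal sketch of the usual guard, clamping the region to the sequence bounds. The function name `flanked_region` and the numbers are illustrative assumptions chosen to reproduce the -34; this is not the actual change made in zbio/bam.py.

```python
def flanked_region(start, stop, flank, chrom_length=None):
    """Extend the 0-based region [start, stop) by `flank` bases on both sides,
    clamped so it never leaves the sequence."""
    new_start = max(0, start - flank)
    new_stop = stop + flank
    if chrom_length is not None:
        new_stop = min(chrom_length, new_stop)
    return new_start, new_stop


# Without clamping, 16 - 50 would give a start of -34.
print(flanked_region(16, 500, 50))  # (0, 550)
```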

sjannielefevre commented 4 years ago

Ok! Thank you very much for explaining. I'll run it again with -v to get more info, and do as you suggest and/or explore other tools meanwhile. Thanks again for answering so quickly!

Cheers Sjannie

zhpn1024 commented 4 years ago

I have committed an update in zbio/bam.py. Replace the file and try again.

sjannielefevre commented 4 years ago

Hi Peng

Unfortunately, I still get the same error. I tried uninstalling and installing again after downloading the package. I do not get any information about the offending transcript, even with the -v option, so I cannot easily remove it.
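Since the traceback does not name the transcript, one way to narrow down candidates is to list transcripts whose first exon starts very close to the beginning of its sequence. This is only a sketch under assumptions: the function name `transcripts_near_start` is hypothetical, the threshold of 100 bases is a guess (the flank ribotish uses is not stated in this thread), and the GTF is assumed to be tab-separated with a `transcript_id` attribute on exon lines.

```python
import re

# Matches the transcript_id attribute in the ninth GTF column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcripts_near_start(gtf_lines, max_start=100):
    """Return {transcript_id: (seqname, smallest exon start)} for transcripts
    that have an exon starting at or below `max_start` (1-based GTF coordinate)."""
    first = {}
    for line in gtf_lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if not match:
            continue
        tid, start = match.group(1), int(fields[3])
        if tid not in first or start < first[tid][1]:
            first[tid] = (fields[0], start)
    return {tid: hit for tid, hit in first.items() if hit[1] <= max_start}
```

The returned transcripts are only candidates; short scaffolds such as the NW_ entries in a draft assembly are the most likely places to find them.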

Cheers Sjannie

zhpn1024 commented 4 years ago

I did not mean that you should install it again. Just download the latest src/zbio/bam.py file and replace the old file on your computer ("/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/bam.py", as in your error message).

sjannielefevre commented 4 years ago

For me, it was easier to just download the repository; this should have included the file, right? There is no option to download only the 'bam.py' file, from what I can see. That is why I first downloaded the repository again, but it still did not work. So I uninstalled and reinstalled, and I still get the same error...

zhpn1024 commented 4 years ago

Where did you download the repository from? Try git clone: git clone https://github.com/zhpn1024/ribotish

sjannielefevre commented 4 years ago

I downloaded it from this site, from the same tab where the repository is cloned; I just pressed "Download ZIP" instead. I am not sure how to use git on a remote computer cluster. In any case, after removing everything and downloading it again, the file must have been replaced.

Still the errors:

    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "/cluster/software/Python/3.7.2-GCCcore-8.2.0/lib/python3.7/multiprocessing/pool.py", line 121, in worker
        result = (True, func(*args, **kwds))
      File "/cluster/software/Python/3.7.2-GCCcore-8.2.0/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
        return list(map(*args))
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 942, in _lendis_trans
        for r in bam.transReadsIter(bamfile, t, compatible=False, maxNH=maxNH, minMapQ=minMapQ, secondary=secondary, paired=paired, flank=flank):
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 496, in transReadsIter
        for read in rds: #yield read
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/bam.py", line 45, in fetch_reads
        rds = self.fetch(reference=chr, start=start, end=stop) #, multiple_iterators=multiple_iterators)
      File "pysam/libcalignmentfile.pyx", line 1081, in pysam.libcalignmentfile.AlignmentFile.fetch
      File "pysam/libchtslib.pyx", line 692, in pysam.libchtslib.HTSFile.parse_region
    ValueError: start out of range (-34)
    """

The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "/cluster/home/sjannies/.local/bin/ribotish", line 56, in <module>
        main()
      File "/cluster/home/sjannies/.local/bin/ribotish", line 34, in main
        commands[cmd].run(args)
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/run/quality.py", line 85, in run
        cdsBins = args.bins, numProc = args.numProc, verbose = args.verbose, geneformat = args.geneformat)
      File "/cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/ribo.py", line 980, in lendis
        for result in len_iter:
      File "/cluster/software/Python/3.7.2-GCCcore-8.2.0/lib/python3.7/multiprocessing/pool.py", line 354, in <genexpr>
        return (item for chunk in result for item in chunk)
      File "/cluster/software/Python/3.7.2-GCCcore-8.2.0/lib/python3.7/multiprocessing/pool.py", line 748, in next
        raise value
    ValueError: start out of range (-34)
    NW_020523287.1
    NW_020523288.1
    /var/spool/slurmd/job353985/slurm_script: line 22: 33860 Segmentation fault (core dumped) ribotish quality -v --geneformat gtf -p 5 -b /cluster/work/users/sjannies/rfp/star/bams/${NAMES[${SLURM_ARRAY_TASK_ID}]}_rfp_goldfish_Aligned.sortedByCoord.out.bam -g /cluster/home/sjannies/blast_databases/GCF_003368295.1_ASM336829v1_genomic.gtf

I am now trying to run the analysis after having removed the two entries listed above completely from the gtf file, as I presume they are written there because they are causing the error.

sjannielefevre commented 4 years ago

Hmm, that did not work either, so the listing of those two entries probably had nothing to do with it. I give up.

zhpn1024 commented 4 years ago

You may have downloaded the correct file, but you did not replace the installed file. In your latest error report, the bam.py file is still the original version.

sjannielefevre commented 4 years ago

... I removed EVERYTHING from my .local python package folder. I then downloaded everything from this site. I then ran the pip install to install the package. How can it still be the old version of the file that is being used? I also previously tried downloading the new bam.py (or the code, as there is no option to download a single file) and that did not work either... I'll let the cluster managers know what you are saying. I cannot see what else I can do to replace this file, so maybe something has been installed somewhere else that I am not getting at when deleting and uninstalling...

zhpn1024 commented 4 years ago

That's the problem. You actually installed from pip, and the pip version has not been updated yet. The new bam.py file is only in the zip you downloaded.
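A quick way to confirm which copy of a module Python will actually load, and whether it matches a downloaded file, is to ask the import system for the module's path and compare file hashes. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the module name `ribotish.zbio.bam` is inferred from the site-packages path in the traceback rather than confirmed.

```python
import hashlib
import importlib.util


def module_path(name):
    """Return the file a module would be loaded from, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package of a dotted name is missing.
        return None
    return spec.origin if spec else None


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def same_file_content(path_a, path_b):
    """True if both files have identical contents."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, `module_path("ribotish.zbio.bam")` should print the installed location, and `same_file_content(installed, downloaded)` tells whether the installed file is already the new version.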

sjannielefevre commented 4 years ago

Of course. Thanks for clearing that up :)

zhpn1024 commented 4 years ago

So just replace the bam.py file:

    cp unzippedfolder/src/zbio/bam.py /cluster/home/sjannies/.local/lib/python3.7/site-packages/ribotish/zbio/bam.py
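When overwriting a file inside site-packages by hand, it can help to keep a copy of the old one so the change is easy to undo. The following is a small sketch of the same copy step with a backup; `replace_with_backup` is a hypothetical helper, not part of ribotish.

```python
import shutil
from pathlib import Path


def replace_with_backup(new_file, installed_file):
    """Copy new_file over installed_file, keeping the previous version as <name>.bak.

    Returns the backup path (which exists only if installed_file existed before)."""
    new_file, installed_file = Path(new_file), Path(installed_file)
    backup = installed_file.with_name(installed_file.name + ".bak")
    if installed_file.exists():
        shutil.copy2(installed_file, backup)
    shutil.copy2(new_file, installed_file)
    return backup
```

Here the first argument would be the bam.py from the unzipped download and the second the installed path shown in the error message.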

zhpn1024 commented 4 years ago

Update to v0.2.5 and try.