Closed DarrenObbard closed 3 years ago
Tried it again with a slightly different (larger) test file. Again it seems to hang, but at a different point. I say 'hang' because although it hasn't thrown an error, there appears to be nothing happening (no tasks running, no memory being allocated).
File with .fasta extension detected, attempting to keep contigs over 1000 nt and find circular sequences with apc.pl
WebsterMelRebuild1021.fasta has DTRs/circularity
WebsterMelRebuild1022.fasta has DTRs/circularity
WebsterMelRebuild1066.fasta has DTRs/circularity
WebsterMelRebuild1502.fasta has DTRs/circularity
WebsterMelRebuild1591.fasta has DTRs/circularity
WebsterMelRebuild1757.fasta has DTRs/circularity
WebsterMelRebuild1758.fasta has DTRs/circularity
WebsterMelRebuild1964.fasta has DTRs/circularity
WebsterMelRebuild2062.fasta has DTRs/circularity
WebsterMelRebuild2440.fasta has DTRs/circularity
WebsterMelRebuild2522.fasta has DTRs/circularity
WebsterMelRebuild2523.fasta has DTRs/circularity
WebsterMelRebuild2524.fasta has DTRs/circularity
WebsterMelRebuild2525.fasta has DTRs/circularity
WebsterMelRebuild2526.fasta has DTRs/circularity
WebsterMelRebuild2742.fasta has DTRs/circularity
WebsterMelRebuild3594.fasta has DTRs/circularity
WebsterMelRebuild3595.fasta has DTRs/circularity
WebsterMelRebuild3596.fasta has DTRs/circularity
WebsterMelRebuild3671.fasta has DTRs/circularity
WebsterMelRebuild378.fasta has DTRs/circularity
WebsterMelRebuild4581.fasta has DTRs/circularity
WebsterMelRebuild4643.fasta has DTRs/circularity
WebsterMelRebuild4835.fasta has DTRs/circularity
WebsterMelRebuild4861.fasta has DTRs/circularity
WebsterMelRebuild649.fasta has DTRs/circularity
WebsterMelRebuild651.fasta has DTRs/circularity
WebsterMelRebuild885.fasta has DTRs/circularity
no reads provided or reads not found
Circular fasta file(s) detected
Putting non-circular contigs in a separate directory
time update: running IRF for ITRs in non-circular contigs 03-11-21---09:25:34
time update: running prodigal on linear contigs 03-11-21---09:25:42
time update: running linear contigs with hmmscan against virus hallmark gene database: standard 03-11-21---09:27:10
time update: Calling ORFs for circular/DTR sequences with prodigal 03-11-21---09:27:55
time update: running hmmscan on circular/DTR contigs 03-11-21---09:27:56
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits 03-11-21---09:27:57
Combining tbl files from all search results AND fix overlapping ORF module
No ITR contigs with minimum hallmark genes found.
Annotating linear contigs
time update: running BLASTX, annotate linear contigs 03-11-21---09:27:57
time update: running Prodigal, annotate linear contigs 03-11-21---09:31:18
time update: running hmmscan1, annotating linear contigs 03-11-21---09:31:20
time update: running hmmscan2, annotating linear contigs 03-11-21---09:31:22
Been sitting at this point for 2 hours, with no tasks being executed (as far as I can guess, from htop)
Hi Darren,
Thanks for reaching out, and I'm sorry that it's hanging on you. I'm working to figure out what's happening. Just to be sure that it's not an issue with your input fasta files (weird headers?), can you run the test contigs that are provided with the repo (e.g. testcontigs_DNA_ct2.fasta)? In the meantime I'll try to replicate this error.
Mike
Hi! Thanks for getting back to me so fast!
I'm hoping that cenote-taker2 will revolutionize my workflow (or perhaps just replace a post-doc)
My input is Trinity output from a few years ago ... [my understanding is that fasta makes no stipulation except that names start with a ">" followed by any characters at all, then a newline before sequence, and sequence continues until the next '>' ]
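For anyone following along, the reading of FASTA described above can be written down as a minimal Python sketch. This is only an illustration of that informal definition (everything after ">" on a header line is the name, and sequence runs until the next ">"); it is not how Cenote-Taker 2 parses its input, and the function name `read_fasta` is made up for this example.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header is everything after '>'."""
    header = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are joined until the next '>' line.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">TR29739|c0_g2_i1 len=5765\nCAGAG\nCTAGA\n>second\nATG\n")
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

Under this reading, characters such as '|', '=' and '[' are legal in a header, so any trouble with them would come from downstream tools rather than from the format itself.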
The test file turns up a new error, suggesting a library problem. I'm using the supplied conda environment on a pretty clean new Linux install (Scientific Linux, a Red Hat derivative).
I recently ran into this in another context (https://github.com/merenlab/anvio/issues/1479) when I was trying to set up a conda environment for the newest Trinity and Samtools, and it took an age to resolve, possibly because of a version conflict?
time update: running IRF for ITRs in non-circular contigs 03-11-21---14:07:15
time update: running prodigal on linear contigs 03-11-21---14:07:15
time update: running linear contigs with hmmscan against virus hallmark gene database: standard 03-11-21---14:07:17
time update: Calling ORFs for circular/DTR sequences with prodigal 03-11-21---14:07:20
time update: running hmmscan on circular/DTR contigs 03-11-21---14:07:20
Annotating DTR contigs
Traceback (most recent call last):
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/bin/circlator", line 57, in <module>
exec('import circlator.tasks.' + task)
File "<string>", line 1, in <module>
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/__init__.py", line 26, in <module>
from circlator import *
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/bamfilter.py", line 2, in <module>
import pysam
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/pysam/__init__.py", line 5, in <module>
from pysam.libchtslib import *
ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
Traceback (most recent call last):
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/bin/circlator", line 57, in <module>
exec('import circlator.tasks.' + task)
File "<string>", line 1, in <module>
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/__init__.py", line 26, in <module>
from circlator import *
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/circlator/bamfilter.py", line 2, in <module>
import pysam
File "/data/home/dobbard/miniconda3/envs/cenote-taker2_env/lib/python3.6/site-packages/pysam/__init__.py", line 5, in <module>
from pysam.libchtslib import *
ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits 03-11-21---14:07:24
Combining tbl files from all search results AND fix overlapping ORF module
No ITR contigs with minimum hallmark genes found.
Annotating linear contigs
time update: running BLASTX, annotate linear contigs 03-11-21---14:07:24
time update: running PHANOTATE, annotate linear contigs 03-11-21---14:07:52
time update: running Prodigal, annotate linear contigs 03-11-21---14:07:56
time update: running hmmscan1, annotating linear contigs 03-11-21---14:07:57
time update: running hmmscan2, annotating linear contigs 03-11-21---14:07:57
time update: running BLASTN, linear contigs 03-11-21---14:08:00
Internal3.blastn.out not found
Internal4.fna is closely related to a virus that has already been deposited in GenBank nt.
time update: running RPSBLAST, annotating linear contigs 03-11-21---14:11:47
/data/home/dobbard/scratch/test_cenote/Internal/no_end_contigs_with_viral_domain/COMBINED_RESULTS.rotate.AA.rpsblast.out
time update: running tRNAscan-SE 03-11-21---14:12:04
Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits 03-11-21---14:12:05
/data/home/dobbard/scratch/test_cenote/Internal/no_end_contigs_with_viral_domain/Internal.rotate.out_all.hhr
Combining tbl files from all search results AND fix overlapping ORF module, linear contigs
finalizing taxonomy for linear contigs
time update: finished annotating linear contigs 03-11-21---14:12:27
time update: running tbl2asn 03-11-21---14:12:28
[tbl2asn] This copy of tbl2asn is more than a year old. Please download the current version.
[tbl2asn] Flatfile Internal3
[tbl2asn] Validating Internal3
[tbl2asn] Flatfile Internal4
[tbl2asn] Validating Internal4
Making gtf tables from final feature tables
removing ancillary files
time update: Finishing 03-11-21---14:12:28
Virus prediction summary:
4 virus contigs were detected/predicted. 2 contigs had DTRs/circularity. 0 contigs had ITRs. 2 were linear/had no end features
grep: DTR_contigs_with_viral_domain/DTR_seqs_for_phanotate.txt: No such file or directory
grep: DTR_contigs_with_viral_domain/DTR_seqs_for_phanotate.txt: No such file or directory
output directory: Internal
>>>>>>CENOTE-TAKER 2 HAS FINISHED TAKING CENOTES<<<<<<
Hmmm. OK, based on the Anvio issue you referenced, maybe something is bugging out with circlator, and you can try to reinstall it like this?
conda install -c bioconda circlator=1.5.5 --force-reinstall
I sometimes regret having so many packages installed with Cenote-Taker 2 because if one of them breaks, the whole thing breaks. But I also didn't want to reinvent the wheel...
On the other hand, it seems like the error regarding "line 547: s/#/ /g" is no longer occurring with the provided test contigs, making me believe that Cenote-Taker 2 was mishandling the fasta header from your original runs. Could you do me a big favor and send some of the fasta headers from these files:
grep ">" LongWebster.fasta | head
grep ">" LongWebster.over_1000nt.fasta | head
Without fixing the libcrypto.so.1.0.0 problem, I have cleaned up my sequence titles (no funny characters at all!) and it hangs in the same place as before.
It seems to die during
time update: running hmmscan2, annotating linear contigs 03-11-21---14:30:43
And this seems to be the last sequence it was looking at when it stops:
>CleanWebster15 TR29739c0_g2_i1_len5765
CAGAGCTAGATTTTATTGCGGTACAATATTATTATCACGAATGTTTAAACAAGATTTACAATTTGAAGAAAATGGAATTAAACCCTATGTGTGTGTTAGAAATGGAGGAAATCGAGCATGTTGCCGATTTTACCCCTTTTCATATAGAGAATACGCCACATTTGAGTATAAATTCCCTTTTGACCCGGAGGTTGAAGAGGAAAATGAAGAAGTGGTATCAACGAACTATTTTGCCATGTTGGCAGAATTTGTCTTGGGTATCAGTTATTTTTGTGTGCTTTATCAGCTACTTACTATATCGTCGTGGAAGAAGATTT
ATAGTGGGTATGCCCAGTGCGCAAGCAAAGAGAGATGTTACAAACTCGTGGTCGCATCTGAGAAAGGAGTTTTAGAAGTAGAAGTGAAGGGATATCATAAGCGTACTATTAAGCAATTTGCTAATTATATGGCTGTGGTAATTTTGAAGGAATATTTGACTAAAGAACAGGTACAGCAAATGTTATTTTATTATTCTAATATATTTGCATATGATGATGATATTTGTGAAGTGCAGGCAGAAAATTCTCACCCGAAAGAATCGGTTCAGGGTGAGGAAGTTTTGACAGGTACAAAACATAGTAATACTATTTTAACT
AATAGTACAGGAGATACAGAGAGTATACCTCTAGCAATTAGAGATGATACTTTGAATTACGCCTCGAGCGAAGCCTTACATCAATTTGATAGTTTAACTGATAGATGGATGCCGTTAGAAACAATAACAGTTACTACATCACAGATTTCTGGTACACTATTAAAGGAATGGTATTTACCATATGATTTGTTGCAATCTCATATTATAAATCCGAGTTTAGCTCCATTTATGCTATTTCGCTACGGTGCTTTATCAATAGAGATGAAATTTGTAGTGAACGCTCACAAATTTCAATCCGGTAAAGCCTTAGCGAGCAT
TAAGTATGATCCAGTCGGTTTAACAGATTTTGGTGATTCATTACCTACATGTTTGCAACGAGAGCACGTGATGTTAGACTTATCTACTAATAATCAAGGAACATTGCAAATTCCTTTTATTTACCATCGTTCGTTCTTGCATTTAAATTTGCAGCAAGGTACAGATCAAACCATGGTACCATCCACATATGCTAGAGTACAGTTACACATCCTGGCCAATTTATTAACAGGAACTAATCAAGCAGTTAGCATGAACATCCGTCCTTATTATCGCTTCTCGAAAGCTTCATTTGCTGGAATGGAAGCAGTTCATACTG
TCCAGATGGATGTGGATGCAGTTGTAAAGGGATTAATACCAACAAAATCATTGAAAGCGGTGTTAGTTGGCGCAGAGGCTCTTATAGATCAATTAGGGAAGACTTGCAACCAGGACAAGCCTACAATTACTTCTTCCACTCAAATTGTTCCGAAACCCCGCAGTCAGTTTGCATCAGGAAAGGGGATTTTCGGAGGAACAGTTCTGAGATTAAATCCGCAGGTAATCACGTCTGCAGTTGAAGTGAAACAATCATCACGTACCCCTAGAACTGTACTGGATATAGCTAGAGTATGGGGATTGAAGAAAATTATGACG
TGGACTACGAATGCTAAACCAGATGAGCACCTTGATGATATTGTGGTTGATTTGCACCATAATTTTAAAGGGGGTAATGATCGTATTGAAGCAAATATATTGACTCCAGTTGAATATATAGCGTCTTTATATGGATTTTGGTCAGGGACATTAGAATGTAGGTTGGACTTTATATCCAATCAATTTCACACTGGTGCTATTATGATCAGTATACAAGTATCAAATCAAGAGACAAAATTTCAAAAGGCGGCTTGTGTATATACTAAAATTTTCCATTTGGGGGGTCAGAAAAGCGTCACATTCACCATTCCTTATAT
ATACGATACTATATGGCGTCGTAACACAGCTCAAATATTTACACCTTACACGTTTGAGCAAGATAATAAACTCCCTGTAGATCATATATTTACACTCGGTACGAATGATTTTATGAGAATCCAATTTTATGTTGTTAATGAATTACGAGCTCCAGATACAGTAGCGAATGTAGTTCAAATATTAGCTTATGTACGTGCGGGGACTAGTTTTATGTTACATTCTTTAAAACCGTCGCATTTGGAAGTTATACAGGACATAGCTCTTTTTAGAGACATACCTATGTTTAATGTACCTCATTTGGCACCTAAATCTTATA
TAACTAAGTCTGAGGAAAAACACATCAAGTTAACGAAAGAACTAACACTGGAGTATAAAGAAATCAAGTTTCAGATGGAAGGCTCCTTAGCTGAGAATCCAGATGAAACTCCTGATTTTAGTGCGGGTTTGAATGCTTTGCATATACAAACTTTAGATTCTCAAGTTAATATAAAGGATATTTTAAGGCGTCCTATACAGTTAACAAAAGCTATATCTTTTAGTAATACTGAAATAAAGAATCATGTATCTCTTTTTATCCCTTTAATGGTCCCATCTCATAATATGGTATATTCGGATAGTTATGAAACCATATAT
GCGGATGGAGTTTCCCTTACACCAACCGCTATGCTAATGAATTTATTTCGTTTTTGGCGAGGTAGTATGCGTTTTACCTTTGTTGTAAACGATAATGTATCCAAGAATTGTACACATTGGATAACTCACATGCCCCATTCGGGAGTTCGGAAAATTGGAAAGATTGAATTTCCAAAAGGTCCGAGTTTAGTTGGATCATCATTTGCTAGTGTCCCACTAGTCGCCAACATCAACGCGACGGAATGTGTCGAGGTACCCTATGATACGGAATTAAACTGGACGCTGTGTCATTCAGCTCGAAATAACCAAATCTTATC
AGTAAGAGATCAAACAGATACTAATGCAGGACATATAGTATTTACACCATCTGGTACATGTGATGTTACAGTGTGGTGGGAAGCTGGGGACGATTTTGAATATGAGAATTTCTTAGGAGTTCCGGCTACCATCACACGGGATCGTTTGCACGGTGTATACGAAACGGAAATTAAATTCCAAGCAGAAACATCAATGTATTCCAAAACCCTTGCGAAAGTGAATACTATAATAAATTTGCCAGAGCAGATAGCAGATACATTAACGAATGCTAATAATGTTGGTGACGCTATTATAGCGAGTTCTACGAAAGCAGAAA
AATTATTAGTCAAAGGGTTAGAAGTGTGCGAGAATGCATCAGCTATGTTAGATAATATTTCTCCTTTGATGGAATCTTTAGAGGAAAAAATTCGGGAATCCTTAAAATCATTTCCTGGAAGTATTTATAATTCTACAATGTTTATTCAAAATGGGGTTGAAATTATAATGGATTTAGTTGTCGCTTGGTTATCTGAATCGTGGGCCGTACTTGGTAATATTTTCGTCAAAGCTATAGCACGGTTGCTGGGATTTAGTGCCATACAAACTATTTTGAAGTACGGTTCCCAAATAGCCGCTGCTATTCGTAATCTGGTG
AACCCACAAATAGTAGTTCAGGCTCCATCGCAAAATGTCACATTATTGGGAGTATTATGTGGTTTAGTAGGTACAGTAGTGGGTGTATCTCTGGAAACCCAAAATTATTCTAAGTTTATTTATAAATTGTCTGAAAGATTTGTGACAACTGGGGGTATAGCTTATCTTAATCAAGTCTTACGGTTTGTGCAGAGTACCTTTGAAGTTATTCGTGACTTGGTGATGGATGCCCTTGGTTACGCTGATCCTAATGTAAAGGCTTTACAGATGCTCAGTAAAGATACAGGTGTAATTAGCACATTTGTAAAGGAGGCTAA
TGTCATATTAAGTGAAGCGAACGCCTCATTATTGTCAGATCCCGGTTTTCGTAAACGTTTTTGGTACACTGTGTCTCAGGCATACCAAATTCAATCAATTCTAGCCGTGAGTCCTGCGAATGTAGTTTCACCCATTGTGACTCGTTTATGTACCGATGTCATAAAAGCATCGAGTGAAAAGTTCATGGACTTATCGTGTAGTCCTTGTCGCTACGAACCATTTGTGATTTGTATAGAGGGTGAACCTGGTATAGGAAAATCTTTTATGACAGAGACCATGGTTTCCGAATTGCTTGGATCAATTGGTTTCGATCGTC
CATCCAGTGGCTTAATTTACACTCGGCCTCCTGGAGCACGATTCTGGTCAGGATATAAAAATCAGCCTGTAGTTGTTTATGATGATTGGATGAATTTGAACGATTCAGACCAAATACTGAGTCAGTTAAGTGAATTGTACCAGATGAAATCAACTAGTGATTTCATTCCAGAAATGGCTCACTTAGAAGAAAAGAAAATCAAAGCGAACCCTTTAATTGTCGTGCTATTGTGTAATGGTGCATTCCCCTCGTGTATAGGTCAAAAAGCGATTTATCCTGATGCTATTTTCAGACGTCGAGACTTAGTTTTGCGAGCC
TCTCTGAAGGAAGAATGGGTAGGAAAAGATTTACGCGACCTAACTGATAGTGAATCAGCTGAGTGTGGACATCTATTGTTTCAACGATATACTAGTGCGAAAATTGAGAATAGTTTAACCACAGCTCAAAAGACCTGGTCTGAAGTAAAACCTTGGTTGTGTGCCACATATAAACGCTACCACCAACAAGAAACACTTTTAGTACGTAAAAGAATTAAAAAGTTTCAAACTCAGATGCGTTTAAATAGTGAGAATTATCTAGACTATTCAGATCCTTTTTCTCTATTCTACACTAGCACCATTGATGTTATGGAAGA
CTCTGAGTGTAATCCTAATGGGTGGTTACCTAGTGAACAATTGGAGGCAGCTGTGTTGAGAGTTGTTGATATAATAAAGGAGAAGAAGGACGAAGTATTGGAATTTCATATAGATTCTAAACCTGAAAACGTCTTTCAGGGCTTTCCGGTGGGATGGGAAGATCTATCAATGAGCTTAACTAGTGGTATACTTTTTAGTGGAGGTGTTATGGCGCAAGTTTTAGACTGGACCGCTCAGGGTATAGGAGCTTTCATGAAACCACTATTAGAAAGTACGGGTCAGAGTATAGAACACGAGTGTATGACATGTCTTGAGC
AAATGCCCTGTTACTACGTATGTGGAGGTGTGCGTTCCCACTCTAACCCCAAAGCTCATCATTACATGTGCATGGATTGTATGATTCGCATGAAGCGAGCTAATATGGGTTCTCACTGTCCCATGTGTCGTGTAGAGCCTATGCTAGCTTGTTTACCTAAACATCTAACTCGCTTGTATATAGTGTTACGTTGGGCGTTGGTTAATGTTAGTGATAGATTAGTATGGATTTTTGCATTCTTTAGGGATTTTCTCCGTTCAAGGTCTATGGTAAATTCACGCTTATTATTATCTACCCTGGCATCATTAACTGCATTC
TTACAGGGCGATGGTATTACAACTACCATTGCTGCTTCATATGTAGGGGCAAGTGTGGTAGATGCTATATATGATCCAGAATTATTTACTAATGTAGCACAATCCTGGATATTTAACCCCTTGGATATGTTAGTTCCTTCAGAAGAATATTACACGCCTCCTTCGGAAATAATAAACGCTAGCGTGCAATGCATGCAGTTTGAAAGTCTTGGGCAGAGAGAGGTTGGTTGTAGCAACCTTGAGCCGGAGAAAGATTCATGGGATGTACTTACTCCTAAAGAAGAGGCTATACTTCGTTGTGAACGCAATAAGAACAA
AATGGATACTGCCTTAGTTATAAACAAAGCAGAACTCGAAAATATTCGAAAGAAGCGGG
after successfully writing a blank file called "CleanWebster15.all_called_hmmscans.txt"
I'm trying one on this sequence alone ....
The old-style Trinity headers had a nasty '|' , but also '=' and '[' and ']' and ' '
TR29739|c0_g2_i1 len=5765 path=[11551:0-1439 11555:1440-3480 11548:3481-5764] [-1, 11551, 11555, 11548, -2]
but I've cleaned this to
TR29739c0_g2_i1_len5765
Run on its own, the sequence above is OK, so maybe that wasn't the cause ...
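In case it is useful to others with old-style Trinity output, here is a rough Python sketch of the kind of header clean-up described above. It assumes the rule "keep the first two whitespace-separated fields, drop every character that is not a letter, digit or underscore, and join with an underscore"; the function name `clean_header` and that rule are my own illustration, not part of Cenote-Taker 2.

```python
import re


def clean_header(header: str, keep_fields: int = 2) -> str:
    """Reduce a FASTA header to letters, digits and underscores only."""
    # Keep only the leading fields (contig ID and length for Trinity headers).
    fields = header.lstrip(">").split()[:keep_fields]
    # Strip characters such as '|', '=', '[' and ']' from each field.
    cleaned = [re.sub(r"[^A-Za-z0-9_]", "", field) for field in fields]
    return "_".join(field for field in cleaned if field)


old = ("TR29739|c0_g2_i1 len=5765 path=[11551:0-1439 11555:1440-3480 "
       "11548:3481-5764] [-1, 11551, 11555, 11548, -2]")
print(clean_header(old))
```

For the header quoted above this prints TR29739c0_g2_i1_len5765. Note that dropping fields can make two headers collide, so it is worth checking that the cleaned names are still unique.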
looks promising:
Solving environment: done
## Package Plan ##
environment location: /data/home/dobbard/miniconda3/envs/cenote-taker2_env
added / updated specs:
- circlator=1.5.5
The following packages will be downloaded:
package | build
---------------------------|-----------------
certifi-2020.12.5 | py36h5fab9bb_1 143 KB conda-forge
------------------------------------------------------------
Total: 143 KB
The following packages will be UPDATED:
certifi pkgs/main::certifi-2020.12.5-py36h06a~ --> conda-forge::certifi-2020.12.5-py36h5fab9bb_1
The following packages will be SUPERSEDED by a higher-priority channel:
ca-certificates pkgs/main::ca-certificates-2021.1.19-~ --> conda-forge::ca-certificates-2020.12.5-ha878542_0
but no,
ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory
OK, I believe I've figured out at least one issue. Thank you for bearing with me here.
The circlator issue may actually be a pysam issue per: this issue
Can you check your pysam version (it should be 0.15.3) and update if necessary?
$ conda list | grep "pysam"
pysam 0.15.3 py36hda2845c_1 bioconda
conda install -c conda-forge -c bioconda pysam==0.15.3
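A quick way to confirm whether the import problem is gone, without running the whole pipeline, is to try the imports directly inside the activated environment. The snippet below is a small sketch along those lines; the helper name `check_import` is hypothetical, and it only reports whether each module imports, not whether the tools behave correctly afterwards.

```python
import importlib


def check_import(module_name: str):
    """Return (True, version) if the module imports, else (False, error text)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        # A missing shared library such as libcrypto surfaces here as ImportError.
        return False, str(err)
    return True, getattr(module, "__version__", "unknown")


for name in ("pysam", "circlator"):
    ok, detail = check_import(name)
    status = "OK, version " + detail if ok else "FAILED: " + detail
    print(name + ": " + status)
```

If pysam still fails with the libcrypto.so.1.0.0 message here, the problem sits in the environment rather than in Cenote-Taker 2 itself.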
The other issue may have to do with a problem on my end that I've possibly fixed. The Trinity headers were not the issue. You've got RNA virus contig(s) where the whole contig is covered by an ORF that may not have a start and stop codon. I had incorrectly coded prodigal to use -c (for closed genomes) at this step, which requires start/stop codons. The program is expecting at least 1 ORF, and it's not there due to this setting. I should have tested these types of contigs before releasing the update! If you do cd Cenote-Taker2 then git pull, I think this should fix it. If you forgo the blastn step when you test this, you should get quicker results.
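To make the start/stop point concrete, here is a toy Python sketch. It is emphatically not Prodigal's algorithm: it only scans the forward strand, treats ATG as the sole start codon, and uses the three standard stop codons. The function names and the `min_codons` threshold are assumptions for illustration. It shows how a contig that is one long coding fragment can yield zero genes when both ends are required, even though it contains a long stop-free stretch.

```python
STOPS = {"TAA", "TAG", "TGA"}


def closed_orfs(seq: str, min_codons: int = 30):
    """Forward-strand ORFs that have both an ATG and an in-frame stop codon."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found


def longest_open_run(seq: str) -> int:
    """Longest stop-free stretch (in codons) in any forward frame, edges allowed."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        run = 0
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOPS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


fragment = "GCT" * 50                      # coding fragment, no start or stop
complete = "ATG" + "GCT" * 40 + "TAA"      # same content with both ends present
print(closed_orfs(fragment), longest_open_run(fragment))
print(closed_orfs(complete))
```

The fragment gives an empty list from `closed_orfs` while `longest_open_run` reports 50 codons, which is the situation a step expecting at least one ORF would trip over.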
Let me know if this helps.
Hi! Fantastic, thank you.
The pysam was indeed the issue, and the test file now runs happily!
My own trial dataset (with the long ORF that lacks a start or stop, and the nasty headers) now runs to completion!
But there are still some things that worry me ...:
This still happens:
/data/home/dobbard/apps/CenoteTaker2/cenote-taker2.1.1.sh: line 547: s/#/ /g: No such file or directory
And when running blastn, what do lines like this imply?
MediumWebster1462.blastn.out not found
Is it just a virus / phage not in nt?
Then I get some hits that report like this:
cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Protostomia; Ecdysozoa; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Eremoneura; Cyclorrhapha; Schizophora; Acalyptratae; Ephydroi$
ea; Drosophilidae; Drosophilinae; Drosophilini; Drosophila; Sophophora; melanogaster group
; PREDICTED: Drosophila bipectinata twitchin (LOC108134366), transcript variant X2, mRNA
; PREDICTED: Drosophila bipectinata twitchin (LOC108134366), transcript variant X1, mRNA
; PREDICTED: Drosophila ananassae twitchin (LOC6501771), transcript variant X6, mRNA
What's the cause of this?
Then at the end I get a lot of this:
Virus prediction summary:
50 virus contigs were detected/predicted. 0 contigs had DTRs/circularity. 0 contigs had ITRs. 50 were linear/had no end features
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt: No such file or directory
What does this indicate?
Thanks!
Darren
Darren, I again thank you for raising these issues, and I apologize that my testing wasn't as thorough as I thought. Please do git pull again. Everything should be fixed, and I have 2 questions for you.
I fixed the error with this s/#/ /g.
As you thought, "MediumWebster1462.blastn.out not found" implies that it doesn't have a strong BLASTN hit in your database. I changed the message to say "sequence.blastn.out not found, no close BLASTN hits for this sequence".
Regarding the blast reports, you have the phylogeny of the top hit on the first line, then the description of the top 3 hits. The description of the top hit is also in the note in the ".gbf" and ".fsa" files in the sequin_and_genome_maps directory. I don't really know exactly what users want to do with BLASTN info. What are your thoughts? Should it inform taxonomy in the output?
I also fixed the error with grep: no_end_contigs_with_viral_domain/LIN_seqs_for_phanotate.txt.
My other question is, I know your lab has found some interesting segmented RNA viruses. You could of course use Cenote Taker 2 with -am True on a multifasta of segments from the same virus, but it might be confusing to have a separate ".gbf" for each output. I haven't looked into generating combined outputs for segmented viruses. I could possibly add this feature if you have some insight into the formatting, etc.
Hi!
Thank you for the pipeline! I have played around with several virus finders, and I have never previously found one that I thought worked well enough to use. I'm thinking we might start to use this routinely - so you're going to have to keep maintaining it!
Regarding the blast reports, you have the phylogeny of the top hit on the first line, then the description of the top 3 hits. The description of the top hit is also in the note in the ".gbf" and ".fsa" files in the sequin_and_genome_maps directory. I don't really know exactly what users want to do with BLASTN info. What are your thoughts? Should it inform taxonomy in the output?
So, as you might imagine, I have some opinions to share! I think this blastn screen (I'm using nt at the moment) is really useful, but I think you should make more use of it for the taxonomy. It looks like your taxonomy might be based on RefSeq? For viruses, RefSeq is always so out of date as to be of relatively little use for spotting 'known' viruses.
I think that, where the blastn is currently reported, it could be done more cleanly, purely as taxonomic information. So, leave out the gene/segment etc. and just report the top hit with "Sequence identity 98% to
Even better than the HSP identity would be a quick pairwise alignment between the new contig and its top blastn hit, reporting the overall sequence identity over the shared length.
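As a sketch of what that number could look like in practice, the Python snippet below computes percent identity from two already-aligned sequences. It assumes the pairwise alignment itself comes from some external aligner and is supplied as two equal-length gapped strings, and it assumes "shared length" means alignment columns where both sequences have a base (columns with a gap in either sequence are skipped). Both assumptions and the function name are mine, not something Cenote-Taker 2 does.

```python
def identity_over_shared_length(aligned_a: str, aligned_b: str):
    """Return (percent identity, shared length) for two gapped, aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    shared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # column not shared by both sequences
        shared += 1
        if a == b:
            matches += 1
    if shared == 0:
        return 0.0, 0
    return 100.0 * matches / shared, shared


contig = "ATGC-TTGA"
top_hit = "ATGCATCG-"
pid, shared = identity_over_shared_length(contig, top_hit)
print(f"Sequence identity {pid:.1f}% over {shared} shared positions")
```

Skipping gapped columns makes the figure more generous than one that counts indels as mismatches, so whichever definition is used should be stated next to the reported value.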
My other question is, I know your lab has found some interesting segmented RNA viruses. You could of course use Cenote Taker 2 with -am True on a multifasta of segments from the same virus, but it might be confusing to have a separate ".gbf" for each output. I haven't looked into generating combined outputs for segmented viruses. I could possibly add this feature if you have some insight into the formatting, etc.
I think this would be great! I think the GenBank files could literally just be concatenated, as could the gtf files to go with the fsa files. I don't know if it's too ugly, but folders could be created to hold the un-concatenated versions, and then the concatenated file names could match the folders.
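Roughly, the folder-plus-concatenated-file idea could look like the Python sketch below. It relies on the fact that GenBank flat-file records end with a "//" line, so records can be placed back to back in one file. The folder layout, file names and the function name `concatenate_records` are hypothetical, chosen only to illustrate the suggestion.

```python
import tempfile
from pathlib import Path


def concatenate_records(segment_dir, suffix=".gbf"):
    """Join every per-segment file in a folder into one file named after the folder."""
    segment_dir = Path(segment_dir)
    parts = sorted(segment_dir.glob("*" + suffix))
    if not parts:
        raise FileNotFoundError(f"no {suffix} files in {segment_dir}")
    # The combined file sits next to the folder and shares its name.
    combined = segment_dir.parent / (segment_dir.name + suffix)
    with combined.open("w") as out:
        for part in parts:
            text = part.read_text()
            # Make sure each record ends with a newline before the next begins.
            out.write(text if text.endswith("\n") else text + "\n")
    return combined


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "MyVirus"
    folder.mkdir()
    (folder / "segment1.gbf").write_text("LOCUS       segment1\n//\n")
    (folder / "segment2.gbf").write_text("LOCUS       segment2\n//")
    combined_path = concatenate_records(folder)
    combined_name = combined_path.name
    combined_text = combined_path.read_text()

print(combined_name)
```

The same function could be pointed at gtf files by passing a different `suffix`; whether the segment order should be alphabetical, as here, or follow some biological convention is an open choice.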
I have a number of other questions / suggestions. Would you like them here, or by email?
Thanks for the feedback. Let's discuss further by email, and I'll make sure to include any changes that get made into the change log for the next update. michael.tisza@gmail.com
Hi,
I'm playing with Cenote-Taker2 for the first time, and (as far as I can tell) it keeps hanging: i.e. simply stopping execution with no feedback or continued output or execution. There are a couple of errors thrown, but no indication as to what might cause them or what the solution might be.
The command looks like this
python ~/apps/CenoteTaker2/run_cenote-taker2.py -c LongWebster.fasta --known_strains blast_knowns --blastn_db /data/BLAST_databases/nt -r WebsterMelRebuild -m 150 -t 40 -p False /data/home/dobbard/apps/CenoteTaker2
and things start well
######################################################################
###################################################################################
But the failed awk and the failed cat suggest something is going wrong. At this point it appears nothing is running, so I am suspicious that cat is attempting to read from stdin because there was no file?
Also, the missing file requested in line 547 doesn't bode well.
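On the suspicion about cat: when cat is given no file operands it reads standard input, so a pattern that matches no files can leave a pipeline waiting silently. The Python sketch below shows one way to guard against that. It is only an illustration of the general failure mode, not a claim about what the Cenote-Taker 2 script does at that step, and the function name is made up.

```python
import glob
import subprocess
from typing import Sequence


def concatenate_with_cat(paths: Sequence[str]) -> str:
    """Run cat on the given files, refusing to call it with no operands."""
    if not paths:
        # With no operands cat would read stdin and appear to hang.
        return ""
    result = subprocess.run(
        ["cat", *paths],
        stdin=subprocess.DEVNULL,   # never wait on the terminal
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# A pattern that matches nothing returns an empty list, so cat is never started.
no_files = glob.glob("*.no_such_extension_for_this_example")
print(repr(concatenate_with_cat(no_files)))
```

Passing `stdin=subprocess.DEVNULL` is a second line of defence: even if the guard were removed, cat would see end-of-file immediately instead of waiting for input.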