mtisza1 / Cenote-Taker2

Cenote-Taker2: Discover and Annotate Divergent Viral Contigs (Please use Cenote-Taker 3 instead)
MIT License

No taxonomic assignment rank information #17

Closed neptuneyt closed 1 year ago

neptuneyt commented 2 years ago

Dear CT2 team, thanks for such amazing work. My issue is that the taxonomic assignment rank information (e.g. "Viruses; Duplodnaviria; Heunggongvirae; Uroviricota; Caudoviricetes; Caudovirales; Podoviridae; unclassified Podoviridae; crAss-like viruses; environmental samples") is missing from the "*blastx.out" files when I run CT2 on "testcontig_DNA.fasta". The output from my run, which lacks these lines, is shown first below; the normal output follows under "blastx.out (correct)". As a result, there is no correct taxonomic name for contigs 4 and 5 in "DNA_CONTIG_SUMMARY.tsv", where it should be "crAss-like phage".

I checked "err.log" but found nothing. I look forward to your reply. Thanks a lot.

==> no_end_contigs_with_viral_domain/DNA2_vs01.tax_guide.blastx.out <==
DNA2_vs01 gi|289163351|ref|YP_003422530.1| replicase [Porcine circovirus type 1/2a] 99.359 0.0 312

==> no_end_contigs_with_viral_domain/DNA3_vs01.tax_guide.blastx.out <==
DNA3_vs01_8 gi|1464309244|ref|YP_009507580.1| major capsid protein [Heterosigma akashiwo virus 01] 46.154 9.20e-122 442

==> no_end_contigs_with_viral_domain/DNA4_vs01.tax_guide.blastx.out <==
DNA4_vs01_12 gi|674660398|ref|YP_009052554.1| putative Terminase large subunit [uncultured crAssphage] 100.000 0.0 751

==> no_end_contigs_with_viral_domain/DNA5_vs01.tax_guide.blastx.out <==
DNA5_vs01 gi|674660359|ref|YP_009052511.1| putative Protein of unknown function (DUF932) [uncultured crAssphage] 100.000 0.0 343
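
To see at a glance which files are affected, a small check like the following may help. It is only a sketch, not part of Cenote-Taker 2: the function name `find_missing_lineage` is hypothetical, and it assumes that a complete `*.tax_guide.blastx.out` file holds a BLASTX hit line followed by a lineage line beginning with "Viruses;", as in the correct output quoted in this issue.

```python
from pathlib import Path


def find_missing_lineage(directory):
    """Return names of *.tax_guide.blastx.out files that contain a
    BLASTX hit line but no taxonomic lineage line after it."""
    missing = []
    for path in sorted(Path(directory).glob("*.tax_guide.blastx.out")):
        # Keep only non-empty lines, stripped of surrounding whitespace.
        lines = [ln.strip() for ln in path.read_text().splitlines() if ln.strip()]
        if not lines:
            continue  # no hit at all; nothing to compare against
        # Assumption: the lineage line starts with "Viruses;" and comes
        # after the hit line, as in the correct output shown in this issue.
        has_lineage = any(ln.startswith("Viruses;") for ln in lines[1:])
        if not has_lineage:
            missing.append(path.name)
    return missing
```

Calling `find_missing_lineage("no_end_contigs_with_viral_domain")` from the run's output directory would list every file where a hit was written but the lineage line was not, which narrows the problem to the taxonomy lookup step rather than BLASTX itself.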


* blastx.out (correct)
```bash
(drep) [u@h@no_end_contigs_with_viral_domain]$ head *blastx.out
==> DNA_ct2_out1_vs01.tax_guide.blastx.out <==
DNA_ct2_out1_vs01_4     gi|906476413|ref|YP_009160408.1| replication protein VP4 [Microviridae Fen7918_21]      33.000  8.37e-09  100
Viruses; Monodnaviria; Sangervirae; Phixviricota; Malgrandaviricetes; Petitvirales; Microviridae; unclassified Microviridae

==> DNA_ct2_out2_vs01.tax_guide.blastx.out <==
DNA_ct2_out2_vs01       gi|289163351|ref|YP_003422530.1| replicase [Porcine circovirus type 1/2a]       99.359  0.0     312
Viruses; Monodnaviria; Shotokuvirae; Cressdnaviricota; Arfiviricetes; Cirlivirales; Circoviridae; Circovirus; unclassified Circovirus

==> DNA_ct2_out3_vs01.tax_guide.blastx.out <==
DNA_ct2_out3_vs01_8     gi|1464309244|ref|YP_009507580.1| major capsid protein [Heterosigma akashiwo virus 01]  46.154  9.20e-122 442
Viruses; Varidnaviria; Bamfordvirae; Nucleocytoviricota; Megaviricetes; Algavirales; Phycodnaviridae; Raphidovirus

==> DNA_ct2_out4_vs01.tax_guide.blastx.out <==
DNA_ct2_out4_vs01_12    gi|674660398|ref|YP_009052554.1| putative Terminase large subunit [uncultured crAssphage]       100.000   0.0     751
Viruses; Duplodnaviria; Heunggongvirae; Uroviricota; Caudoviricetes; Caudovirales; Podoviridae; unclassified Podoviridae; crAss-like viruses; environmental samples

==> DNA_ct2_out5_vs01.tax_guide.blastx.out <==
DNA_ct2_out5_vs01       gi|674660359|ref|YP_009052511.1| putative Protein of unknown function (DUF932) [uncultured crAssphage]    100.000 0.0     343
Viruses; Duplodnaviria; Heunggongvirae; Uroviricota; Caudoviricetes; Caudovirales; Podoviridae; unclassified Podoviridae; crAss-like viruses; environmental samples
```

Version 2.1.3

@@@@@@@@@@@@@@@@@@@@@@@@@
Your specified arguments:
original contigs: ../testcontigs_DNA_ct2.fasta
forward reads: /mnt/data/share/software/Cenote-Taker2/test/no_reads
reverse reads: /mnt/data/share/software/Cenote-Taker2/test/no_reads
title of this run: DNA
Isolate source: unknown
collection date: unknown
metagenome_type: unknown
SRA run number: unknown
SRA experiment number: unknown
SRA sample number: unknown
Bioproject number: unknown
template file: /mnt/data/share/software/Cenote-Taker2/dummy_template.sbt
minimum circular contig length: 1000
minimum linear contig length: 1
virus domain database: standard
min. viral hallmarks for linear: 0
min. viral hallmarks for circular: 0
handle known seqs: do_not_check_knowns
contig assembler: unknown_assembler
DNA or RNA: DNA
HHsuite tool: hhblits
original or TPA: original
Do BLASTP?: no_blastp
Do Prophage Pruning?: True
Filter out plasmids?: True
Run BLASTN against nt? none
Location of Cenote scripts: /mnt/data/share/software/Cenote-Taker2
Location of scratch directory: none
GB of memory: 500
number of CPUs available for run: 100
Annotation mode? True
@@@@@@@@@@@@@@@@@@@@@@@@@
scratch space will not be used in this run
HHsuite database locations:
/mnt/data/share/software/Cenote-Taker2/NCBI_CD/NCBI_CD
/mnt/data/share/software/Cenote-Taker2/pfam_32_db/pfam
/mnt/data/share/software/Cenote-Taker2/pdb70/pdb70
/mnt/data/share/software/Cenote-Taker2/test/../testcontigs_DNA_ct2.fasta
no CRISPR file given
Prophage pruning requires --lin_minimum_hallmark_genes >= 1. changing to: --lin_minimum_hallmark_genes 1
time update: locating inputs: 11-16-21---10:25:35
/mnt/data/share/software/Cenote-Taker2/test/../testcontigs_DNA_ct2.fasta
File with .fasta extension detected, attempting to keep contigs over 1 nt and find circular sequences with apc.pl
No circular contigs detected.
no reads provided or reads not found
No circular fasta files detected.
time update: running IRF for ITRs in non-circular contigs 11-16-21---10:25:35
time update: running prodigal on linear contigs 11-16-21---10:25:35
time update: running linear contigs with hmmscan against virus hallmark gene database: standard 11-16-21---10:25:37
Starting pruning of non-DTR/circular contigs with viral domains
pruning script opened
fna files found
./DNA1.fna is too short to prune chromosomal regions
./DNA2.fna is too short to prune chromosomal regions
./DNA3.fna is too short to prune chromosomal regions
mv: cannot stat './DNA5.AA.sorted.fasta': No such file or directory
./DNA5.fna is too short to prune chromosomal regions
time update: HMMSCAN of common viral domains beginning 11-16-21---10:25:38
time update: making tables for hmmscan and rpsblast outputs 11-16-21---10:25:39
time update: running RPSBLAST on each sequence 11-16-21---10:25:39
/mnt/data/share/software/Cenote-Taker2/test/DNA/no_end_contigs_with_viral_domain/COMBINED_RESULTS_PRUNE.AA.rpsblast.out
time update: parsing tables into virus_signal.seq files for hmmscan and rpsblast outputs 11-16-21---10:25:40
time update: Identifying virus chunks, chromosomal junctions, and pruning contigs as necessary 11-16-21---10:25:41
Running file: DNA4.virus_signal.seq
Window +/- to the right ... Chunk_end Window midpoint 0 1 + ... none 2500 0 1 + ... none 2500

[2 rows x 6 columns]
time update: Making prophage table 11-16-21---10:25:43
cat: DNA5.AA.hmmscan.sort.out: No such file or directory
cut: DNA5.AA.hmmscan.sort.out: No such file or directory
cut: DNA5.AA.hmmscan.sort.out: No such file or directory
FINISHED PRUNING CONTIGS WITH AT LEAST 1 VIRAL DOMAIN(S)
Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits 11-16-21---10:25:43
Combining tbl files from all search results AND fix overlapping ORF module
No ITR contigs with minimum hallmark genes found.
Annotating linear contigs
time update: running BLASTX, annotate linear contigs 11-16-21---10:25:43
time update: running PHANOTATE, annotate linear contigs 11-16-21---10:26:18
time update: running Prodigal, annotate linear contigs 11-16-21---10:26:23
time update: running hmmscan1, annotating linear contigs 11-16-21---10:26:24
time update: running hmmscan2, annotating linear contigs 11-16-21---10:26:25
time update: running RPSBLAST, annotating linear contigs 11-16-21---10:26:27
/mnt/data/share/software/Cenote-Taker2/test/DNA/no_end_contigs_with_viral_domain/COMBINED_RESULTS.rotate.AA.rpsblast.out
time update: running tRNAscan-SE 11-16-21---10:26:28
Grabbing ORFs wihout RPS-BLAST hits and separating them into individual files for HHsearch
time update: running HHsearch or HHblits 11-16-21---10:26:31
Combining tbl files from all search results AND fix overlapping ORF module, linear contigs
finalizing taxonomy for linear contigs
DNA2_vs01 is a CRESS virus of some kind
No suitable ORF for taxonomy found for DNA5_vs01, using BLASTX result.
time update: finished annotating linear contigs 11-16-21---10:26:51
time update: running tbl2asn 11-16-21---10:26:53
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Flatfile DNA1_vs01
[real-tbl2asn] Validating DNA1_vs01
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Flatfile DNA2_vs01
[real-tbl2asn] Validating DNA2_vs01
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Flatfile DNA3_vs01
[real-tbl2asn] Validating DNA3_vs01
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Flatfile DNA4_vs01
[real-tbl2asn] Validating DNA4_vs01
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Replaced ' ' with '#'
[real-tbl2asn] Flatfile DNA5_vs01
[real-tbl2asn] Validating DNA5_vs01
[tbl2asn-forever] WARNING: .gbf|.sqn files have incorrect date (01-JAN-2019) and will need to be corrected.
Making gtf tables from final feature tables

time update: Finishing 11-16-21---10:26:55
Virus prediction summary:
5 virus contigs were detected/predicted. 0 contigs had DTRs/circularity. 0 contigs had ITRs. 5 were linear/had no end features
Prophage pruning summary:
1 linear contigs > 10 kb were run through pruning module, and 0 virus sub-contigs (putative prophages/proviruses) were extracted from these. 1 virus contigs were kept intact.
removing ancillary files
output directory: DNA

CENOTE-TAKER 2 HAS FINISHED TAKING CENOTES<<<<<<
/mnt/data/share/software/Cenote-Taker2
prodigal found
BWA found
samtools found
mummer found
circlator found
blastp found
blastn found
blastx found
rpsblast found
bioawk found
efetch found
ktClassifyBLAST found
hmmscan found
bowtie2 found
tRNAscan-SE found
pileup.sh found
tbl2asn found
getorf found
transeq found
bedtools found