nekokoe / Plasmer

An accurate and sensitive bacterial plasmid identification tool based on deep machine-learning of shared k-mers and genomic features.
MIT License

failed: output.plasmer.length.unclass.fasta, Error: no input sequences to analyze #26

Open tianrenmaogithub opened 4 months ago

tianrenmaogithub commented 4 months ago

Thank you for developing the tool. I am having an issue running it. Below is the command I ran with docker.

docker run -d -v /media/Data_1/tianrm/projects/plasmid_identification/4/testing/plasmidome:/input -v /media/Data_1/tianrm/projects/plasmid_identification/4/testing/benchmarking/Plasmer:/output -v /media/Data_1/tianrm/databases/plasmer:/db nekokoe/plasmer:latest /bin/sh /scripts/Plasmer -g /input/plasmidome_1k-3k -d /db -t 8 -o /output/plasmidome_1k-3k

The input has 1,000 contigs (1-3 kbp each). The files in the output directory, listed below, are all empty.

-rw-r--r-- 1 root root 0 May 21 16:11 output.plasmer.predClass.tsv
-rw-r--r-- 1 root root 0 May 21 16:11 output.plasmer.predPlasmids.fa
-rw-r--r-- 1 root root 0 May 21 16:11 output.plasmer.predPlasmids.taxon
-rw-r--r-- 1 root root 0 May 21 16:09 output.plasmer.shorterM.fasta

Below is a list of the intermediate files.

-rw-r--r-- 1 root root 0 May 21 16:11 output.aa
-rw-r--r-- 1 root root 221 May 21 16:11 output_blastn.sh
-rw-r--r-- 1 root root 57 May 21 16:11 output.conjugation.domtblout.feature
-rw-r--r-- 1 root root 225 May 21 16:11 output_diamond.sh
-rw-r--r-- 1 root root 0 May 21 16:11 output.gff
-rw-r--r-- 1 root root 1.3K May 21 16:11 output_hmmscan.sh
-rw-r--r-- 1 root root 57 May 21 16:11 output.mobilization.domtblout.feature
-rw-r--r-- 1 root root 33 May 21 16:11 output.mps.rds.feature
-rw-r--r-- 1 root root 57 May 21 16:11 output.ncbifam-amr.domtblout.feature
-rw-r--r-- 1 root root 0 May 21 16:11 output.orit.blastn.out
-rw-r--r-- 1 root root 39 May 21 16:11 output.orit.feature
-rw-r--r-- 1 root root 97 May 21 16:10 output.p.common_table
-rw-r--r-- 1 root root 0 May 21 16:11 output.p.common_table.features
-rw-r--r-- 1 root root 0 May 21 16:09 output.plasmer.length.class
-rw-r--r-- 1 root root 0 May 21 16:09 output.plasmer.length.unclass.fasta
-rw-r--r-- 1 root root 0 May 21 16:11 output.plasmer.predPlasmids.k2.report
-rw-r--r-- 1 root root 0 May 21 16:09 output.plasmer.shorterM.class
-rw-r--r-- 1 root root 122 May 21 16:10 output.pmr.common_table
-rw-r--r-- 1 root root 0 May 21 16:11 output.pmr.common_table.features
-rw-r--r-- 1 root root 112 May 21 16:11 output.r.common_table
-rw-r--r-- 1 root root 0 May 21 16:11 output.r.common_table.features
-rw-r--r-- 1 root root 57 May 21 16:11 output.replication.domtblout.feature
-rw-r--r-- 1 root root 124 May 21 16:10 output.rmp.common_table
-rw-r--r-- 1 root root 0 May 21 16:11 output.rmp.common_table.features
-rw-r--r-- 1 root root 39 May 21 16:11 output.rRNA.feature
-rw-r--r-- 1 root root 74 May 21 16:09 output.sample.list
-rw-r--r-- 1 root root 93K May 21 16:09 output.seqkit
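For anyone trying to reproduce this, the zero-byte files in a listing like the one above can be picked out programmatically. The following is only a minimal sketch: `empty_files` is a hypothetical helper and not part of Plasmer. The directory you pass would be wherever the intermediate files ended up on the host (here, the folder mounted as /output in the docker command).

```python
import os
import tempfile


def empty_files(directory):
    """Return the sorted names of regular files in `directory` that are 0 bytes."""
    return sorted(
        entry.name
        for entry in os.scandir(directory)
        if entry.is_file() and entry.stat().st_size == 0
    )


if __name__ == "__main__":
    # Demo on a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        # One empty file and one non-empty file, named after files in the listing.
        open(os.path.join(tmp, "output.aa"), "w").close()
        with open(os.path.join(tmp, "output.seqkit"), "w") as handle:
            handle.write("contig_1\t2087\t51.22\t7.58\n")
        print(empty_files(tmp))
```

Sorting the result by modification time instead of name might help show which step first wrote an empty file, but that ordering is a suggestion rather than something confirmed in this thread.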

The output.seqkit showed the input sequences, like this:

816f8014-0726-4b8e-8848-f12a74caa1fa_utg000001l_pilon_pilon_pilon_pilon_pilon 2087 51.22 7.58
c7dbf8f4-58c3-427b-ae75-f61c4d2c3ff3_utg000001l_pilon_pilon_pilon_pilon_pilon 2596 55.86 2.21
86d53998-8e37-496e-849a-47b39de7025c_utg000001c_pilon_pilon_pilon_pilon_pilon 1440 40.42 -3.09
73dc658b-2172-433b-873a-e2fc7fcf876b_utg000001l_pilon_pilon_pilon_pilon_pilon 1845 60.11 1.71
d8261810-2e07-4546-8cb0-722d9b2f53bf_utg000001c_pilon_pilon_pilon_pilon_pilon 1064 68.61 3.84
5c3d0fe6-b48a-4f62-962b-a32b40e6aef2_utg000001c_pilon_pilon_pilon_pilon_pilon 1643 36.82 4.79
8d402ee4-324e-4bcd-8f10-2ce69d621885_utg000001c_pilon_pilon_pilon_pilon_pilon 1501 42.04 4.60

However, output.plasmer.length.unclass.fasta and many other files are empty. I also installed Plasmer with conda and got the same outputs. Below is the screen output.

(plasmer) [tianrm@Sabalan 4]$ docker logs -f 94f7e21894141e3c66524f6f903f91c08f484c68d9008e89cd0bbf3f9a512257
Usage: Plasmer -h|--help -g|--genome -v|--version -p|--prefix -d|--db -t|--threads -m|minimum_length -l|--length -o|--outpath
Checking sequence length...
Generating k-mer features...
Kmer-db version 1.11.1 (07.03.2023)
S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Analysis started at Wed May 22 05:09:59 2024

Set of new samples (from fasta genomes) versus entire database comparison
Loading k-mer database /db/plsdb_minus_ncbi_representative.k25.kmer-db...
Loading k-mer hashtables (raw)...
262144/262144 hashtables loaded in 20.4286 s
Loading patterns...
2/2 patterns loaded in 1.105e-05 s
OK (21.1952 seconds)
Number of samples: 1
Number of patterns: 2 (0 B)
Number of k-mers: 967,521,022
K-mer length: 25
Minhash fraction: 1
Workers count: 8

Processing queries... failed:/output/plasmidome_1k-3k/intermediate/output.plasmer.length.unclass.fasta

EXECUTION TIMES Total: 0.00104653

Analysis finished at Wed May 22 05:10:22 2024

Kmer-db version 1.11.1 (07.03.2023) S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Analysis started at Wed May 22 05:10:23 2024

Set of new samples (from fasta genomes) versus entire database comparison
Loading k-mer database /db/plsdb.k25.kmer-db...
Loading k-mer hashtables (raw)...
262144/262144 hashtables loaded in 21.8257 s
Loading patterns...
2/2 patterns loaded in 3.69e-06 s
OK (22.6056 seconds)
Number of samples: 1
Number of patterns: 2 (0 B)
Number of k-mers: 1,033,690,553
K-mer length: 25
Minhash fraction: 1
Workers count: 8

Processing queries... failed:/output/plasmidome_1k-3k/intermediate/output.plasmer.length.unclass.fasta

EXECUTION TIMES Total: 0.00139297

Analysis finished at Wed May 22 05:10:47 2024

Kmer-db version 1.11.1 (07.03.2023) S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Analysis started at Wed May 22 05:10:48 2024

Set of new samples (from fasta genomes) versus entire database comparison
Loading k-mer database /db/ncbi_representative_minus_plsdb.k18.f0.1.kmer-db...
Loading k-mer hashtables (raw)...
256/256 hashtables loaded in 9.69607 s
Loading patterns...
2/2 patterns loaded in 5.93e-06 s
OK (10.4482 seconds)
Number of samples: 1
Number of patterns: 2 (0 B)
Number of k-mers: 615,903,503
K-mer length: 18
Minhash fraction: 0.1
Workers count: 8

Processing queries... failed:/output/plasmidome_1k-3k/intermediate/output.plasmer.length.unclass.fasta

EXECUTION TIMES Total: 0.00145148

Analysis finished at Wed May 22 05:11:00 2024

Kmer-db version 1.11.1 (07.03.2023) S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Analysis started at Wed May 22 05:11:00 2024

Set of new samples (from fasta genomes) versus entire database comparison
Loading k-mer database /db/ncbi_representative.k18.f0.1.kmer-db...
Loading k-mer hashtables (raw)...
256/256 hashtables loaded in 10.3467 s
Loading patterns...
2/2 patterns loaded in 7.24e-06 s
OK (11.0935 seconds)
Number of samples: 1
Number of patterns: 2 (0 B)
Number of k-mers: 660,014,131
K-mer length: 18
Minhash fraction: 0.1
Workers count: 8

Processing queries... failed:/output/plasmidome_1k-3k/intermediate/output.plasmer.length.unclass.fasta

EXECUTION TIMES Total: 0.00122173

Analysis finished at Wed May 22 05:11:12 2024

Predicting gene with Prodigal...

PRODIGAL v2.6.3 [February, 2016] Univ of Tenn / Oak Ridge National Lab Doug Hyatt, Loren Hauser, et al.

Request: Metagenomic, Phase: Training Initializing training files...done!

Request: Metagenomic, Phase: Gene Finding

Error: no input sequences to analyze.

Searching with BLASTN...
Warning: [blastn] Examining 5 or more matches is recommended
Warning: [blastn] Query is Empty!
Searching with DIAMOND...
diamond v2.0.8.146 (C) Max Planck Society for the Advancement of Science
Documentation, support and updates available at http://www.diamondsearch.org

CPU threads: 8

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Temporary directory: /output/plasmidome_1k-3k/intermediate

Target sequences to report alignments for: 1

Opening the database... [0.11s]
Database: /db/platon_db/mps.dmnd (type: Diamond database, sequences: 4847438, letters: 1549533412)
Block size = 2000000000
Opening the input file... [0s]
Error: Error detecting input file format. First line seems to be blank.
Searching with hmmsearch...

Error: Sequence file /output/plasmidome_1k-3k/intermediate/output.aa is empty or misformatted

Error: Sequence file /output/plasmidome_1k-3k/intermediate/output.aa is empty or misformatted

Error: Sequence file /output/plasmidome_1k-3k/intermediate/output.aa is empty or misformatted

Error: Sequence file /output/plasmidome_1k-3k/intermediate/output.aa is empty or misformatted

Error: Sequence file /output/plasmidome_1k-3k/intermediate/output.plasmer.length.unclass.fasta is empty or misformatted

Generating genomic features...
Merging features...
Error in read.table(chromosomek18, sep = "\t") : no lines available in input
Execution halted
Predicting...
randomForest 4.7-1.1
Type rfNews() to see new features/changes/bug fixes.
Error in file(file, "rt") : cannot open the connection
Calls: read.table -> file
In addition: Warning message:
In file(file, "rt") : cannot open file '/output/plasmidome_1k-3k/intermediate/output.allFeatures': No such file or directory
Execution halted
cat: /output/plasmidome_1k-3k/intermediate/output.allFeatures.plasmer.predClass.tsv: No such file or directory
mv: cannot stat '/output/plasmidome_1k-3k/intermediate/output.allFeatures.plasmer.predProb.tsv': No such file or directory
Predicting finished! See your result in /output/plasmidome_1k-3k/results/
Classifying taxonomy...
Loading database information... done.
0 sequences (0.00 Mbp) processed in 0.002s (0.0 Kseq/m, 0.00 Mbp/m).
0 sequences classified (-nan%)
0 sequences unclassified (-nan%)
cut: /output/plasmidome_1k-3k/intermediate/output.plasmer.predPlasmids.k2.out: No such file or directory
Plasmid taxonomy finished! See your result in /output/plasmidome_1k-3k/results/
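Because most of the later errors in a log like this are probably knock-on effects of an earlier empty file, it can help to list the failure lines in order and look at the first one. The sketch below is only an illustration: `failure_lines` and the pattern it searches for are assumptions made for this example and are not part of Plasmer or Kmer-db. The sample text is a few lines copied from the output above.

```python
import re

# Assumed markers of a failed step, based only on the log shown in this thread.
FAILURE_PATTERN = re.compile(r"failed:|\bError\b")


def failure_lines(log_text):
    """Return (line_number, line) pairs for lines that look like failures."""
    return [
        (number, line.strip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if FAILURE_PATTERN.search(line)
    ]


SAMPLE_LOG = """\
Checking sequence length...
Generating k-mer features...
Processing queries... failed:/output/plasmidome_1k-3k/intermediate/output.plasmer.length.unclass.fasta
Predicting gene with Prodigal...
Error: no input sequences to analyze.
"""

if __name__ == "__main__":
    for number, line in failure_lines(SAMPLE_LOG):
        print(number, line)
```

On the sample, the earliest hit is the Kmer-db "failed:" line that names output.plasmer.length.unclass.fasta, which matches that file being empty in the intermediate listing.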

tianrenmaogithub commented 4 months ago

We are running Fedora on an AMD EPYC 7551 32-core processor.