refresh-bio / kmer-db

Kmer-db is a fast and memory-efficient tool for large-scale k-mer analyses (indexing, querying, estimating evolutionary relationships, etc.).
GNU General Public License v3.0

kmer-db build failed #4

Closed by royfrancis 5 years ago

royfrancis commented 5 years ago

I have two samples, NC_000913.3.fasta.gz and NC_002655.2.fasta.gz. My samples.txt looks like this:

NC_000913.3
NC_002655.2

I am running the following command in the same directory as the samples:

kmer-db-1.13 build samples.txt kmdb

Kmer-db version 1.13 (28.11.2018)
S. Deorowicz, A. Gudys, M. Dlugosz, M. Kokot, and A. Danek (c) 2018

Database building mode (fasta genomes)
Processing samples...
failed:NC_000913.3
failed:NC_002655.2

EXECUTION TIMES
Total: 0.000994199
Loading k-mers: 0.000991275
Processing time: 2.122e-314
    Hashatable resizing (serial): 0
    Hashtable searching (parallel): 0
    Hashatable insertion (serial): 0
    Sort time (parallel): 0
    Pattern extension time (serial): 0

STATISTICS
Number of samples: 0
Number of patterns: 1
Number of k-mers: 0
K-mer length: 0
Serializing database...OK (0.781234 seconds)
Releasing memory...OK (0.000965716 seconds)

Processing the samples failed. Is this .fna format anything special, or is it just .fasta/.fasta.gz files?

agudys commented 5 years ago

Hello! The .fna format is basically FASTA. Please change the extensions of your sample files to .fna.gz and everything should work properly. In the next release we will also check for the presence of .fasta/.fasta.gz files.
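
For anyone hitting the same problem, the renaming can be scripted. Below is a minimal Python sketch, assuming the samples sit in one directory and use the .fasta.gz extension as in the report above. The function name rename_fasta_to_fna is hypothetical and not part of kmer-db.

```python
from pathlib import Path


def rename_fasta_to_fna(directory="."):
    """Rename every *.fasta.gz file in `directory` to *.fna.gz.

    Hypothetical helper, not part of kmer-db. Returns the new file names.
    """
    old_suffix = ".fasta.gz"
    new_suffix = ".fna.gz"
    renamed = []
    for path in sorted(Path(directory).glob("*" + old_suffix)):
        # Swap the compound suffix; the sample name itself may contain dots.
        target = path.with_name(path.name[: -len(old_suffix)] + new_suffix)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append(target.name)
    return renamed
```

Calling rename_fasta_to_fna() in the directory from the report would turn NC_000913.3.fasta.gz into NC_000913.3.fna.gz, so the entries in samples.txt can stay as they are.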

royfrancis commented 5 years ago

That worked. Thanks! I am not sure why you want to hard-code a file extension, though. Wouldn't it be easier to let users provide the full name of a FASTA file, regardless of its extension? You could do an internal sanity check on whether the file name ends in .fa, .fasta, or .fna, or in .gz.
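
The kind of check I have in mind could look roughly like the sketch below. It is only an illustration in Python, not kmer-db code. The name looks_like_fasta is made up, and it assumes that .gz counts only as an optional suffix after one of the FASTA extensions.

```python
# Extensions treated as FASTA in this sketch (assumption, taken from the comment above).
FASTA_EXTENSIONS = (".fa", ".fasta", ".fna")


def looks_like_fasta(filename):
    """Return True if `filename` ends in a FASTA extension, optionally gzipped."""
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # ignore the compression suffix
    # str.endswith accepts a tuple and matches any of its entries.
    return name.endswith(FASTA_EXTENSIONS)
```

With this, NC_000913.3.fasta.gz and NC_000913.3.fna would pass, while a bare NC_000913.3 would be rejected.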

agudys commented 5 years ago

The extensions are hardcoded because different kmer-db modes require different file formats, and we wanted sample list files to be portable between modes.