refresh-bio / kmer-db

Kmer-db is a fast and memory-efficient tool for large-scale k-mer analyses (indexing, querying, estimating evolutionary relationships, etc.).
GNU General Public License v3.0

Use full paths as sample names in the database #7

Closed by agudys 3 years ago

agudys commented 5 years ago

Currently, only the filename (without its full path) is used internally in the database as the sample name. This may be a problem when samples share a filename and differ only in their paths.
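
To illustrate the collision, here is a minimal sketch. It is not kmer-db code: the paths and the dictionary-based "database" are hypothetical and only show why keying samples by filename alone loses information.

```python
from pathlib import PurePosixPath

# Hypothetical sample paths: same filename, different directories.
paths = ["run1/sample.fa", "run2/sample.fa"]

# Keyed by filename only: the second entry overwrites the first.
by_filename = {PurePosixPath(p).name: p for p in paths}

# Keyed by full path: both samples stay distinct.
by_full_path = {p: p for p in paths}

print(len(by_filename), len(by_full_path))
```

With filename keys only one of the two samples survives, while full-path keys keep both.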

rmostowy commented 5 years ago

It's actually more than this: when the file is compressed, for example GCF_000505285.1.fa.gz, one has to strip only the last (compression) extension, i.e., provide GCF_000505285.1.fa. Sounds like it would be easy to change this?

agudys commented 3 years ago

When loading genome files, the exact filename is now tried first. If that fails, loading is retried with predefined extensions appended.
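
The lookup order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual kmer-db implementation: the function name `resolve_sample_file` and the extension list are hypothetical, and the real set of predefined extensions may differ.

```python
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical extension list, chosen for illustration only.
CANDIDATE_EXTENSIONS = (".gz", ".fa", ".fa.gz", ".fasta", ".fasta.gz", ".fna", ".fna.gz")

def resolve_sample_file(
    name: str, extensions: Sequence[str] = CANDIDATE_EXTENSIONS
) -> Optional[Path]:
    """Return the first existing file for a sample name, or None."""
    # Step 1: try the name exactly as given.
    exact = Path(name)
    if exact.is_file():
        return exact
    # Step 2: fall back to appending each candidate extension.
    for ext in extensions:
        candidate = Path(name + ext)
        if candidate.is_file():
            return candidate
    return None
```

Under this scheme, both GCF_000505285.1.fa.gz (exact match) and GCF_000505285.1.fa (match after appending `.gz`) would resolve to the same file on disk.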