onecodex / finch-rs

A genomic minhashing implementation in Rust
https://www.onecodex.com
MIT License

finch dist fails on non-suffixed sketches #10

Open HadrienG opened 6 years ago

HadrienG commented 6 years ago

Hi,

finch dist, finch hist, and finch info do not work on sketch files whose names do not end with .sk.

With suffix:

finch sketch ecoli*
finch dist ecoli*.sk
#  [{"containment":0.993,"jaccard":0.9860973187686196,"mashDistance":0.00033450547318878413,"commonHashes":993,"totalHashes":1000,"query":"ecoli_ref-5m-trim.pe.fq.gz","reference":"ecoliMG1655.fa.gz"}]

Without suffix:

finch sketch ecoli_ref-5m-trim.pe.fq.gz -o ecoli_ref_no_suffix
finch sketch ecoliMG1655.fa.gz -o ecoliMG_no_suffix
finch dist *no_suffix
# Error: Bad starting byte

Best, Hadrien


Review ref for tracking

bovee commented 6 years ago

@HadrienG This is not a great usability experience; thanks for finding it! It would be nice to auto-detect sketch files before trying to interpret them as FASTX, but it's tricky to do this nicely in the current code base. We definitely should do it at some point, though; I'll leave this issue open to track that (or I can open a new issue for it if you need to close this one out for your review process).
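For illustration only, here is a minimal sketch of what content-based detection could look like, assuming it is enough to peek at the first bytes of a file. The `InputKind` enum and `guess_input_kind` function are hypothetical names, not part of finch. The sketch relies only on well-known markers: FASTA records start with `>`, FASTQ records start with `@`, and gzip streams start with the bytes 0x1f 0x8b. It does not assume anything about the sketch format itself; anything that is not recognizably FASTX or gzip is reported as `Unknown`, and a caller could then try the sketch readers.

```rust
/// Hypothetical classification of an input file; not a type from finch.
#[derive(Debug, PartialEq)]
enum InputKind {
    Fasta,
    Fastq,
    Gzip,
    /// Not recognizably FASTX or gzip; a caller might try a sketch parser next.
    Unknown,
}

/// Guess the kind of input from the first bytes of a file.
/// Assumption: the caller has already read a small prefix of the file.
fn guess_input_kind(head: &[u8]) -> InputKind {
    match head {
        // gzip magic number
        [0x1f, 0x8b, ..] => InputKind::Gzip,
        // FASTA header line
        [b'>', ..] => InputKind::Fasta,
        // FASTQ header line
        [b'@', ..] => InputKind::Fastq,
        // Empty input or any other starting byte
        _ => InputKind::Unknown,
    }
}
```

A gzip result says nothing about what is inside the compressed stream, so a real implementation would have to decompress a few bytes and check again.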

As a related short-term fix, I just added a commit (f5a98580628ff53cbbc7da5c40f1d4e0b0812e9c) to automatically add the right extension when you pass in a filename without one to -o.
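The behavior described above can be sketched roughly as follows. This is not the code from that commit; `with_default_extension` is a hypothetical helper that uses only the standard library's `Path::extension` and `PathBuf::set_extension`, and the `"sk"` default is passed in by the caller.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: return `output` unchanged if it already has an
/// extension, otherwise append `default_ext` (for example "sk").
fn with_default_extension(output: &Path, default_ext: &str) -> PathBuf {
    let mut path = output.to_path_buf();
    if path.extension().is_none() {
        // No extension present, so add the default one.
        path.set_extension(default_ext);
    }
    path
}
```

With this approach, `ecoliMG_no_suffix` becomes `ecoliMG_no_suffix.sk`, while a name that already contains a dot (such as `sample.v2`) is left alone because the standard library treats `v2` as its extension.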