sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

add a 'name' config to `sourmash sketch` param strings? #1315

Open ctb opened 3 years ago

ctb commented 3 years ago

we could allow sourmash sketch to take name= in param strings, e.g.

sourmash sketch dna -p k=31,name='cool name, luke'

rationale: when writing up the docs for sourmash sketch per https://github.com/dib-lab/sourmash/pull/1283#pullrequestreview-586095844, I realized that I had done signature naming the way I had because of a limitation imposed by sourmash compute: to wit, we could only specify one name on the command line for all the signatures being created.

However, with sourmash sketch, we create different signatures for each param string.
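One wrinkle with the example above is that the name itself contains a comma, and in a POSIX shell the single quotes are removed before the program sees the argument, so the param string arrives as `k=31,name=cool name, luke`. Below is a minimal sketch of one way this could be handled, under the assumption (not settled in this issue) that `name=` must be the last item in the param string, so everything after it is taken as the name. `split_name_from_params` is a hypothetical helper, not part of sourmash.

```python
def split_name_from_params(param_str):
    """Split a param string into (other_params, name).

    Hypothetical helper, not sourmash API. Assumes 'name=' is the last
    item, so everything after it (commas included) belongs to the name.
    """
    marker = "name="

    # 'name=' is the only item in the string
    if param_str.startswith(marker):
        return [], param_str[len(marker):]

    # look for 'name=' right after a comma
    idx = param_str.find("," + marker)
    if idx == -1:
        # no name given: plain comma-separated params
        return [p for p in param_str.split(",") if p], None

    head = param_str[:idx]
    name = param_str[idx + 1 + len(marker):]
    return [p for p in head.split(",") if p], name


params, name = split_name_from_params("k=31,name=cool name, luke")
print(params, name)
```

The trade-off in this sketch is that no other parameter can follow `name=`; a quoting or escaping scheme inside the param string would lift that restriction at the cost of a more complicated parser.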

In a major scope expansion of this issue, we could also allow template variables like {header} and {len} to be used, to be interpreted by Python for each sequence...
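A rough sketch of how such per-sequence templates might be interpreted, assuming only `{header}` and `{len}` are allowed and anything else (attribute access, format specs, conversions) is rejected. `render_name` and `ALLOWED_FIELDS` are hypothetical names for illustration, not existing sourmash code.

```python
from string import Formatter

# Assumption: only these two template variables are supported.
ALLOWED_FIELDS = {"header", "len"}


def render_name(template, header, sequence):
    """Fill a name template for one sequence (hypothetical helper)."""
    # Check every replacement field before formatting, so that things
    # like '{header.__class__}' or '{0}' are refused outright.
    for _literal, field, spec, conversion in Formatter().parse(template):
        if field is None:
            continue  # trailing literal text, no field
        if field not in ALLOWED_FIELDS or spec or conversion:
            raise ValueError(f"unsupported template field: {field!r}")

    return template.format(header=header, len=len(sequence))


print(render_name("{header} ({len} bp)", "seq1 example", "ACGTACGT"))
```

Validating the field names up front keeps the template language deliberately small, since `str.format` would otherwise allow attribute and index lookups on the supplied values.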

ctb commented 3 years ago

it would also be nice to support explicit naming from filename, and/or basename, and/or maybe even accession from a CSV of some sort.
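As an illustration of what CSV-based naming with a basename fallback could look like, here is a small sketch. The column names `filename` and `name`, and the helpers `load_names` and `pick_name`, are assumptions made for this example; nothing here is an existing sourmash interface.

```python
import csv
import io
import os


def load_names(csv_file):
    """Read a filename -> name mapping from an open CSV file.

    Assumes (hypothetically) a header row with 'filename' and 'name'.
    """
    reader = csv.DictReader(csv_file)
    return {row["filename"]: row["name"] for row in reader}


def pick_name(path, names):
    """Choose a signature name: CSV entry by full path, then by
    basename, otherwise fall back to the basename itself."""
    base = os.path.basename(path)
    return names.get(path) or names.get(base) or base


# Example with an in-memory CSV; the accession-like name is made up.
example_csv = io.StringIO("filename,name\ngenome1.fa,ACC0001 example genome\n")
names = load_names(example_csv)
print(pick_name("data/genome1.fa", names))
print(pick_name("data/other.fa", names))
```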

bluegenes commented 3 years ago

I REALLY like adding a name option in the param string!

template vars seem very handy, but maybe also dangerous?

name from csv is what I end up doing via snakemake, so doing it natively would be neat :)

ctb commented 3 years ago

also, see @taylorreiter's comment in https://github.com/dib-lab/sourmash/pull/1283/files#r572495952 -

docs say:

You can also stream any of these formats into sourmash sketch via stdin by using - as the input filename.

@taylorreiter -

Yes, that's true, but then the name of the sig is recorded as - which is really confusing when you compare a bunch of files.

Also, should there be an example for how to do this?

ctb commented 3 years ago

re the two points from @taylorreiter's comment in https://github.com/dib-lab/sourmash/pull/1283/files#r572495952 -

the name of the sig being recorded as - when streaming via stdin: fixed in #1347 - name/filename is now empty.

an example for how to do this: added in 2ac0b967!