sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/
Other
473 stars 80 forks source link

Force sig file extension to be `sig` or `sig.gz` #1932

Open mr-eyes opened 2 years ago

mr-eyes commented 2 years ago

Since the user has the power to sig rename or generates signatures without .sig or .sig.gz extension. And since the zipStorage implementation expects .sig or .sig.gz extension as per https://github.com/sourmash-bio/sourmash/blob/7b55c522e4bf7d049396a6af7751436fa02860ba/src/sourmash/index/__init__.py#L623-L624

Enforcing/unifying signature extensions to be .sig or .sig.gz can reduce some errors.

mr-eyes commented 2 years ago

Maybe I am getting it wrong? due to the traverse_yield_all after that? @luizirber @ctb

ctb commented 2 years ago

Hi @mr-eyes I don't understand -

All of the internal code reads and writes .sig/.sig.gz file names by default and by convention. The user can specify a different file extension if they want and it will be read appropriately, but only .sig and .sig.gz files will be auto-discovered on traversal unless -f is specified.

So, for example, if a directory has a bunch of .sig.gz files under it, and you point sourmash at the directory it will load all those files and no more, unless you specify -f. Same with a zip file.

ctb commented 2 years ago

oh! perhaps you mean we should require it all the time?

I don't recall why I thought it was a good idea Back When, and it causes some interesting problems and weird behavior (see comments about 'traverse_all' in https://github.com/sourmash-bio/sourmash/issues/1849). But it isn't our convention and I don't think it causes too many problems in the code. I don't have any real problem with changing it but it would have to be part of a major version bump (at least 1, if not 2) because of semantic versioning. I'm tempted to say that we should just leave it.

mr-eyes commented 2 years ago

oh! perhaps you mean we should require it all the time?

Yes exactly!

I don't have any real problem with changing it but it would have to be part of a major version bump (at least 1, if not 2) because of semantic versioning. I'm tempted to say that we should just leave it.

Ok then, I labeled it to the next major version for now :)