sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

`sourmash sig split` output flag `--output` works but doesn't match help message #2436

Closed. jessicalumian closed this issue 1 year ago.

jessicalumian commented 1 year ago

I just ran the command `sourmash sig split <input> --output DATA/split_sigs_filtered/` without checking the help documentation, and it worked! However, the help message only lists `--output-dir` or `--outdir` as the options for this.

== This is sourmash version 4.5.0. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

usage: 

### `sourmash signature split` - split signatures into individual files

Split each signature in the input file(s) into individual files, with
standardized names.

For example,

sourmash signature split tests/test-data/2.fa.sig

will create 3 files,

`f372e478.k=21.scaled=1000.DNA.dup=0.2.fa.sig`,
`f3a90d4e.k=31.scaled=1000.DNA.dup=0.2.fa.sig`, and
`43f3b48e.k=51.scaled=1000.DNA.dup=0.2.fa.sig`, representing the three
different DNA signatures at different ksizes created from the input file
`2.fa`.

The format of the names of the output files is standardized and stable
for major versions of sourmash: currently, they are period-separated
with fields:

* `md5sum` - a unique hash value based on the contents of the signature.
* `k=<ksize>` - k-mer size.
* `scaled=<scaled>` or `num=<num>` - scaled or num value for MinHash.
* `<moltype>` - the molecule type (DNA, protein, dayhoff, or hp)
* `dup=<n>` - a non-negative integer that prevents duplicate signatures from colliding.
* `basename` - basename of first input file used to create signature; if none provided, or stdin, this is `none`.

If `--outdir` is specified, all of the signatures are placed in outdir.

Note: `split` only saves files in the JSON `.sig` format.

split signature files

positional arguments:
  signatures

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           suppress non-error output
  --output-dir OUTPUT_DIR, --outdir OUTPUT_DIR
                        output signatures to this directory
  -f, --force           try to load all files as signatures
  --from-file FROM_FILE
                        a text file containing a list of files to load signatures from
  -k K, --ksize K       k-mer size; default=31
  --protein             choose a protein signature; by default, a nucleotide signature is used
  --no-protein          do not choose a protein signature
  --dayhoff             choose Dayhoff-encoded amino acid signatures
  --no-dayhoff          do not choose Dayhoff-encoded amino acid signatures
  --hp, --hydrophobic-polar
                        choose hydrophobic-polar-encoded amino acid signatures
  --no-hp, --no-hydrophobic-polar
                        do not choose hydrophobic-polar-encoded amino acid signatures
  --dna, --rna, --nucleotide
                        choose a nucleotide signature (default: True)
  --no-dna, --no-rna, --no-nucleotide
                        do not choose a nucleotide signature
  --picklist PICKLIST   select signatures based on a picklist, i.e. 'file.csv:colname:coltype'
  --picklist-require-all
                        require that all picklist values be found or else fail
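The output filename scheme described in the help text above is regular enough to parse back into its fields. This is a minimal sketch, not part of sourmash: `parse_split_name` and `SplitName` are hypothetical names, and it assumes the field order shown in the help text plus a trailing `.sig` suffix. The one subtlety is that the basename can itself contain periods (`2.fa`), so the name is split at most five times.

```python
from typing import NamedTuple, Tuple


class SplitName(NamedTuple):
    """Fields of a `sourmash signature split` output filename (hypothetical helper)."""
    md5: str            # content hash field, e.g. "f372e478"
    ksize: int          # from "k=<ksize>"
    sketch_kind: str    # "scaled" or "num"
    sketch_value: int   # the scaled or num value
    moltype: str        # DNA, protein, dayhoff, or hp
    dup: int            # from "dup=<n>"
    basename: str       # input file basename, or "none"


def _key_value(field: str, allowed: Tuple[str, ...]) -> Tuple[str, int]:
    """Split a 'key=value' field and check that the key is expected."""
    key, sep, value = field.partition("=")
    if not sep or key not in allowed:
        raise ValueError(f"unexpected field {field!r}")
    return key, int(value)


def parse_split_name(filename: str) -> SplitName:
    suffix = ".sig"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a .sig filename: {filename!r}")
    stem = filename[: -len(suffix)]
    # The first five fields contain no periods, but the basename may ("2.fa"),
    # so split at most five times and keep the remainder intact.
    parts = stem.split(".", 5)
    if len(parts) != 6:
        raise ValueError(f"too few fields in {filename!r}")
    md5, k_field, sketch_field, moltype, dup_field, basename = parts
    _, ksize = _key_value(k_field, ("k",))
    sketch_kind, sketch_value = _key_value(sketch_field, ("scaled", "num"))
    _, dup = _key_value(dup_field, ("dup",))
    return SplitName(md5, ksize, sketch_kind, sketch_value, moltype, dup, basename)


print(parse_split_name("f372e478.k=21.scaled=1000.DNA.dup=0.2.fa.sig"))
# SplitName(md5='f372e478', ksize=21, sketch_kind='scaled', sketch_value=1000, moltype='DNA', dup=0, basename='2.fa')
```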

Maybe it would be worth adding `--output` as an explicit flag for this option? Or you can keep supporting it secretly.

I also tried `-o`, because I like to break things, and that did not work.

ctb commented 1 year ago

Yep!

The argument parser we use, `argparse`, will (should?) accept any prefix of a long-form argument, as long as that prefix uniquely identifies the argument. So `--ou` should work to match `--output-dir`, since there's no other long-form argument that starts with `--ou`.
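The prefix matching described above can be checked with a small stand-alone parser. This is a sketch, not sourmash's code: `make_parser` is a hypothetical helper that declares only the positional `signatures` and the `--output-dir`/`--outdir` option from the help text. In argparse the behavior is controlled by the `allow_abbrev` parameter of `ArgumentParser`, which defaults to `True`.

```python
import argparse


def make_parser(allow_abbrev: bool = True) -> argparse.ArgumentParser:
    # Hypothetical stand-in for the split subcommand: only the options that
    # matter here, mirroring the help text (not sourmash's actual parser).
    parser = argparse.ArgumentParser(prog="split", allow_abbrev=allow_abbrev)
    parser.add_argument("signatures", nargs="*")
    parser.add_argument("--output-dir", "--outdir")  # dest becomes "output_dir"
    return parser


# "--output" is an unambiguous prefix of "--output-dir", so argparse accepts it.
args = make_parser().parse_args(["in.sig", "--output", "DATA/split_sigs_filtered/"])
print(args.output_dir)  # DATA/split_sigs_filtered/

# With abbreviation disabled, only the exact spellings are recognized;
# argparse prints an "unrecognized arguments" error and exits.
try:
    make_parser(allow_abbrev=False).parse_args(["in.sig", "--output", "DATA/"])
except SystemExit as exc:
    print("rejected, exit status", exc.code)  # rejected, exit status 2
```

Because this matching is a parser-wide default rather than something declared per option, `--output` never shows up in the generated help text.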

ISTR not supporting `-o` explicitly because in most commands, `-o` specifies an output filename, not a directory.
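The alternative floated earlier in the issue is to make `--output` an explicit, documented alias while still leaving `-o` unclaimed. In an argparse-based parser that is a one-line change. The sketch below is hypothetical and not something this thread changed: `make_parser_with_alias` is an invented name, and `allow_abbrev=False` is set only to show that the alias works without relying on prefix matching.

```python
import argparse


def make_parser_with_alias() -> argparse.ArgumentParser:
    # Hypothetical: "--output" listed as a third explicit alias, so it would
    # appear in --help. "-o" is deliberately not added, since elsewhere it
    # names an output file rather than a directory.
    parser = argparse.ArgumentParser(prog="split", allow_abbrev=False)
    parser.add_argument("signatures", nargs="*")
    parser.add_argument("--output-dir", "--outdir", "--output",
                        help="output signatures to this directory")
    return parser


parser = make_parser_with_alias()
print(parser.parse_args(["in.sig", "--output", "out/"]).output_dir)  # out/

try:
    parser.parse_args(["in.sig", "-o", "out/"])
except SystemExit as exc:
    print("-o rejected, exit status", exc.code)  # -o rejected, exit status 2
```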

ctb commented 1 year ago

closing! extra things working is fine in any case 😆