sourmash-bio / sourmash

Quickly search, compare, and analyze genomic and metagenomic data sets.
http://sourmash.readthedocs.io/en/latest/

indicate threshold in `sourmash gather` output #1850

Open: ctb opened 2 years ago

ctb commented 2 years ago

'twould be nice to have `sourmash gather` include in its output the threshold at which matches were generated.

https://github.com/sourmash-bio/sourmash/issues/1818 also involves adjusting sourmash gather output, so we could do both at one time.

bluegenes commented 2 years ago

This info would be great to have!

How would we handle the case where prefetch is run ahead of time with a certain threshold? This threshold affects the gather output but may not be directly passed as an argument.

I suppose if we add the threshold to the prefetch csv as well, and the prefetch csv is passed in as a prefetch-type picklist, this might be straightforward?

ctb commented 2 years ago

so I'm not actually sure what I meant by this issue - it is reported in the stdout already:

`found less than 50.0 kbp in common. => exiting`

I don't think it makes sense to report it in the output CSV.

in re

> How would we handle the case where prefetch is run ahead of time with a certain threshold? This threshold affects the gather output but may not be directly passed as an argument.
>
> I suppose if we add the threshold to the prefetch csv as well, and the prefetch csv is passed in as a prefetch-type picklist, this might be straightforward?

good points, and seems kind of ugly. I think sticking with "run complicated sourmash things in a workflow so you can track provenance that way" is the best advice ;).