sourmash-bio / sourmash_plugin_branchwater

fast, multithreaded sourmash operations: search, compare, and gather.
GNU Affero General Public License v3.0

how are `manysearch` results ordered, and can we threshold num results returned? #257

Open bluegenes opened 7 months ago

bluegenes commented 7 months ago

Had a question on whether manysearch results are ordered by best hit, and whether we could add a threshold parameter to return only the top n results.

I think:

We would need sorted results in order to implement a threshold number of hits to return.

This was brought up in the context of speeding up search and downstream processing. Since we need to check all database entries in order to build a sorted list, I think any potential benefit would be small -- it would only reduce writing (fewer results to write) and very slightly speed up downstream processing (fewer results to read)?

ctb commented 7 months ago

this way lies madness.

strong opinion: since it's (mostly) not computationally challenging to load the results in after, leave it as-is and have all limits on number of results applied AFTER.

(old bad design decisions => let's avoid that mess in the future 😆 )
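
For reference, applying the limit afterwards could look roughly like the sketch below. It is a hypothetical post-processing helper, not part of the plugin. The column names `query_name`, `match_name`, and `containment` are assumptions about the results CSV and may need adjusting, and the inline table is made-up data standing in for a real output file. Every row is still read once, which matches the point above that a top-n cutoff cannot avoid scanning all results.

```python
import csv
import heapq
import io
from collections import defaultdict


def top_n_per_query(rows, n, score_col="containment", query_col="query_name"):
    """Return the n highest-scoring rows for each query, best first.

    `rows` is any iterable of dicts (e.g. a csv.DictReader).
    The column names are assumptions; pass different ones if needed.
    """
    by_query = defaultdict(list)
    for row in rows:  # every result row is still visited once
        by_query[row[query_col]].append(row)
    return {
        query: heapq.nlargest(n, hits, key=lambda r: float(r[score_col]))
        for query, hits in by_query.items()
    }


# Made-up results table standing in for a real results CSV.
example_csv = io.StringIO(
    "query_name,match_name,containment\n"
    "q1,a,0.10\n"
    "q1,b,0.90\n"
    "q1,c,0.50\n"
    "q2,a,0.30\n"
)

top = top_n_per_query(csv.DictReader(example_csv), n=2)
for query, hits in top.items():
    print(query, [(h["match_name"], h["containment"]) for h in hits])
```

For a real file, the `io.StringIO` object would be replaced by an open file handle. Sorting by a different score only requires changing `score_col`.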