Open bluegenes opened 7 months ago
this way lies madness.
strong opinion: since it's (mostly) not computationally challenging to load the results in after, leave it as-is and have all limits on number of results applied AFTER.
(old bad design decisions => let's avoid that mess in the future 😆 )
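For what it's worth, here is a minimal sketch of what "apply limits AFTER" could look like downstream. It assumes the results are in a CSV with `query_name`, `match_name`, and `containment` columns; those column names and the `top_n_per_query` helper are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import heapq
import io
from collections import defaultdict


def top_n_per_query(rows, n, score_column="containment"):
    """Keep the n highest-scoring rows for each query (hypothetical helper)."""
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["query_name"]].append(row)
    # heapq.nlargest returns the hits sorted from best to worst.
    return {
        query: heapq.nlargest(n, hits, key=lambda r: float(r[score_column]))
        for query, hits in by_query.items()
    }


# Tiny made-up table standing in for a real results file.
example_csv = """query_name,match_name,containment
q1,a,0.10
q1,b,0.90
q1,c,0.50
q2,a,0.30
"""

rows = list(csv.DictReader(io.StringIO(example_csv)))
top = top_n_per_query(rows, n=2)
top_matches = {q: [r["match_name"] for r in hits] for q, hits in top.items()}
print(top_matches)
```

For a real file, swap the `io.StringIO` for an open file handle; the full unsorted output stays on disk and the limit is just a view on top of it.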
Had a question on whether `manysearch` results are ordered by best hit, and whether we could add a threshold parameter to return only the top n results.

I think:
- `manysearch` results are ordered by the order in the database, not sorted by best hit.
- `manysearch` uses `db.counter_for_query`, which I think returns the best hits first? https://github.com/sourmash-bio/sourmash/blob/latest/src/core/src/index/revindex/disk_revindex.rs#L284C7-L301 `counter.most_common` implies to me that we get sorted results...

We would need sorted results in order to implement a threshold on the number of hits to return.
This was brought up in the context of speeding up search and downstream processing. Since we need to check all database entries in order to build a sorted list, I think any potential benefit would be small: it would only reduce writing (fewer results to write) and very slightly speed up downstream processing (fewer results to read)?
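To illustrate why the search itself would not get faster, here is a rough sketch of top-n selection with a bounded heap. The `top_n_hits` function and the toy `database` list are hypothetical, not the plugin's code. The point is that every entry still has to be scored and examined; only the number of results kept (and later written) shrinks.

```python
import heapq


def top_n_hits(scored_entries, n):
    """Return the n best (score, name) pairs, best first, plus the number of entries examined."""
    heap = []  # min-heap holding at most n entries; heap[0] is the weakest kept hit
    examined = 0
    for name, score in scored_entries:
        examined += 1
        if len(heap) < n:
            heapq.heappush(heap, (score, name))
        elif score > heap[0][0]:
            # New entry beats the weakest kept hit, so swap it in.
            heapq.heapreplace(heap, (score, name))
    return sorted(heap, reverse=True), examined


# Made-up (name, score) pairs in database order, not sorted by score.
database = [("a", 0.10), ("b", 0.90), ("c", 0.50), ("d", 0.70)]
best, examined = top_n_hits(database, n=2)
print(best, examined)
```

Here `examined` equals the size of the database regardless of `n`, which matches the expectation above that the savings would be limited to writing and reading fewer rows.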