Closed sunnyosun closed 1 week ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.63%. Comparing base (57fbd29) to head (2ee90cb). Report is 29 commits behind head on main.
🚀 Deployed on https://672d2c122dca6b5b514d2906--lamindb-qnwk.netlify.app
This single example looks fantastic!
I'm just worried that it will deteriorate other cases.
Can you run this against @Koncopd's benchmarking framework?
Then we'd have a before/after comparison that covers a wider array of search cases, plus a report underlying the decision, published to LaminHub.
Updated screenshots with the examples @Koncopd had, let me know if there's anything else I can test.
@sunnyosun @falexwolf here is the benchmark for this PR: https://github.com/laminlabs/lamindb/pull/2141 and https://672cb784f7316d7437f2f69d--lamindb-qnwk.netlify.app/faq/benchmark-search
These are great improvements!
I wish there were a good way in LaminHub to document such changes, but I guess there isn't for now.
@fredericenard, please also take a look here.
And then we'll discuss in the benchmarking PR how we proceed with organizing code across lamindb, bionty and laminhub.
Added more searches to the benchmark.
`test_search_synonyms` fails now, so it's commented out in the benchmark.
Fixed synonyms. The only thing is that it's getting a bit slower now: 0.7-0.8s per search (before: 0.2-0.3s) on a us-west-2 instance, because of all the different layers.
Ohh and can we get rid of this now? https://docs.lamin.ai/query-search 🚀
Yes, removing will be the goal.
@Koncopd is making a push to consolidate all 3 search algorithms into one clearly documented and benchmarked solution.
Sunny's PR here has good ideas, but it's not suitable for the hub due to performance. So the hope is that Sergei can replicate the UX with server-side code (essentially by correctly using Postgres plugins). We can still use Sunny's code for dataframes and SQLite; it'll give the same results but just run slower, which is OK in that context.
Key improvements:

- `startswith` and isolated phrases (e.g. "naive B cell", "B cell, ..." over "club cell" when searching "b cell")

Note: `centrocyte` appears in the "b cell" search because there's a perfect match of "B cell" in the description; the same applies to the "t cell" results.
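
To make the ranking idea above concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the function names `match_tier` and `rank_matches` and the exact tier order are assumptions made for illustration. It only shows how a prefix match and an isolated-phrase match can be ranked above a match buried inside another word.

```python
import re
from typing import List, Optional


def match_tier(query: str, name: str) -> Optional[int]:
    """Return a rank tier for `name` (lower is better), or None if no match.

    Hypothetical tiers, chosen for illustration only:
      0 = exact match, 1 = starts with the query,
      2 = query appears as an isolated phrase (word boundaries on both sides),
      3 = query appears only inside a longer word.
    """
    q = query.strip().lower()
    n = name.strip().lower()
    if n == q:
        return 0
    if n.startswith(q):
        return 1
    if re.search(rf"\b{re.escape(q)}\b", n):
        return 2
    if q in n:
        return 3
    return None


def rank_matches(query: str, names: List[str]) -> List[str]:
    """Keep only matching names and sort them by tier (stable within a tier)."""
    hits = []
    for name in names:
        tier = match_tier(query, name)
        if tier is not None:
            hits.append((tier, name))
    return [name for _, name in sorted(hits, key=lambda pair: pair[0])]


names = ["club cell", "naive B cell", "B cell, CD19-positive", "T cell"]
print(rank_matches("b cell", names))
```

With this input, "B cell, CD19-positive" (prefix) comes first, "naive B cell" (isolated phrase) second, and "club cell" (substring inside "club") last; "T cell" is dropped. The sketch assumes the query starts and ends with a word character, since `\b` behaves differently otherwise.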