Closed by l-zheng24 1 month ago
I've made a short write-up analyzing our options.
Typesense and Meilisearch are our two main options for a full-scale search engine/database, where we will store the full text of our pulled data. This allows for a fuller search integration with SCOPE, rather than only searching tags, URL, or title.
Typesense and Meilisearch are both fully open source. The code license affects our ability to embed the engine within SCOPE, which is planned as a single shippable container. Both licenses allow embedding in the container, so either is acceptable.
Typesense supports multi-tenancy. However, since we don't plan on implementing this architecture, this is irrelevant :p
Typesense supports this whereas Meilisearch does not. We desire this functionality, so we can weight at a higher accuracy based on the query.
Winner: Typesense
Typesense supports this whereas Meilisearch does not. This functionality is a nice-to-have.
Typesense supports this whereas Meilisearch does not. We desire this functionality so that we can weight queries lower, exclude certain text, and find similar query results.
Typesense wins again - has it whereas Meilisearch does not.
Ultimately the winner here is Typesense. I'm going to work on a write-up describing how we can add to its database and properly integrate with it.
Meilisearch just hit version 1.0. It's a search/ranking engine that would let us insert documents and then search them. It's also very light on resources, has a free hosted tier (but can easily be self-hosted), and has plugins to integrate with a web framework so that a search bar can be more easily added to the front-end.
A tentative workflow we could have: add every document returned by any existing query (metadata + full text; presumably we could derive a ranking from the metadata based on trustworthiness) to the search database, then serve searches from it. This can be embedded into a Docker container.
Another alternative is Typesense. It provides query-time ranking options, which is good because we need to handle both insert-time and query-time ranking.
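To make the workflow above a bit more concrete, here is a rough sketch of what the insert side could look like against a Typesense-style collection. The collection name, the field names, and the "trusted domain" rule for deriving a score are all placeholders I made up for illustration, not anything SCOPE has settled on. The code only builds the payloads; the client calls that would send them are left as comments since they need a running server.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlparse

# Hypothetical collection name; SCOPE's real schema may differ.
COLLECTION = "scope_documents"


def build_schema() -> dict[str, Any]:
    """Collection schema holding metadata, full text, and a numeric trust score."""
    return {
        "name": COLLECTION,
        "fields": [
            {"name": "url", "type": "string"},
            {"name": "title", "type": "string"},
            {"name": "tags", "type": "string[]", "facet": True},
            {"name": "fulltext", "type": "string"},
            {"name": "trust_score", "type": "int32"},
        ],
        # Insert-time ranking: results are ordered by this field when no
        # explicit sort is given at query time.
        "default_sorting_field": "trust_score",
    }


def to_document(result: dict[str, Any], trusted_domains: set[str]) -> dict[str, Any]:
    """Turn one query result into a search document.

    The trust rule here (2 for a trusted domain, 1 otherwise) is an
    assumption standing in for whatever metadata-based ranking we choose.
    """
    host = urlparse(result["url"]).hostname or ""
    return {
        "url": result["url"],
        "title": result.get("title", ""),
        "tags": result.get("tags", []),
        "fulltext": result.get("fulltext", ""),
        "trust_score": 2 if host in trusted_domains else 1,
    }


# With a server running, the typesense Python client would take these as:
#   client.collections.create(build_schema())
#   client.collections[COLLECTION].documents.import_(docs, {"action": "upsert"})
```

Using upsert on import would let us re-run the same queries without creating duplicate entries, provided each document carries a stable `id`, which this sketch does not set.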
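As a rough illustration of the query-time side, the sketch below builds Typesense search parameters that weight title matches above full-text matches and break ties with a stored score. The field names (`title`, `fulltext`, `trust_score`) and the 2:1 weighting are assumptions for the example; `q`, `query_by`, `query_by_weights`, and `sort_by` are the Typesense search parameters being exercised.

```python
from __future__ import annotations


def build_search_params(query: str, prefer_trusted: bool = True) -> dict[str, str]:
    """Build search parameters for a Typesense documents search.

    Field names and weights are placeholders, not a settled SCOPE schema.
    """
    params = {
        "q": query,
        # Search both fields, with title matches counting twice as much.
        "query_by": "title,fulltext",
        "query_by_weights": "2,1",
    }
    if prefer_trusted:
        # Query-time ranking: text relevance first, then the score that
        # was stored on each document at insert time.
        params["sort_by"] = "_text_match:desc,trust_score:desc"
    return params


# With a server running, the typesense Python client would take this as:
#   client.collections["scope_documents"].documents.search(
#       build_search_params("some query"))
```

Keeping the weights in one helper means we can tune the ranking per query type without touching the stored documents.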
It actually looks like Typesense is the way to go - Meilisearch doesn't scale past 100k documents very well. The rustacean in me is sad but alas.