hoarder-app / hoarder

A self-hostable bookmark-everything app (links, notes and images) with AI-based automatic tagging and full text search
https://hoarder.app
GNU Affero General Public License v3.0

[Feature request] Selfhosted semantic search #441

Open JojiiOfficial opened 1 day ago

JojiiOfficial commented 1 day ago

I'd love to see Sentence Transformers added to Hoarder for enhanced semantic search. It could make finding bookmarks much more efficient and user-friendly.

For reference, you can check out this example that illustrates how they could be applied in Hoarder.

Their lightweight nature also aligns well with self-hosting and privacy goals. Additionally, a vector database like Qdrant could store and retrieve the generated embeddings efficiently, giving fast search performance while remaining easy to self-host.
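
To make the proposal a bit more concrete, here is a rough sketch of what this could look like. It is not Hoarder code: it uses Transformers.js (`@xenova/transformers`) as a TypeScript stand-in for Sentence Transformers, the Qdrant JS client, and made-up model, collection, and payload names purely for illustration.

```ts
// Illustrative sketch only: embed bookmark text with a small sentence-embedding
// model and store/search the vectors in Qdrant. Model, collection name and
// payload fields are assumptions, not Hoarder's actual schema.
import { pipeline } from "@xenova/transformers";
import { QdrantClient } from "@qdrant/js-client-rest";

const embedder = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// all-MiniLM-L6-v2 produces 384-dimensional vectors.
await qdrant.createCollection("bookmarks", {
  vectors: { size: 384, distance: "Cosine" },
});

async function embed(text: string): Promise<number[]> {
  const output = await embedder(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}

// Index a bookmark: one point per bookmark, with its title kept as payload.
await qdrant.upsert("bookmarks", {
  points: [
    {
      id: 1,
      vector: await embed("How to set up a reverse proxy with Caddy"),
      payload: { title: "Caddy reverse proxy guide" },
    },
  ],
});

// Semantic search: embed the query and return the nearest bookmarks.
const hits = await qdrant.search("bookmarks", {
  vector: await embed("self-hosted https proxy"),
  limit: 5,
});
console.log(hits.map((h) => h.payload?.title));
```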

As I'm very familiar with this area, I'd consider contributing if this gets accepted and we agree on an implementation.

MohamedBassem commented 1 day ago

@JojiiOfficial We already have @medo working on adding RAG over the stored bookmarks. The first PR is here (https://github.com/hoarder-app/hoarder/pull/403/files) (currently pending review); it generates embeddings for the data stored in Hoarder. For the vector database, we're considering either sqlite-vec or Orama (https://github.com/askorama/orama). Orama is cool because we can also use it for FTS (as a replacement for Meilisearch). If you're interested in contributing to this effort, please join us in the #development channel on Discord.
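
For readers unfamiliar with Orama, here is a hedged sketch of the "one store for both FTS and vector search" idea. It is not taken from PR #403; the schema, field names, 384-dim embedding size, and placeholder embeddings are assumptions for illustration only.

```ts
// Illustrative sketch only (not from the pending PR): Orama holding both the
// full-text index and the embedding vectors, so a single store covers FTS and
// semantic search. Field names and the 384-dim size are assumptions.
import { create, insert, search } from "@orama/orama";

// Placeholder embeddings; in practice these come from the embedding model.
const fakeEmbedding = () => Array.from({ length: 384 }, () => Math.random());

const db = await create({
  schema: {
    title: "string",
    content: "string",
    embedding: "vector[384]", // dimension must match the embedding model
  },
});

// Index a bookmark together with its precomputed embedding.
await insert(db, {
  title: "Caddy reverse proxy guide",
  content: "How to set up a reverse proxy with Caddy...",
  embedding: fakeEmbedding(),
});

// Hybrid query: full-text matching combined with vector similarity.
const results = await search(db, {
  mode: "hybrid",
  term: "self-hosted https proxy",
  vector: { value: fakeEmbedding(), property: "embedding" },
  limit: 10,
});
console.log(results.hits.map((h) => h.document.title));
```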

JojiiOfficial commented 1 day ago

Thanks for the quick response!

I'd love to be able to fully self-host Hoarder. I personally don't want all my bookmarks sent to OpenAI's servers; I'd rather keep everything local. I think a lot of people, especially those self-hosting their apps, feel the same. For this, either a local LLM or a specialized model like Sentence Transformers seems to be the best choice.

What are your thoughts on this?

MohamedBassem commented 1 day ago

@JojiiOfficial Hoarder already supports Ollama for local inference. This feature is going to be no different (it will work with either Ollama or OpenAI).
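
To illustrate why local inference covers this case, here is a minimal sketch of generating an embedding against a local Ollama instance over its REST API. The base URL and model name are example values, not Hoarder configuration, and this is not Hoarder's actual inference code.

```ts
// Sketch of fully local embedding generation against an Ollama instance.
// Uses Ollama's /api/embeddings endpoint; base URL and model are examples.
const OLLAMA_URL = "http://localhost:11434";

async function embedLocally(text: string): Promise<number[]> {
  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const { embedding } = (await res.json()) as { embedding: number[] };
  return embedding; // no data ever leaves the local machine
}

console.log((await embedLocally("self-hosted bookmark manager")).length);
```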

JojiiOfficial commented 1 day ago

I didn't notice the configuration option for Ollama. Thanks for the clarification!