Open echo-saurav opened 1 month ago
This is definitely planned and I actually have a prototype for it already :) Just trying to find a reasonable vector db without adding extra dependencies.
Awesome! You could take a look at Weaviate. It supports both text and image vectors (I know it's not really lightweight, but I like it a lot because of how customisable it is).
You can actually do vector search in postgres pretty easily. Postgres would also have the side benefit of not requiring the database to be mounted into every container.
The problem is that hoarder currently doesn't depend on postgres, so introducing it as a dependency now would be very disruptive. If I were starting hoarder from scratch, I'd have gone with postgres for everything (database, FTS, vector search, etc). But it's too late now, unfortunately.
Ahh okay, I'm not familiar with Drizzle, but from perusing the docs it looked like a fairly simple drop-in replacement.
It's less about the code changes and more about asking every existing user to add a new dependency and migrate their data.
I actually REALLY like the idea of moving to postgres.
Yes, I understand that it is disruptive, but there is already a section in the release notes on what to keep an eye on. If we update the UI to show that you need to add new environment variables for a postgres db, and we offer an automatic data migration (it could be version 0.16.0 with not much else), we could keep the disruption low AND open up a whole lot of possibilities for us in the future.
- It is a toy db anyways.
I have to disagree that sqlite is a toy database. Cloudflare's D1 database, for example, is built on top of sqlite. Other companies like fly.io and turso also offer prod databases built on top of sqlite. Tailscale has also embraced sqlite in prod. We're very far from approaching the limits of sqlite. It also fits us well because we don't need a client/server architecture, given that our deployments are usually on a single machine.
We can get rid of Meilisearch:
Sqlite has full-text search built in, btw (https://www.sqlite.org/fts5.html), and the extension is already enabled in our docker containers. I haven't tried it, so I don't know how it compares to Meilisearch's. I haven't tried postgres' FTS either. So if getting rid of Meilisearch is a goal, there's a route to do it on sqlite as well.
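For anyone who wants to try it out, here is a minimal sketch of what FTS5 usage looks like, written against Python's bundled `sqlite3` module just for illustration. The table name `bookmarks_fts`, its columns, and the helper functions are all made up for this example and are not hoarder's schema. It also assumes the SQLite build has FTS5 compiled in, which the probe function checks.

```python
import sqlite3


def fts5_available(conn: sqlite3.Connection) -> bool:
    """Return True if this SQLite build can create FTS5 virtual tables."""
    try:
        conn.execute("CREATE VIRTUAL TABLE fts5_probe USING fts5(x)")
        conn.execute("DROP TABLE fts5_probe")
        return True
    except sqlite3.OperationalError:
        return False


def build_index(conn: sqlite3.Connection, bookmarks) -> None:
    """Create a hypothetical FTS5 table and index (id, title, content) rows."""
    conn.execute("CREATE VIRTUAL TABLE bookmarks_fts USING fts5(title, content)")
    conn.executemany(
        "INSERT INTO bookmarks_fts(rowid, title, content) VALUES (?, ?, ?)",
        bookmarks,
    )


def search(conn: sqlite3.Connection, query: str):
    """Return (rowid, title) pairs matching the query, best match first."""
    rows = conn.execute(
        "SELECT rowid, title FROM bookmarks_fts "
        "WHERE bookmarks_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    if fts5_available(conn):
        build_index(conn, [
            (1, "SQLite FTS5 docs", "Full text search extension for sqlite"),
            (2, "Postgres vector search", "pgvector adds similarity search"),
        ])
        print(search(conn, "sqlite"))
```

`ORDER BY rank` uses FTS5's built-in relevance ranking (bm25 by default), so there is no separate search service to keep in sync.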
Two limitations that I know about in sqlite's FTS (that I don't know if pg handles better):
It is currently juggling a lot of stuff around in memory
This can be solved if we move to sqlite's FTS.
For the database you no longer need to have the same location mounted on both apps
I've actually been thinking about going the route that immich went: just merge the workers and web containers into one. That'll simplify the deployment a bit without sacrificing anything. I initially went with a separate container for the worker because the worker was the one spawning the chrome process, and I didn't want that mixed in with the web container. But now chrome is in its own container, and we can probably just spin up the workers as a background job inside the web container.
Yes, I understand that it is disruptive
This "IS" my biggest concern. It is very disruptive and we will lose some users because of that move. I, for example, am still stuck on old immich releases because I don't have time to go through all the recent breaking changes that they introduced. I want hoarder to just work for people, and regardless of how many bells we add to the UI, we're going to break some deployments with this migration, and I don't really want that to happen.
I understand that sometimes this is a cost we'll have to pay, but so far, I'm not seeing the strong justification to pay it just yet.
There's another route we can take though. We can double down on sqlite:
Automatically getting tags with ollama is quite nice! But I think it would be even more awesome if it stored text vectors, so we could search by similar text, or filter to group similar links / images together.
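To make the request a bit more concrete, here is a rough sketch of the idea: store one embedding per bookmark and rank bookmarks by cosine similarity to a query embedding. Everything here is an assumption for illustration. The `bookmark_embeddings` table and the helper names are hypothetical, the three-dimensional vectors are toy values (real ones would come from an embedding model), and the search is a brute-force scan over a sqlite table rather than a proper vector index.

```python
import math
import sqlite3
import struct


def pack_vector(vec):
    """Serialize a list of floats into a BLOB of little-endian float32 values."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack_vector(blob):
    """Inverse of pack_vector: decode a float32 BLOB back into a list."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def create_store(conn):
    """Create the hypothetical table holding one embedding per bookmark."""
    conn.execute(
        "CREATE TABLE bookmark_embeddings ("
        "bookmark_id INTEGER PRIMARY KEY, "
        "title TEXT NOT NULL, "
        "embedding BLOB NOT NULL)"
    )


def add_bookmark(conn, bookmark_id, title, embedding):
    conn.execute(
        "INSERT INTO bookmark_embeddings VALUES (?, ?, ?)",
        (bookmark_id, title, pack_vector(embedding)),
    )


def most_similar(conn, query_embedding, limit=3):
    """Brute-force scan: score every stored vector and return the top matches."""
    scored = []
    rows = conn.execute(
        "SELECT bookmark_id, title, embedding FROM bookmark_embeddings"
    )
    for bookmark_id, title, blob in rows:
        score = cosine_similarity(query_embedding, unpack_vector(blob))
        scored.append((score, bookmark_id, title))
    scored.sort(reverse=True)
    return [(bookmark_id, title) for _, bookmark_id, title in scored[:limit]]
```

A full scan like this is linear in the number of bookmarks, which is probably fine for a personal collection, but a dedicated vector store or extension would be the thing to look at if that ever became a bottleneck.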