immich-app / immich

High performance self-hosted photo and video management solution.
https://immich.app
GNU Affero General Public License v3.0

Server: Storage layer refactor #1011

Open bo0tzz opened 1 year ago

bo0tzz commented 1 year ago

Feature detail

At the moment, Immich does not have an abstracted storage layer. On upload, files are stored in the semi-hardcoded library path with a randomly generated filename, their path is stored in the database, and in any future (read) operations this stored path is used (file serving, thumbnail generation, etc).
For several of the features we're meaning to (potentially) implement in the future (e.g. #34, #418, #451), it will be very helpful to refactor and abstract the storage layer. For some of them, like supporting multiple storage backends, it will be strictly necessary. In this issue I want to propose a design, although it will need more discussion and refinement before it is complete.

As mentioned above, currently the storage path for a file is generated once and stored in the database. I propose that we instead move to a model where storage paths are built on the fly based on the data we have for an asset. We already use some of that data to build the path on upload right now:

const originalUploadFolder = join(basePath, req.user.id, 'original', sanitizedDeviceId);

Instead, the storage layer would expose functions for reading and writing an asset that accept the AssetEntity (or a more limited set of data, if desired). The storage implementation then uses that internally, together with some configuration, to build the actual path. That way, things like the storage path become an implementation detail that does not need to be exposed to the rest of Immich.
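A minimal sketch of building the path on the fly, assuming a reduced set of asset fields. `AssetRef`, `PathConfig`, `sanitize` and `originalPath` are hypothetical names for illustration, not existing Immich code, and the layout simply mirrors the upload folder line above with a file name derived from the asset id.

```typescript
import { posix } from "node:path";

// Hypothetical subset of asset data; the real AssetEntity has more fields.
interface AssetRef {
  id: string;
  userId: string;
  deviceId: string;
  originalExtension: string; // assumed to be stored without the dot, e.g. "jpg"
}

// Hypothetical configuration handed to the storage layer.
interface PathConfig {
  basePath: string;
}

// Replace anything outside [A-Za-z0-9_-] so a segment cannot contain
// path separators or "..".
function sanitize(segment: string): string {
  return segment.replace(/[^a-zA-Z0-9_-]/g, "_");
}

// Derive the storage path from asset data instead of reading a stored path.
function originalPath(config: PathConfig, asset: AssetRef): string {
  const fileName = `${sanitize(asset.id)}.${sanitize(asset.originalExtension)}`;
  return posix.join(
    config.basePath,
    sanitize(asset.userId),
    "original",
    sanitize(asset.deviceId),
    fileName,
  );
}
```

Because the path is a pure function of the asset and the configuration, callers never need to see or persist it.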

I think it would be good to keep the storage providers as self-contained as we can, and avoid having them do things like access the database. Instead, each provider would take in a configuration when initializing (e.g. the root path where files are stored, S3 access credentials, or a template for the filename). That configuration can of course be read from the database by whatever code initializes the provider.
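As a rough illustration of a provider that only knows its own configuration, here is a sketch of a path resolver driven by a filename template. The `{name}` placeholder syntax, `LocalProviderConfig` and `LocalPathResolver` are assumptions made up for this example; whoever constructs the resolver would be responsible for loading the config (from the database or elsewhere).

```typescript
// Hypothetical configuration, passed in once at initialization.
interface LocalProviderConfig {
  rootPath: string;
  template: string; // e.g. "{userId}/{year}/{assetId}.{ext}"
}

type TemplateValues = Record<string, string>;

class LocalPathResolver {
  private readonly config: LocalProviderConfig;

  constructor(config: LocalProviderConfig) {
    this.config = config;
  }

  // Fill each {placeholder} from the supplied values; no database access here.
  resolve(values: TemplateValues): string {
    const relative = this.config.template.replace(
      /\{(\w+)\}/g,
      (_match: string, key: string) => {
        const value = values[key];
        if (value === undefined) {
          throw new Error(`Missing template value: ${key}`);
        }
        // Keep substituted values from introducing separators or "..".
        return value.replace(/[^a-zA-Z0-9_-]/g, "_");
      },
    );
    // Drop trailing slashes from the root before joining.
    return `${this.config.rootPath.replace(/\/+$/, "")}/${relative}`;
  }
}
```

Changing the template then only changes the provider's configuration, not any calling code.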

This will allow for a multitude of nice things:

tbd:

  1. What interface does a storage provider need? Probably at least create, delete and stat. How about something like S3, which might be able to provide URLs for direct access (bypassing immich)?
  2. The description here (partly) covers multiple features. What is the exact scope of the initial refactor (and how do we anticipate those future features in it)?
  3. ???
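To make the first question more concrete, one possible shape for the interface is sketched below. This is only a starting point for discussion: `StorageProvider`, `BlobStat`, the optional `getDirectUrl` method and the in-memory implementation are all hypothetical names, not a settled design.

```typescript
// Hypothetical metadata returned by stat().
interface BlobStat {
  size: number; // size in bytes
}

interface StorageProvider {
  create(key: string, data: Uint8Array): Promise<void>;
  delete(key: string): Promise<void>;
  // Resolves to null when nothing is stored under the key.
  stat(key: string): Promise<BlobStat | null>;
  // Optional: backends such as S3 could hand out a URL for direct access.
  getDirectUrl?(key: string): Promise<string | null>;
}

// Trivial in-memory provider, useful mainly for tests.
class InMemoryStorageProvider implements StorageProvider {
  private readonly blobs = new Map<string, Uint8Array>();

  async create(key: string, data: Uint8Array): Promise<void> {
    // Store a copy so later changes to the caller's buffer do not leak in.
    this.blobs.set(key, data.slice());
  }

  async delete(key: string): Promise<void> {
    this.blobs.delete(key);
  }

  async stat(key: string): Promise<BlobStat | null> {
    const blob = this.blobs.get(key);
    return blob ? { size: blob.byteLength } : null;
  }
}
```

Making `getDirectUrl` optional lets a local filesystem provider simply omit it, while callers fall back to serving the file through Immich.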

Platform

Server

Cellivar commented 1 year ago

If I might be so bold as to add some unsolicited advice..

You're very close to the concept of a generic blob storage interface, where filesystem storage is just a slightly weird-looking blob storage API. Blob storage for large files, like images, is a very common design pattern for modern networked systems. Though my link hasn't been updated for a few years, it's a good example of the direction you may want to head in with your implementation. I suspect you can find more modern options for TypeScript out there; searching from my phone is difficult.

The abstraction layer you described becomes a set of blob operations, which then translate into actual blob API calls (filesystem write, S3 write, NFS share write, etc).
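A small sketch of that idea, treating the local filesystem as one blob backend among others. The `BlobStore` interface and `FilesystemBlobStore` class are hypothetical and only use Node's built-in `fs/promises` and `path` modules; an S3 or NFS backend would implement the same three operations.

```typescript
import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { dirname, join, resolve, sep } from "node:path";

// Hypothetical blob operations the rest of the application would use.
interface BlobStore {
  put(key: string, data: Uint8Array): Promise<void>;
  get(key: string): Promise<Uint8Array>;
  remove(key: string): Promise<void>;
}

class FilesystemBlobStore implements BlobStore {
  private readonly root: string;

  constructor(root: string) {
    this.root = resolve(root);
  }

  // Map a blob key to a path under the root, rejecting keys that escape it.
  private pathFor(key: string): string {
    const full = resolve(this.root, key);
    if (!full.startsWith(this.root + sep)) {
      throw new Error(`Key escapes storage root: ${key}`);
    }
    return full;
  }

  async put(key: string, data: Uint8Array): Promise<void> {
    const target = this.pathFor(key);
    await mkdir(dirname(target), { recursive: true });
    await writeFile(target, data);
  }

  async get(key: string): Promise<Uint8Array> {
    return readFile(this.pathFor(key));
  }

  async remove(key: string): Promise<void> {
    // force: true makes removing a missing blob a no-op.
    await rm(this.pathFor(key), { force: true });
  }
}

// Round-trip a small blob through a temporary directory.
async function demo(): Promise<string> {
  const root = await mkdtemp(join(tmpdir(), "blob-demo-"));
  const store: BlobStore = new FilesystemBlobStore(root);
  const key = "user-1/original/photo.txt";
  await store.put(key, new TextEncoder().encode("hello"));
  const text = new TextDecoder().decode(await store.get(key));
  await store.remove(key);
  await rm(root, { recursive: true, force: true });
  return text;
}
```

Callers only ever see keys and bytes, so swapping the backend does not touch them.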

bo0tzz commented 1 year ago

That's very helpful, thank you!

pinpox commented 1 year ago

Does Immich support S3 storage currently? I saw a related PR was merged some time ago, but can't figure out how to set it up.

jrasm91 commented 1 year ago

This is probably what you are thinking of: https://github.com/immich-app/immich/discussions/1683#discussioncomment-6206105

ibotty commented 2 months ago

I just wanted to point to Apache OpenDAL, which is used in the big data ecosystem quite a bit. It is a unified storage layer supporting many different storage systems, among them S3 and local POSIX file systems.

It also has Node.js bindings.

https://opendal.apache.org/docs/nodejs/