szabodanika / microbin

A secure, configurable file-sharing and URL shortening web app written in Rust.
https://microbin.eu
BSD 3-Clause "New" or "Revised" License
2.65k stars 163 forks source link

Persistency improvements #4

Closed szabodanika closed 1 year ago

szabodanika commented 2 years ago

Currently all pastas are stored in the memory. Can we save them on the disk in either a tiny database or simply in files? Storing everything in the memory could be bad if the service is running on a tiny VM for example with limited RAM, or if the pastas are important and the user wants to retain them even if the service is restarted.

szabodanika commented 2 years ago

This feature is now added

MasterGroosha commented 2 years ago

Hello! Thank you for your project!

I have two questions regarding "persistency" feature:

  1. Could you please add an option to set a path to "pasta_data" folder? This would especially be useful for Docker deployments
  2. Are there any plans for other backends? Currently all data is stored in JSON file which might not be cool for heavy usage. For example, data could be stored in SQLite database. Or even re-add memory backend for "fire and forget" deployments
szabodanika commented 2 years ago

Welcome @MasterGroosha, thanks for your suggestions! Upgrading the simple file + json storage is something that many have asked about already and I am now thinking about the best way to go about this. I think the best would be an option like --sqlite [path to sqlite db] that would enable an sqlite backend, and otherwise it would just keep using the default json format. Setting the path to pasta_data is definitely coming asap, until then you can customise where you are running the executable from.

MasterGroosha commented 2 years ago

I'm not very confident with my system engineering knowledge, but I think it's not a good idea to store files a BLOBs in SQLite. In my thoughts, there was something like:

├── microbin
├── data (folder)
│   ├── database.sqlite
│   ├── files (folder)
│   │   ├── file1.txt
│   │   ├── file2.tar.gz
...
uniqueNullptr2 commented 2 years ago

As I see it currently microbin is still being limited by ram. while the pastas are persisted to file, they are still all loaded into memory at the start and kept there.

I think either writing every pasta to an individual file or writing everything to sqlite would be a great improvement as that would allow loading individual pastas from storage, making it more viable on small vms. I do not know how much of an overhead it is to save BLOBs in sqlite, but I think if we split it like suggested by MasterGroosha we might have to implement some locking as it could be possible for the file and the rest of the pasta to be updated by different requests(data race). although I am not entirely sure what the concurrency implications are on writing to file without a lock either. Given that I think SQLite would be the superior option as the client handles concurrency and atomic updates for us, and sqlite is widely used. Additionally I think sqlite might be useful for implementing the feature requested in #10

I would love to take a crack at this. but I don't have much experience with database migrations in rust. As I see it our options are:

szabodanika commented 2 years ago

I support the idea of reducing RAM usage - since MicroBin is intended as a personal / tiny org tool I did not consider the footprint too much originally but I would love to extend it a flag like --sqlite [db file name] to keep everything within an SQLite DB and keeping the memory usage low(er). Since our data model is not very complex and we don't have any particularly complex queries, I don't think we need to use an ORM, a simple DB library with a few basic SQL queries should work absolutely fine IMHO. As for storing files as BLOBs within SQLite, I'd vote against it, mostly just because of the added complexity, but I'd be happy to explore that idea after the base data (that is currently stored in JSON) is successfully stored in SQLite.

uniqueNullptr2 commented 2 years ago

Ok I started some work on this but am running into some diffuculty.

I thought it would be a good idea to replace the AppState which is currently a Mutex<Vec> with a dyn DataStore, where the datastore can be implemented for the current InMemoryJson store and the new sqlite store. This would make any future changes to datastore a lot easier but it would be a big change on this feature.

The exact problem I have with this, is that the sqlite would return owned data, where right now it is using an immutable reference which is desirable. Otherwise we would have to clone the whole pastalist to reaturn it in the route which would put further strain on memory usage. I think we could get around this by returning a CoW from the method.

The obvious naive implementation would be to do if/else depending on the set sqlite flag everywhere we interact with data, but that would be kinda ugly.

Thoughts on this?

edit: its a catch 22 Sqlite api will always return owned data, but that disparity can be solved with a Cow however the api for the jsonstore has some problems.If we do the mutability internally we are also forced to return owned data as a reference cannot be passed outside because of the mutexGuard. However if we use a mutex on the outside we cannot use trait objects because Mutex requires T to be sized. catch 22. so I am not sure how to proceed

hay-kot commented 1 year ago

Hey! Is this still being worked on? I wouldn't mind taking a crack at getting Sqlite setup as a storage backend.

szabodanika commented 1 year ago

Hey! Is this still being worked on? I wouldn't mind taking a crack at getting Sqlite setup as a storage backend.

I think a lot of users would still like to see this implemented. Also I am planning to create an "official" public MicroBin server that will definitely need a high-performance backend. There is a nice PR for this but I would like to see something finished before merging anything. Whether you take the route of a data-agnostic persistency API or something simpler is up to you. Once I am caught up with the smaller issues and PRs completely I will have a more serious look at this myself as well

uniqueNullptr2 commented 1 year ago

I currently don't have the time to work on this so feel free to give it a go @hay-kot