Open thewilli opened 7 years ago
Hi @thewilli,
SQLite as data storage backend
As-is, switching to a relational database engine requires major changes to Shaarli's core implementation:
This would most likely imply a major rewrite of the codebase, e.g. by introducing an object-oriented data model (#445), along with an object-request mapper (ORM) to support several relational database management systems.
There's also #324 that's been opened for a while to track legacy code refactoring efforts and needs some attention first :)
architectural overhead of an external database server a simplified internal API
Given how tedious the current system is to manage, I rather think a DB-oriented refactoring would greatly reduce development costs, an would provide more elegant upgrade mechanisms for the data model (e.g. with incremental, versioned database migrations).
common storage for all kind of data (configuration, links, ..)
From my experience:
faster access times (for sure, especially for many concurrent connections and a large data-base)
This is out-of-scope for the large majority of Shaarli users, as the current system seems to work well for instances holding several thousands of bookmarks ;-)
However, this change would be highly profitable for servers hosting multiple and/or on-demand Shaarli instances, as all users could be handled in a multi-tenant database.
the possibility to store large data (e.g. file uploads)
Unless there are very good reasons to do so, it's best no to burden a database with binary blobs.
Also, there are plenty of good file hosting/management services already available (ownCloud/nextCloud, CozyCloud, Nexus, etc.) that can be used to store files that could then be linked to a Shaarli entry.
Most refactoring requirements for an SQL storage have been merged, mostly switching to fully object oriented Bookmark, and a service layer as a single entry point.
how to migrate data when upgrading
I'm not in favor of dropping our current storage system entirely. It allows Shaarli to run on almost any PHP environment, without requesting DB specific extensions. I think that both storage system (files and RDBMS) can coexist.
We will still have to find a way to provide a tool to migrate data though.
along with an object-request mapper (ORM) to support several relational database management systems
I'm not really sure about that. An ORM is usually large and complex software which embeds a lot of features and has a high level of abstraction. Our use case for Shaarli is pretty simple and the codebase manipulates only a single object (Bookmark
).
While using ORM would definitely work, I'm wondering if using a query builder - a tool which builds PDO compatible SQL queries - supporting multiple RBDMS along with a simple custom DTO class (data to object) wouldn't be a more suitable solution.
I'm not in favor of dropping our current storage system entirely. It allows Shaarli to run on almost any PHP environment, without requesting DB specific extensions. I think that both storage system (files and RDBMS) can coexist.
They can, and pretty well at that. The (now defunct) CMS called PivotX did so for quite a few years.
We will still have to find a way to provide a tool to migrate data though.
Elsewhere in this ticket queue is PoC code for reading the existing datastore. It may not be trivial but it should be doable, especially if it's made a command line tool in addition to a functional module for the user control panel.
I'm not really sure about that. An ORM is usually large and complex software which embeds a lot of features and has a high level of abstraction. Our use case for Shaarli is pretty simple and the codebase manipulates only a single object (
Bookmark
).
True.
Although I'm absolutely pro-RDBMS/ORM as my Shaarli is getting bigger (~500Kb datastore), I ran into a couple of issues with SQLite lately:
I'm consolidating my Docker workloads in Kubernetes, and the main storage under the nodes is an NFS share that makes SQLite applications a pain to manage. I have dedicated PG and MySQL clusters outside of k8s for RDBMS apps that support it, and these are also on NVMe storage that makes them really fast.
When you introduce RDBMS support in Shaarli, please make SQLite an option but not a requirement. For example, Bitwarden_rs also defaults to SQLite but also supports PG and MySQL as well. However, I'd be happy even with one "real" (not file db) implementation in Shaarli.
DB files get corrupted over NFS
Because of improper NFS file locking? Looks like a known problem with workarounds [1], [2], [3], [4]. But it needs consideration, data integrity is a must-have.
~500Kb datastore
Sounds reasonable. Mine is 782Kb, 4118 shares. No performance problems so far.
application performance becomes unusably slow.
As will any system where connections between the app and DB is unstable/unreliable?
I'm consolidating on Kubernetes the main storage under the nodes is an NFS share that makes SQLite applications a pain
You brought this on yourself :trollface: We could suggest using SQLite only when the storage is a locally attached disk, and TCP/IP RDBMS access when over the network? In that case we might as well suggest to keep using datastore.php
on local installs, so why bother with SQlite? (?)
SQLite an option but not a requirement
SQLite/Mariadb/Postgres would be great, IHMO DB-agnostic interface is a major appeal for RDBMS support (along with cleaner migration mechanisms, less bespoke code). But this would certainly mean using an ORM.
And code related to reading the datastore must be kept for a while to allow datastore->DB migrations (much less maintenance though). Personally I like array-based storage in a flat text file, but I see the appeal for RDMBS-backed storage.
Just weighting the pros and cons.
Yes, I have an NFS v3 share, and as I learned it in practice, it doesn't work well with SQLite, so yes again, I brought this on myself, lol 😃 Since I've burnt myself with that once, I have a separate Docker VM just for things with SQLite and other file-based solutions like Shaarli. But as things are under consideration here, I've just came to shred light on this behavior of SQLite vs NFS, as it would be great to have everything in one cluster. I suppose most homelabs won't have advanced network shares that can guarantee atomicity, people just tend to go for what's easy to install and maintain. Shaarli should definitely warn about this with emphasis if SQLite will only be supported as RDBMS. However, if such an ORM is chosen that can work with multiple DB engines, at least one other should be supported by default (MariaDB or PG being the most popular). I'd also love to have a migration path, maybe through the Python CLI. A 500Kb file store is smooth at daily operations, only the first/main page load doesn't seem to be as snappy as it used to be. But I haven't measured it, just a feeling that is was faster before.
Even if this issue is related to SQLite, we can also support MySQL/MariaDB and PostgreSQL. Once we have an SQL version working, it won't change much to support those besides configuration.
We should keep performances issue/optimization in a dedicated issue.
In that case we might as well suggest to keep using datastore.php on local installs, so why bother with SQlite? (?)
Answering myself, because atomicity/integrity guarantees brought by a RDBMS are presumably more robust than a custom system (https://github.com/shaarli/Shaarli/issues/1292#issuecomment-493540897). Though https://github.com/shaarli/Shaarli/pull/1570 should fix problems with concurrent operations.
Currently, as far as I understand, there are two different storages (storage files), one for configuration and another one for the actual links.
What do you think about using SQLite as data storage backend behind a DAL implementation like the current ones (
LinkDb
,ConfigManager
)? I understand that you don't want the architectural overhead of an external database server (especially to support PHP hosting services), but as SQLite relies only on a file residing in the local file system and PHP has native support for it, this shouldn't be a criterium for exclusion, should it?IMHO this would have several benefits, including a common storage for all kind of data (configuration, links, ..) which could lead to a simplified internal API, faster access times (for sure, especially for many concurrent connections and a large data-base), transactions and the possibility to store large data (e.g. file uploads).