lightningnetwork / lnd

Lightning Network Daemon ⚡️

add SQLite3 backend option #6176

Closed joostjager closed 1 year ago

joostjager commented 2 years ago

LND 0.14 ships with support for Postgres as a database backend. Unfortunately the full potential of Postgres can't be reached because internally LND still uses key/value storage everywhere.

This means that currently with a Postgres backend, the data that is stored is mostly unreadable. It uses custom serialization/deserialization in several flavours. Any tool that wants to operate on this data needs to implement the exact same serialization methods.

More importantly, maintaining custom serialization code is time-consuming and error-prone. This becomes apparent in an extreme way for database migrations. For each migration, a frozen copy of the serialization code needs to be added to the code base, to ensure that the migration won't ever break.

Furthermore, a minimalistic kv store like bbolt doesn't offer functionality like indexes (other than on the key) and constraints, functionality that could be put to good use to guard database integrity in a simple way and make lnd more reliable and faster overall.

These are all advantages that could be unlocked by a step-by-step migration from kv to structured sql. The problem with that migration is that there is still bbolt. If bbolt support needs to stay, a large amount of code needs to be added to have two implementations for every store in the system. This means more potential for bugs and increased maintenance costs.

One way to address this problem is to replace bbolt with SQLite. SQLite is probably one of the top five most deployed software modules in the world, so reliability shouldn't be an issue.

The initial replacement will stick to the kv model for data. But once fully on the sql platform, it becomes possible to migrate select sub-systems that need it the most to structured sql tables. This migration can happen at the same time for both SQLite and Postgres, so that there won't be duplicate implementations.
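
For illustration, a minimal sketch of the difference (the schemas below are hypothetical, not lnd's actual layout): under the kv model the database sees only opaque blobs, while a structured table exposes typed, indexable columns.

```go
// Hypothetical schemas contrasting the two models; not lnd's actual layout.
package schema

const (
	// Key/value model: opaque blobs. The database can't index, constrain,
	// or query the contents; all structure lives in application code.
	kvSchema = `
CREATE TABLE kv (
    key   BLOB PRIMARY KEY,
    value BLOB
);`

	// Structured model: typed columns the database itself can index and
	// guard with constraints.
	invoiceSchema = `
CREATE TABLE invoices (
    payment_hash BLOB PRIMARY KEY,
    amount_msat  BIGINT NOT NULL CHECK (amount_msat >= 0),
    state        SMALLINT NOT NULL,
    created_at   TIMESTAMP NOT NULL
);
CREATE INDEX invoices_state_idx ON invoices(state);`
)
```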

Ideas for a light abstraction to iron out the differences in SQL dialect between Postgres and SQLite can be taken from c-lightning.

A first test with LND on SQLite has been carried out successfully.

Roasbeef commented 2 years ago

Ideas for a light abstraction to iron out the differences in SQL dialect between Postgres and SQLite can be taken from c-lightning

Any sort of ORM layer will mask lower level SQL dialect/capability divergences. What they've done there is effectively rolling their own solution. Tools like gorm have that covered. Also, as seen from your sample commit, we aren't yet using anything super specific w.r.t SQL capabilities (advanced postgres features or w/e).

These are all advantages that could be unlocked by a step-by-step migration from kv to structured sql.

As always, the devil is in the details here. Switching overnight to "pure SQL" isn't realistic. First, there're matters of migrating every existing lnd node running in production to a newer, theoretically more reliable, but less battle-tested instance. Next, there're portability concerns that come with including a C-based sqlite implementation (pure Go implementations thankfully exist, at the tradeoff of some performance gains). Finally, doing such a switch-over will be an immense task given how stateful a Lightning implementation is. We'll need to weigh resource allocation (everyone wants to make PRs, no one wants to review or test them, as usual with open source) for larger projects like this, alongside in-flight ecosystem upgrades like on-chain taproot support, off-chain channel extensions, PSBT extensions, etc, etc (and this is even ignoring other upcoming protocol enhancements).

Nevertheless, I agree bbolt has issues and we've effectively outgrown it at this point. I think an initial target would simply be adding sqlite (using a pure-go driver) as a backend DB driver alongside the existing postgres support to get our feet wet, then go from there. I think sqlite support (in kv-land for now) is reasonable to target for the next major release. Going all in on SQL, however, will result in tens of thousands of lines of raw code diffs, will take longer than we expect, and would be one of the largest architectural changes in the (currently short) history of lnd.

There's also an alternative piece-wise path: with some of the refactoring we've done in lnd, it's possible to pass in a custom database implementation. If we extend this to also expose some of the existing interfaces like the routingGraph or watch tower database then users of lnd would be able to piece-wise insert new pure-SQL data stores w/o having to wait for the completion of some multi-quarter migration process. This piece-wise approach also lets us target the more heavily loaded segments of the database (payments, invoices, payment circuits, etc) that have the most to gain (performance, query-ability, etc). Moving things like the last message we need to transmit to a pair (simple KV stuff) won't see much benefit from SQL-ization.
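
A rough sketch of that injection point (the interface and config field below are hypothetical names, not lnd's actual API):

```go
// Hypothetical sketch of piece-wise store injection; InvoiceStore and
// Config are illustrative names, not lnd's actual API.
package store

import "context"

// InvoiceStore is the narrow interface a sub-system depends on. A
// pure-SQL implementation can be swapped in behind it without touching
// the rest of the codebase.
type InvoiceStore interface {
	AddInvoice(ctx context.Context, serialized []byte) error
	LookupInvoice(ctx context.Context, payHash [32]byte) ([]byte, error)
}

// Config lets a caller override a single store while the defaults stay
// bbolt-backed.
type Config struct {
	// InvoiceStore, if non-nil, replaces the default kv-backed store.
	InvoiceStore InvoiceStore
}
```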

Roasbeef commented 2 years ago

^ TL;DR: rather than attempt to rip everything out in one swoop, we should incrementally target the areas that have the potential for the most gain (heavily loaded areas like: payments, invoices, and forwarding related material)

joostjager commented 2 years ago

Any sort of ORM layer will mask lower level SQL dialect/capability divergences. What they've done there is effectively rolling their own solution. Tools like gorm have that covered. Also, as seen from your sample commit, we aren't yet using anything super specific w.r.t SQL capabilities (advanced postgres features or w/e).

I haven't looked into it much yet. Some ORMs can also be a bit too smart at times. But indeed, there are many ways to cover this. And perhaps it won't ever be necessary to use super specific capabilities.

As always, the devil is in the details here. Switching overnight to "pure SQL" isn't realistic.

That's why I proposed a step-by-step migration from kv to pure sql. I never meant to suggest that all the stores should be converted to structured sql in one go.

I totally agree that a full switch-over from bbolt to key-value-sqlite can't happen in a single release. Running hours are needed to gain confidence and just adding SQLite as a backend first makes sense.

There's also an alternative piece-wise path: with some of the refactoring we've done in lnd, it's possible to pass in a custom database implementation. If we extend this to also expose some of the existing interfaces like the routingGraph or watch tower database then users of lnd would be able to piece-wise insert new pure-SQL data stores w/o having to wait for the completion of some multi-quarter migration process.

I think the downside of this is that there will be two implementations of a store to maintain for an extended period of time. Dev and maintenance could be done by another team, but overall it may be less efficient. Also that custom implementation will need to have clear added value beyond maintainability of the code for it to be used. And there is the question of how problematic it is to have lnd state spread across a bbolt file and a sql database.

Roasbeef commented 2 years ago

I think the downside of this is that there will be two implementations of a store to maintain for an extended period of time.

That's a good point. On the code level we can use things like interface assertions to guard new functionality. Like, let's say someone has been working on an implementation using Redis instead of bolt to store invoices. We add a new base feature to the lnd invoice manager that exposes an efficient way to expire old invoices. An interface assertion at runtime lets lnd know if that feature is available or not.
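
A minimal sketch of that pattern (the InvoiceStore/InvoiceExpirer names are illustrative):

```go
// Sketch of guarding an optional capability with a runtime interface
// assertion; names are illustrative.
package store

import "time"

// InvoiceStore is the base interface every backend must implement.
type InvoiceStore interface {
	// ... base methods elided.
}

// InvoiceExpirer is an optional capability a backend may also implement.
type InvoiceExpirer interface {
	ExpireInvoicesBefore(cutoff time.Time) error
}

// maybeExpire uses a runtime type assertion to detect whether the
// backend supports efficient expiry; if not, it simply skips the step.
func maybeExpire(store InvoiceStore, cutoff time.Time) error {
	if expirer, ok := store.(InvoiceExpirer); ok {
		return expirer.ExpireInvoicesBefore(cutoff)
	}
	return nil
}
```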

The upside is that storage evolution is decoupled from mainline lnd. Viewed through another lens, forced external maintenance may actually be a feature, as it makes the maintenance duties more explicit. Often open source projects can have a contributor add a large feature, only to disappear afterwards once maintenance and bug fixes become a concern. By having certain storage implementations be externally housed, the lnd project doesn't need to commit to maintaining a slick Cassandra implementation for payment storage.

And there is the question of how problematic it is to have lnd state spread across a bbolt file and a sql database.

Today, lnd already maintains several individual bolt DB instances (when not running in remote mode): the wallet, the watch tower, the onion secret state expiry (for replay protection), and the main channel database. Assuming we're consistent with our application/process level hand-offs and restarts (checkpointing if needed, re-sending messages, etc.) I don't foresee this being too much of an issue. Even with the current remote db operating mode, iirc we still end up having the state expiration log on disk.

With all that said, I think abstracting out the invoice storage is a good initial target, as the current iteration is very tightly coupled to channeldb, and is one of the largest DB segments for active nodes.

Roasbeef commented 2 years ago

Renamed the issue to just target a sqlite3 backend option. With enough time+testing, this can eventually be made the default. Before then the migration tool can be updated to support an automatic transfer.

Roasbeef commented 2 years ago

The relevant branch where some work has been started for those looking to pursue this issue: https://github.com/bottlepay/lnd/tree/sqlite

seejee commented 2 years ago

Hello! I'm interested in picking up this issue, and just want to make sure I understand the expected scope for the first iteration:

Given that postgres support has landed, the next step we'd like to take is to introduce sqlite support using the postgres kvdb backend changes as inspiration. In other words, when the next phase is complete, we'll have:

1) the existing boltdb backend
2) the existing postgres backend (using a single kv table)
3) a new sqlite kvdb backend (also using a single kv table for now)
3a) all the relevant docs, tests, etc. to make sqlite a 1st-class, supported backend

All migration tooling and/or storing data in normalized SQL tables (as opposed to a single key-value table) would come in the future.

Is there anything that I'm missing or misunderstanding? If not, I'll start working on this. Thanks in advance!

Roasbeef commented 2 years ago

In other words, when the next phase is complete, we'll have:

Correct! An initial target to start storing data in normalized SQL tables would be areas that are heavily loaded and could use better queryability, like invoices: https://github.com/lightningnetwork/lnd/issues/6288

I've also started to use the cgo-free version of sqlite along with sqlc in a new project. So far so good! I've also been using some new generic interface/middleware abstractions (the project uses Go 1.18 by default), which'll allow us to potentially use sqlc for both postgres+sqlite w/o needing to implement everything twice over (enables things like running several operations in a transaction w/o bbolt-like tx closure syntax).
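
As a rough sketch of what such a generic transaction helper could look like (assuming Go 1.18 generics and a sqlc-style generated Queries type; all names here are illustrative):

```go
// Illustrative generic transaction helper; Queries stands in for a
// sqlc-generated query struct, and the rest of the names are made up.
package sqldb

import (
	"context"
	"database/sql"
)

// Queries is a placeholder for sqlc's generated query methods, bound
// here to a single transaction.
type Queries struct{ tx *sql.Tx }

// ExecTx runs fn inside one transaction: commit on success, rollback on
// error. The same helper works for both postgres and sqlite since both
// sit behind database/sql.
func ExecTx[T any](ctx context.Context, db *sql.DB,
	fn func(q *Queries) (T, error)) (T, error) {

	var zero T
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return zero, err
	}
	res, err := fn(&Queries{tx: tx})
	if err != nil {
		_ = tx.Rollback()
		return zero, err
	}
	return res, tx.Commit()
}
```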

Roasbeef commented 2 years ago

Some relevant documentation re which options we may want to enable by default: https://www.sqlite.org/pragma.html.

Namely: foreign key checks should always be active, and we may want to enable WAL mode, as that'll enable us to support multiple readers in a non-blocking manner (readers don't block writers and writers don't block readers). WAL mode would also enable backup/replication tools such as litestream and rqlite (tho the latter is more heavyweight), which aren't as safe as full-on synchronous replication, but may give hobbyist node operators that aren't ready to set up an RDBMS server a sweet spot.
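
For illustration, applying those pragmas when opening the database might look like this (assuming the pure-Go modernc.org/sqlite driver, which registers itself as "sqlite"):

```go
// Sketch of enabling the pragmas discussed above; assumes the pure-Go
// modernc.org/sqlite driver.
package sqldb

import (
	"database/sql"

	_ "modernc.org/sqlite" // registers the "sqlite" driver
)

func openDB(path string) (*sql.DB, error) {
	db, err := sql.Open("sqlite", path)
	if err != nil {
		return nil, err
	}
	// WAL mode is persistent: once set, it's recorded in the db file,
	// and readers no longer block writers (or vice versa).
	if _, err := db.Exec(`PRAGMA journal_mode = WAL;`); err != nil {
		return nil, err
	}
	// foreign_keys is per-connection; with a pooled *sql.DB it's more
	// robust to set this via the driver's DSN, shown inline for brevity.
	if _, err := db.Exec(`PRAGMA foreign_keys = ON;`); err != nil {
		return nil, err
	}
	return db, nil
}
```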

seejee commented 2 years ago

I've also started to use the cgo-free version of sqlite along with sqlc in a new project

sqlc looks really interesting! I was planning to implement the sqlite support in a very naive way first, with minimal abstraction, to get everything working (and tests passing), and then take a separate pass to add any abstractions between the postgres and sqlite implementations once we see the two side by side.

Does that sound like the right approach to you, or do you think I should give a sqlc approach a shot from the beginning?

Roasbeef commented 2 years ago

Does that sound like the right approach to you, or do you think I should give a sqlc approach a shot from the beginning?

I like that approach: it means we can get sqlite in with minimal-ish changes, then spend more time designing the abstractions needed to have both postgres and sqlite use sqlc in the same project. One thing that I've added in this new project is the ability to template in certain types/statements that aren't overlapping. As an example, sqlite uses the BLOB type, but postgres uses BYTEA for table creation. Both will ultimately have sqlc emit a []byte in the model struct, and can be queried freely with the same SQL afterwards, but the tables need to be created slightly differently.
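
A minimal sketch of that templating idea (the schema and dialect struct are illustrative):

```go
// Illustrative dialect templating: the same DDL with BLOB swapped in for
// sqlite and BYTEA for postgres; the schema is made up.
package main

import (
	"os"
	"text/template"
)

const schemaTmpl = `
CREATE TABLE invoices (
    payment_hash {{.Blob}} PRIMARY KEY,
    preimage     {{.Blob}} NOT NULL
);`

type dialect struct{ Blob string }

func main() {
	t := template.Must(template.New("schema").Parse(schemaTmpl))
	// Emit both flavours; sqlc generates a []byte field either way.
	for _, d := range []dialect{{Blob: "BLOB"}, {Blob: "BYTEA"}} {
		_ = t.Execute(os.Stdout, d)
	}
}
```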

sqlc is working on "native" support for sqlite (which would let us use any sqlite specific things), but assuming we keep to mostly standard SQL, we can use the main postgres code generation engine, which uses database/sql instead of some custom postgres driver. In the new project I mentioned, I'm using the postgres parser and code gen engine but with an actual sqlite DB.

Roasbeef commented 2 years ago

Another relevant tool that might come in handy once we start to add proper SQL tables into the mix: https://github.com/golang-migrate/migrate. It can be used as a CLI command to run migrations, or as a Go package. Under the hood, it'll handle locking tables if needed during a migration, and creates a new internal table that tracks which database version is active.
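
For reference, using it as a Go package with go:embed might look roughly like this (the migrations directory and the sqlite3 database URL are assumptions):

```go
// Sketch of running golang-migrate from embedded SQL files; the
// migrations/ directory and database URL are assumptions.
package migrations

import (
	"embed"

	"github.com/golang-migrate/migrate/v4"
	_ "github.com/golang-migrate/migrate/v4/database/sqlite3"
	"github.com/golang-migrate/migrate/v4/source/iofs"
)

//go:embed migrations/*.sql
var migrationFS embed.FS

func Up(dbPath string) error {
	src, err := iofs.New(migrationFS, "migrations")
	if err != nil {
		return err
	}
	m, err := migrate.NewWithSourceInstance("iofs", src, "sqlite3://"+dbPath)
	if err != nil {
		return err
	}
	// Up applies all pending migrations; ErrNoChange just means we're
	// already at the latest version.
	if err := m.Up(); err != nil && err != migrate.ErrNoChange {
		return err
	}
	return nil
}
```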

joostjager commented 1 year ago

In the meantime, I gained some experience with golang-migrate/migrate, and there is one annoying thing about it: if a migration doesn't complete, it sets a dirty flag that needs to be cleared manually. This even happens when the whole migration is executed in a db transaction and nothing is actually dirty. Many people are complaining about it.

Furthermore, this library requires migration sql to live in separate files, which isn't always the best option when you want a self-contained executable. The files can be embedded with go:embed, but why not just define a few clean sql strings?

For these reasons, I switched to github.com/rubenv/sql-migrate, which does everything I need.
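
For comparison, a minimal sketch of the in-code style with rubenv/sql-migrate (the schema is illustrative): migrations are plain SQL strings, no separate files or embedding required.

```go
// Sketch of in-code migrations with rubenv/sql-migrate; the schema is
// illustrative.
package migrations

import (
	"database/sql"

	migrate "github.com/rubenv/sql-migrate"
)

var source = &migrate.MemoryMigrationSource{
	Migrations: []*migrate.Migration{
		{
			Id:   "1_create_invoices",
			Up:   []string{`CREATE TABLE invoices (payment_hash BLOB PRIMARY KEY);`},
			Down: []string{`DROP TABLE invoices;`},
		},
	},
}

// Up applies pending migrations; the library records applied versions
// in its own bookkeeping table.
func Up(db *sql.DB) (int, error) {
	return migrate.Exec(db, "sqlite3", source, migrate.Up)
}
```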

Roasbeef commented 1 year ago

Furthermore, this library requires migration sql to live in separate files, which isn't always the best option when you want a self-contained executable. The files can be embedded with go:embed, but why not just define a few clean sql strings?

What do you mean by this? The files can be embedded, and then it's as if everything else can be read from a virtual file system in the Go runtime. Clean strings here means you'd rather have the migration live in normal Go strings in some file? I don't see how that's any different than the embedded VFS approach, particularly given you can take those regular strings and then wrap a fake file system around them to adhere to the interface.

Either way, we don't have anything to migrate quite yet (so don't need to commit to a migration library), since we're still in pure KV land for now.

Roasbeef commented 1 year ago

Re-opening as we still have #7252