liquibase / liquibase

Main Liquibase Source
https://www.liquibase.org
Apache License 2.0
4.64k stars 1.84k forks source link

Add first-class support for NoSQL databases #4236

Open rursprung opened 1 year ago

rursprung commented 1 year ago

Search first

Description

currently there are already plugins which add support for NoSQL databases, e.g. liquibase-mongodb and liquibase-cosmodb (both started by @alexandru-slobodcicov from what i can see). they both have a somewhat-generic liquibase.nosql wrapper which is however not upstreamed to liquibase-core and also relies on AbstractJdbcDatabase and some other hacks (e.g. a java.sql.Driver implementation) to integrate with liquibase (which sort-of expects it to be a JDBC / java.sql connection).

Steps To Reproduce

try to implement a new liquibase extension to support a NoSQL database (e.g. OpenSearch) without having to use any JDBC / java.sql related code.

Expected/Desired Behavior

it'd be great if liquibase-core would have support for NoSQL databases as a first class citizen, so that (a) such hacks (faking JDBC stuff) are not needed and generic code (like the liquibase.nosql package linked above) is either present directly in liquibase-core or published as a separate library which can then be re-used by such implementations.

Liquibase Version

No response

Database Vendor & Version

No response

Liquibase Integration

No response

Liquibase Extensions

No response

OS and/or Infrastructure Type/Provider

No response

Additional Context

there are discussions around adding a liquibase extensions for OpenSearch and i'm currently exploring the options there: https://github.com/opensearch-project/opensearch-migrations/issues/148 (btw: feel free to drop in on that ticket and provide your feedback there as well! 😉)

the wiki entry for new databases only mentions NoSQL in passing but does not go into details on how to do this properly.

Are you willing to submit a PR?

i didn't select the "i'm willing to submit a PR" checkbox because i presume at the moment that this will require good knowledge of the internal workings of liquibase (+ some architectural decisions by the liquibase maintainers) to implement - and i definitely do not have that knowledge at this point (never did anything in this codebase). if it turns out that this is simpler to implement then i (or someone else) will probably be able to contribute this (presuming that providing a liquibase-opensearch extension is the way to go).

nvoxland commented 1 year ago

Thanks for opening the issue, @rursprung. This is something that's been on the background list for a very long time, but it's never made it to the forefront. Definitely well past that....

A big part of what's kept us from tackling it is not having enough direct experience with what all the various nosql databases to know answers to questions like "what should work consistently across all the databases and what is unique?". So I'll take over this issue as the place to point people to and we can all work through it together and then start working on the implementation.

Lets start with overall wants/requirements without getting too in the implementation details yet. We don't even have to worry about how to get there. Just figure out the ideal end state. I'll start with what I know of. Let me know what's missing. After we get good consensus, I'll move decisions up to the issue description and we can start thinking about implementation plans.

1. Base classes / interfaces cannot reference JDBC classes

JDBC provides a generic abstraction around a database, but it makes relational-assumptions that don't work for NoSQL databases. A current work-around for the too-jdbc-bound Liquibase code is to make a stub JDBC wrapper around a NoSQL database's driver to adapt the needed parts of JDBC for Liquibase, but JDBC is a large spec and there is no way to know what parts of that interface need to be implemented without trial and error.

Liquibase has the start of an alternative database-agnostic interface which is basically trying to be a lighter weight cross-database interface with liquibase.database.DatabaseConnection. The idea is/was that we'd have a JdbcConnection subclass and can do other subclasses for each other unique driver to be the adapter between the actual driver and Liquibase. But, the DatabaseConnection interface is not complete in what liquibase needs and therefore much of the code expects/requires the JdbcConnection class.

There is also the liquibase.database.Database interface which was originally the overall facade to the database before it ended up too large and unwieldy. Things like DatabaseConnection have been spun off from that, with the old methods on Database still preserved for backwards compatibility. But, the Database interface still has a lot of relational-expecting methods on it. I think that interface needs to be trimmed down to what is truly cross-database with an RelationalDatabase sub-interface and AbstractRelationalDatabase implementation for what is shared across relational databases. At the same time, we should better define what that interface is for. In my mind, at this point it's job is to be the "dialect definition" and shouldn't be handling interactions with the database itself. So maybe even renaming/re-creating it as DatabaseDialect if we're busy tearing up that class anyway?

Beyond the base DatabaseConnection object, database interactions in general get defined with interfaces which have standard implementations around them. For example, the ChangeLogHistoryService interface has a StandardChangeLogHistoryService implementation and the SnapshotGenerator interface has a JdbcSnapshotGenerator implementation with subclasses of that like ColumnSnapshotGenerator. Those need to be reorganized so the base implementation doesn't make assumptions about jdbc and is easily extended by db-specific implementations. Classes like ColumnSnapshotGenerator shouldn't extend a jdbc-specific class, but instead a JdbcColumnSnapshotGenerator should extend a re-usable generic ColumnSnapshotGenerator, etc.

This will be a lot of moving around of code and balancing backwards compatibility with changing APIs will be it's own fun challenge. But we can get into that later... This also gets to be a lot of code-level details, so I'm expecting this can remain a "remember, changing the API will be a big and important part of this project" for now.

2. Don't rely on the term "sql"

There are things like the <sql> and <sqlFile> change type which really means "user specifies statement" and doesn't really have to be sql. Neo4j added <cypher> as an alias, but is not as easy as it should be. And, what is the behavior we are wanting to support? Is it better to have database-specific types like <sql> for databases that understand sql, and <cypher> and <cql> etc. for others and if you use the wrong type you get a "cql not supported on mongodb" error? Or is there a generic term to use like <execute> or <run> or <statement> or something instead?

There are all the *-sql commands like update-sql etc. My proposal would be to remove all of them in favor of a --dry-run flag on the normal command, like update --dry-run. That seems to be the expected term from people for that functionality in general, and avoids the above "what generic term" problem we'd hit trying to do update-statement or whatever.

3. Improvements to command overriding support

Liquibase has it's command API which defines commands like "update" and "snapshot" as a series of CommandSteps. Generally new CommandSteps are additive in that they add themselves to the command pipeline, but when NoSql databases want to completely replace the snapshot command with something unique to them, that gets difficult.

In general, I hope there is not any major work to do here:

But, it will likely take some actual usage to see what does and doesn't work. So like the general API block above, I don't think we need to get into implementation details of this right now.

4. Test Harness Improvements

https://github.com/liquibase/liquibase-test-harness/ is what we use to ensure databases are meeting "foundational" or "advanced" levels. https://www.liquibase.com/supported-databases/verification-levels

While those definitions work with nosql databases, the implementation of the test-harness makes some assumptions around testing a relational database. We do use it for mongodb and other nosql databases already, but it takes some work-arounds, copy/paste and more effort than it takes to test a new relational database.

What else??

What am I missing for blocks of things that make it difficult to add support for a NoSql database?

fbiville commented 1 year ago

I can speak about my experience with JDBC for the Neo4j extension.

I was lucky enough when I started to have https://github.com/neo4j-contrib/neo4j-jdbc available. However, it suffered from some issues that could not be fixed in a backwards-compatible manner, so I decided to re-implement the subset of the JDBC APIs Liquibase requires.

It turns out the biggest effort (in terms of implementation time) is mostly around result sets and, to a lesser extent, statements. The fact that Liquibase only ever uses a single connection per execution greatly simplifies the design of this lite JDBC connectivity layer: no need to pool any resources e.g..

Since Liquibase does not rely on any metadata (or very little), the impedance mismatch between JDBC and the target NoSQL store is quite limited: JDBC statements simply handle strings (it could be SQL, Cypher or anything else), results sets just care about rows (which just are glorified maps).

rursprung commented 1 year ago

thanks a lot @nvoxland for your extensive feedback, it's good to know that this is also something which you'd be interested in!

as i'm unfamiliar with the liquibase code base i'm unable to provide detailed feedback on your proposal, however it sounds good! i'm a bit uncertain as to whether you'll want to (or are able to?) do all of these refactorings in a non-breaking change or whether you intend to do this as part of a new major release of liquibase?

what do you expect would be the timeline for such a refactoring? would it make sense to do a first version of a liquibase-opensearch still based on the hacks done e.g. for MongoDB to already have something because the refactoring might take longer? (and in the best case at a later stage the hacks can be removed with small rework to move to the new APIs rather than throwing everything away and starting from scratch)

one problem (this is probably just a detail in the overall picture) i see with the current setup is that DatabaseConnection#supports has to decide solely based on a URL (which is a String). while at first glance this isn't JDBC/java.sql related it does make the assumption that the URL contains something unique (e.g. mysql:). however, this isn't the case for APIs which just use HTTP rather than a custom JDBC URL string. if you want to connect to OpenSearch the URL might be https://localhost:9200/, so you can't identify it (the port doesn't have to be this port and e.g. Elasticsearch uses the same port). so maybe a more explicit switch is needed somewhere? or the URL is defined in a non-standard way, e.g. opensearch:https://localhost:9200/ and the leading identifier is then stripped to get the real URL - but that feels more like a hack (which i'd use for now with the current possibilities). and this also doesn't answer the question whether it truly supports it - it could be that the database is running a version which isn't compatible with the client library (but this is presumably a solved question as relational DBs can have the same issue?).

@dblock @wbeckler @reta: could you (or your colleagues), with your extensive OpenSearch experience, maybe help to provide additional feedback from the POV of NoSQL DB developers? thanks!

nvoxland commented 1 year ago

@rursprung My guess is the timeline will be long enough that you're best of implementing liquibase-opensearch based on what we have now, even if it means using some of the hacks the MongoDB and/or neo4j and/or others have used. While it's not ideal that liquibase doesn't have more first-class support for nosql yet, it shouldn't be a blocker for being able to implement it. Even with the current class structure, it should be much easier to implement your needed functionality as a plugin to liquibase than as a from-scratch project.

Plus, you'll want to work through the extension in separately releasable stages which are generally:

  1. Able to do an update/rollback etc. using fully user-specified "sql" (or whatever corresponds to your database)
  2. (Optionally) implement applicable "standard" change types like "createTable" or "modifyDatatype" and mark unsupported changeTypes as such
  3. (Optionally) add custom change types
  4. (Optionally) add "snapshot" functionality which enables the snapshot/diff/diffChangelog/generateChangeLog commands
  5. (Optionally) add custom commands and more

where step 1 is fairly straightforward (even with the hacks) and gets users most of the functionality they're looking for. The remaining steps can happen in later iterations and what/when you add things could be partly driven by when the base APIs get more nosql friendly.

Like you, i'm a bit uncertain as to whether you'll want to (or are able to?) do all of these refactorings in a non-breaking change and how to handle and/or stage these changes out will take some good planning and will likely make for a longer-than-wanted timeline.

If you haven't seen, I recently wrote https://contribute.liquibase.com/extensions-integrations/extension-guides/add-a-database/ which is purposefully focused on relational databases since that is easiest to explain, but should hopefully be helpful to you still.

nvoxland commented 1 year ago

one problem (this is probably just a detail in the overall picture) i see with the current setup is that DatabaseConnection#supports has to decide solely based on a URL (which is a String). while at first glance this isn't JDBC/java.sql related it does make the assumption that the URL contains something unique (e.g. mysql:).

@rursprung Even long term, the "url" argument is what I'd guess we'll want to keep relying on as how to describe the connection. It seems general and not jdbc-specific, even though jdbc happens to use that pattern. But like you say, there is the "real" url of https://localhost:9200/ which doesn't really let us know what we're connecting to...

There is a isCorrectDatabaseImplementation(DatabaseConnection) method which can look a the actual connection and make queries to determine if the database implementation is correct. We use that for determing the version if that is important or to distinguish between the different postgresql-compatible versions that all use a "postgresql:" url.

So one option could be to have an HttpDatabaseConnection class which works with any https:// urls regardless of the database, and then OpenSearchDatabase could make whatever call it needs to determine if it's an opensearch connection.

But, that's probably (is my guess off hand) going to be a bit more hacky with the current codebase, and even long-term it has the problem that if/when there are multiple databases that support an http:// url it takes network requests from each to determine which actually apply which will be a lot of overhead.

So, I'd suggest having a url like opensearch://localhost:9200 (don't need to bother with the http part too) since that will be both easiest to do now and has a longer-term performance advantage. And if there is enough user confusion we can add http url support later

Finally, if you'd like us to create the repo as liquibase/liquibase-opensearch I can do that for you. That way we can help with a lot of the infrastructure stuff like the build logic in a standard way so you can stay focused on the database-implementation business logic, and also makes it a bit more discoverable/official for other contributors looking to be involved. If you'd like to talk more on this, you can email me at nathan@liquibase.org

reta commented 1 year ago

@rursprung thanks a lot for starting the discussion and @nvoxland , huge thank you for the insights (and all the Liquibase related work you are doing).

From personal experience, the presence of the tool that allows seamless schema / metadata migration (for OpenSearch, as an example) would be of tremendous help. I have to admit that over the years implemented / maintained at least three different (proprietary) schema migration tools for Elasticsearch (and later for OpenSearch), and every time I wished there is a Liquibase extension (or alike) available.

I think, for OpenSearch (and Elasticsearch) I have pretty much holistic picture what has to be done, at least with respect to schema / metadata migration, but I have difficulties fitting it cleanly into the Liquibase (points well summarized here) at the m,ment. In any case, @rursprung I am happy to help here, would be great to understand how do you see the ideal end result (at least for OpenSearch), leaving the implementation details behind (for now).

dblock commented 1 year ago

My experience with schema migrations predates working on OpenSearch - I've worked on a generic schema migration tool internally at MSFT as far back as 2001. It would compare two shemas at runtime, and execute a migration during a product upgrade, with downtime. I've also had to do this with MongoDB a thousand times, making changes to "schema" where a data migration was required, crossing my fingers that the migration completes before any real users hit the app, or being very careful. The workflow for migrations is always very similar, but the implementation differed significantly, therefore something like Liquibase to drive the process of migrating schema makes sense. On the other hand, SQL databases are something quite specific, and they look very much alike in concepts like ... schema. So, without looking at the code, what functionality will be reusable and valuable that liquibase has today for something like OpenSearch, once all the above refactoring was been done?

I think @bjpres will have a lot more clever things to say about this topic than me as "schema" migration is definitely one of the topics of https://github.com/opensearch-project/opensearch-migrations.

rursprung commented 10 months ago

If you'd like to talk more on this, you can email me at nathan@liquibase.org

@nvoxland: as written in the other issue over at opensearch: i had tried to get in touch with you several times via email (and by pinging you on the other issue). i'm now trying to ping you here as well in the hopes that if maybe you don't see the notification at least somebody else from liquibase will and might be able to get a hold of you internally 🙂 it'd be great to hear from you to get this started!