maxpert / marmot

A distributed SQLite replicator built on top of NATS
https://maxpert.github.io/marmot/
MIT License

What is the best way to handle schema migrations? #58

Closed · mindreframer closed this 1 year ago

mindreframer commented 1 year ago

Currently, the triggers are only attached during Marmot's start phase. Individual schema changes (new tables / columns) are not reflected at runtime; applying them requires manually running the changes on all DBs in the cluster and then restarting each Marmot node so that the triggers and changelog tables are updated.
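For illustration, here is a minimal sketch of the startup-time trigger generation being described, with hypothetical names (these are not Marmot's actual identifiers): because the trigger body enumerates the columns that exist when it is created, a column added later is invisible to the changelog until the trigger is regenerated.

```go
// Hypothetical sketch of startup-time change capture, not Marmot's actual
// internals: the trigger is generated from the columns known at startup,
// so a later ALTER TABLE ... ADD COLUMN is not captured until the trigger
// is dropped and regenerated.
package main

import (
	"database/sql"
	"fmt"
	"strings"

	_ "github.com/mattn/go-sqlite3"
)

// installChangeCapture creates a changelog table and an AFTER INSERT
// trigger for one table, snapshotting the column list passed in.
func installChangeCapture(db *sql.DB, table string, cols []string) error {
	pairs := make([]string, len(cols))
	for i, c := range cols {
		pairs[i] = fmt.Sprintf("'%s', NEW.%s", c, c)
	}
	ddl := fmt.Sprintf(`
CREATE TABLE IF NOT EXISTS changelog_%[1]s (
  id      INTEGER PRIMARY KEY AUTOINCREMENT,
  op      TEXT NOT NULL,
  payload TEXT NOT NULL
);
CREATE TRIGGER IF NOT EXISTS capture_insert_%[1]s
AFTER INSERT ON %[1]s
BEGIN
  INSERT INTO changelog_%[1]s (op, payload)
  VALUES ('insert', json_object(%[2]s));
END;`, table, strings.Join(pairs, ", "))
	_, err := db.Exec(ddl)
	return err
}

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)`); err != nil {
		panic(err)
	}
	// Columns are enumerated once, at startup; an "email" column added
	// later would never show up in the changelog payload.
	if err := installChangeCapture(db, "users", []string{"id", "name"}); err != nil {
		panic(err)
	}
}
```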

Are there any recommendations on how to approach this in a straightforward and maintainable way?

Best, Roman

maxpert commented 1 year ago

Yes, right now what you have described is the case, and the only way to handle schema changes is to stop Marmot, drop a snapshot on all nodes, and restart the Marmot process. The major reasons not to implement schema-change propagation are:

Having said that, here are my recommendations:

I've been brainstorming about this topic, and I have questions that I believe the community can answer. It will really help me develop a solution that works well for the community:

Again, I am open to suggestions and contributions on this topic.

mindreframer commented 1 year ago

@maxpert thanks for the elaborate response!

The algorithm that you describe involves downtime. Also, if I understand correctly, the migration runs on a single Marmot node, and the distribution to other nodes happens via snapshots, so other nodes would be required to download the snapshot to be up to date? If that is the case, it would be problematic with bigger databases (100s of GB), and the migration would take a long time.

How it could work:

Maybe it would be possible to have a special table that Marmot would use to signal the state of the migration. A rolling update is good, and having some way to see the progress of the roll-out through the cluster would be great. Topic in NATS for migration:

How migrations are executed in general is a very broad question, since there are so many tools to manage them: some framework-specific (Rails / Phoenix / Django), some standalone (https://atlasgo.io/, https://github.com/amacneil/dbmate, https://flywaydb.org/). I guess relying on those tools would make support very hard or impossible, so the best approach would be a simple internal mechanism to apply a SQL script on all DBs in the cluster in a controlled fashion.

I guess the simplest approach would be to post a SQL file with two attributes, name + timestamp, as the identifier for this particular migration. We could start with only forward migrations, because supporting reversible migrations does not really work in production. We would also need to store applied migrations in a table, maybe "__marmot_migrations"?
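A minimal sketch of that bookkeeping, assuming the hypothetical `__marmot_migrations` table and the name + timestamp identifier proposed here (none of this exists in Marmot yet):

```go
// Hypothetical forward-only migration bookkeeping, using the
// "__marmot_migrations" table and name + timestamp identifier proposed
// above. Nothing here is an existing Marmot feature.
package main

import (
	"database/sql"

	_ "github.com/mattn/go-sqlite3"
)

const trackingDDL = `
CREATE TABLE IF NOT EXISTS __marmot_migrations (
  name       TEXT    NOT NULL,
  ts         INTEGER NOT NULL, -- timestamp half of the identifier
  applied_at INTEGER NOT NULL, -- unix time this node ran the script
  PRIMARY KEY (name, ts)
);`

// applyOnce runs a forward migration at most once per database: the
// INSERT OR IGNORE doubles as the idempotency guard, so replaying the
// same script (e.g. on redelivery) is a no-op.
func applyOnce(db *sql.DB, name string, ts int64, script string) error {
	if _, err := db.Exec(trackingDDL); err != nil {
		return err
	}
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback()

	res, err := tx.Exec(`INSERT OR IGNORE INTO __marmot_migrations
		(name, ts, applied_at) VALUES (?, ?, strftime('%s','now'))`, name, ts)
	if err != nil {
		return err
	}
	if n, _ := res.RowsAffected(); n == 0 {
		return nil // already applied on this node
	}
	if _, err := tx.Exec(script); err != nil {
		return err
	}
	return tx.Commit()
}

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	if err := applyOnce(db, "add_email", 20230115,
		`CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY);
		 ALTER TABLE users ADD COLUMN email TEXT;`); err != nil {
		panic(err)
	}
}
```

Because the guard row and the script run in one transaction (SQLite supports transactional DDL), a node that crashes mid-migration retries cleanly, and a node that already applied the script skips it.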

Maybe we should collect those ideas in a document (it could be a markdown file in the repo), with arguments for why those design decisions were preferred. From that document we could distill the minimal feature set that would solve the issue at hand.

maxpert commented 1 year ago

> @maxpert thanks for the elaborate response!
>
> The algorithm that you describe involves downtime. Also, if I understand correctly, the migration runs on a single Marmot node, and the distribution to other nodes happens via snapshots, so other nodes would be required to download the snapshot to be up to date? If that is the case, it would be problematic with bigger databases (100s of GB), and the migration would take a long time.

No, we are not restoring snapshots; I am using snapshots as a safety measure. You actually run the migration scripts everywhere.

> How it could work:
>
> Maybe it would be possible to have a special table that Marmot would use to signal the state of the migration. A rolling update is good, and having some way to see the progress of the roll-out through the cluster would be great. Topic in NATS for migration:

This would be great! We could mark each database as being in-progress for migration (possibly locking it for modification), stop accepting writes, drain the pending changes from the changelog tables, and then run the migration script in parallel on all nodes. Once the whole cluster has finished, we mark the tables as ready for writes and release the lock. The application could check the status and display a friendly maintenance page during the migration (a sketch of this signaling follows below).
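A rough sketch of what that signaling could look like over plain NATS pub/sub; the subject name, state names, and payload shape are all invented for illustration (Marmot has no such protocol today):

```go
// Hypothetical migration roll-out signaling over NATS. The subject,
// states, and payload are invented for illustration only.
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

type MigrationStatus struct {
	Node      string `json:"node"`
	Migration string `json:"migration"` // the name + timestamp identifier
	State     string `json:"state"`     // "locked" | "drained" | "applied" | "ready"
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// The application (or an operator tool) watches the same subject to
	// follow progress across the cluster, e.g. to show a maintenance page.
	if _, err := nc.Subscribe("marmot.migration.status", func(m *nats.Msg) {
		log.Printf("progress: %s", string(m.Data))
	}); err != nil {
		log.Fatal(err)
	}

	// Each node walks the phases described above: lock the DB for writes,
	// drain pending changelog entries, apply the script, then report ready.
	for _, state := range []string{"locked", "drained", "applied", "ready"} {
		b, _ := json.Marshal(MigrationStatus{
			Node: "node-1", Migration: "20230115_add_email", State: state,
		})
		if err := nc.Publish("marmot.migration.status", b); err != nil {
			log.Fatal(err)
		}
	}
	nc.Flush() // make sure the status updates are on the wire before exit
}
```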

> How migrations are executed in general is a very broad question, since there are so many tools to manage them: some framework-specific (Rails / Phoenix / Django), some standalone (https://atlasgo.io/, https://github.com/amacneil/dbmate, https://flywaydb.org/). I guess relying on those tools would make support very hard or impossible, so the best approach would be a simple internal mechanism to apply a SQL script on all DBs in the cluster in a controlled fashion.
>
> I guess the simplest approach would be to post a SQL file with two attributes, name + timestamp, as the identifier for this particular migration. We could start with only forward migrations, because supporting reversible migrations does not really work in production. We would also need to store applied migrations in a table, maybe "__marmot_migrations"?

Excellent, you are essentially converging on the same idea: the migration script is published on NATS and then applied everywhere.
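Concretely, that could look like the sketch below: the script is published once to a durable JetStream stream, every node runs a durable consumer and applies the script locally, and an idempotency guard like the `applyOnce` sketch above makes redelivery harmless. The stream, subject, and payload names are assumptions, not anything Marmot ships:

```go
// Hypothetical publish/apply flow for migration scripts over NATS
// JetStream. Stream, subject, and payload layout are assumptions.
package main

import (
	"encoding/json"
	"log"

	"github.com/nats-io/nats.go"
)

type MigrationScript struct {
	Name   string `json:"name"`
	Ts     int64  `json:"ts"`
	Script string `json:"script"` // the forward-only SQL to run
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	js, err := nc.JetStream()
	if err != nil {
		log.Fatal(err)
	}

	// A durable stream, so nodes that were offline when the script was
	// posted still receive it once they come back.
	if _, err := js.AddStream(&nats.StreamConfig{
		Name:     "MARMOT_MIGRATIONS",
		Subjects: []string{"marmot.migrations"},
	}); err != nil {
		log.Fatal(err)
	}

	// Publish side: an operator (or a CLI) posts the script exactly once.
	payload, _ := json.Marshal(MigrationScript{
		Name:   "add_email",
		Ts:     20230115,
		Script: "ALTER TABLE users ADD COLUMN email TEXT;",
	})
	if _, err := js.Publish("marmot.migrations", payload); err != nil {
		log.Fatal(err)
	}

	// Apply side: every node runs a durable consumer, executes the script
	// locally (guarded by something like applyOnce above), then acks.
	if _, err := js.Subscribe("marmot.migrations", func(m *nats.Msg) {
		var ms MigrationScript
		if err := json.Unmarshal(m.Data, &ms); err != nil {
			return // malformed message; leave unacked for inspection
		}
		log.Printf("applying %s/%d locally, then acking", ms.Name, ms.Ts)
		m.Ack()
	}, nats.Durable("node-1"), nats.ManualAck()); err != nil {
		log.Fatal(err)
	}

	select {} // a real node keeps serving; block here for the sketch
}
```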

> Maybe we should collect those ideas in a document (it could be a markdown file in the repo), with arguments for why those design decisions were preferred. From that document we could distill the minimal feature set that would solve the issue at hand.

I can introduce an RFC label under issues, but in the long run we might have to move to a different repo, like https://github.com/reactjs/rfcs.