sqitchers / sqitch

Sensible database change management
https://sqitch.org
MIT License

Add support for merging a registry with a modified plan #200

Open acrobat opened 10 years ago

acrobat commented 10 years ago

I have a problem with sqitch plans (and databases) between different versions. We currently have a 2.0 branch and a master branch (which represents a future 2.1 version), and there will be changes on both branches. Changes on the 2.0 branch (bugfixes) will be merged into master, and master will have feature changes, but after merging branches it isn't possible to deploy changes with sqitch. This is an example case:

2.0 branch is created, this is the state:

| 2.0 branch | master branch |
|------------|---------------|
| change A   | change A      |
| change B   | change B      |
| change C   | change C      |

Bugfixes are created on 2.0 and new features are developed on master. A database per version is built from scratch.

| 2.0 branch | master branch |
|------------|---------------|
| change A   | change A      |
| change B   | change B      |
| change C   | change C      |
| bugfix A   | Feature A     |

We merge 2.0 in master (to transfer bugfixes to master). This is the result:

| 2.0 branch | master branch |
|------------|---------------|
| change A   | change A      |
| change B   | change B      |
| change C   | change C      |
| bugfix A   | Feature A     |
|            | bugfix A      |

But here is where the problem occurs. A merge like this is correct for the master database: bugfix A must be included after feature A, because if bugfix A comes before feature A, sqitch deploy exits with an error saying that it cannot find bugfix A in the plan file (quite a strange error, but I think it's simply the wrong error message).

So this will block us from upgrading databases from version 2.0 up to 2.1

How can we solve this problem? Or can we find a solution in sqitch so that the order of the plan file doesn't matter and it just follows the requirements? I hope my explanation is reasonably clear; otherwise I will be more than happy to explain it a bit more!

Thanks in advance

theory commented 10 years ago

Nice use of tables!

FWIW, though, the mail list is probably a better place for a question like this.

Anyway, I think the thing to do is to rebase master on top of the 2.0 branch, like so:

sqitch checkout master
sqitch revert --to change_c
git rebase v2.0
sqitch deploy

This ensures that the state of the database is as the master branch would understand it. Then you revert the changes added to master since v2.0 branched. Then rebase on v2.0. That will revert the Feature A commits, then apply commits from the time v2.0 was branched to v2.0's HEAD, meaning bug fix A. Then it reapplies commits created since v2.0 was branched, meaning it adds Feature A back in. At that point, you should be able to deploy as usual.

HTH!

theory commented 10 years ago

Another approach that will do the same thing, but doesn't require you to remember the name of the last change common to both v2.0 and master:

sqitch checkout v2.0
git checkout master
git rebase v2.0
sqitch deploy

acrobat commented 10 years ago

Thanks for the tip about the mailing list! Yes, I was aware of the rebase command, and that approach does work in a development environment, but how do we solve this with production databases? A client application started with version 2.0 and now wants to upgrade to 2.1; how would you solve that?

theory commented 10 years ago

I would follow the process outlined above. At that point, the master branch is ready for a 2.1 release. Create a new 2.1 branch and do the release from there:

git checkout -b v2.1
sqitch tag v2.1 -n 'Tag v2.1.'
sqitch bundle --dest-dir ~/widgets-v2.1.0

Now master is the path to v2.2. If you make a change to v2.0, you'll need to rebase v2.1 from v2.0, then rebase master from v2.1.

acrobat commented 10 years ago

But how would this work for production databases? Because a rebase could potentially mean data loss for that db.

The main problem, I think, is that by merging bugfix sqitch scripts from 2.0 into master (2.1), the sqitch.plan file gets out of order compared to databases which were built from 2.0 and now need 2.1. If sqitch could just follow the requirements, the order of the lines in the plan file would not be a problem; the user would just need to make sure that their requirements are correct.

theory commented 10 years ago

This pattern assures that changes are always in order in the master branch. I am assuming that you don't deploy to production every time you change master. What you do is deploy only when you do a release.

A more useful pattern might be to keep a develop branch for forward-looking work, and keep master in final release state. In this case, if master was currently the v2.0 release, and you made changes to the develop branch towards v3.0, and to the v2 branch for maintenance, when you were ready to release v2.1, you would merge it into master (because it should only have changes made since master was last released), then rebase develop from master. Continue to do maint work in the v2 branch and new work toward v3.0 in develop.

Later, when you're ready for v3.0, create a v3 branch from develop, then merge it into master and release. Rebase develop from master again. Now continue doing maint work in the v3 branch and forward-looking v4.0 work in develop.

Make sense?

acrobat commented 10 years ago

Yes, this makes sense, except that we already use a similar git flow; the only difference is that we have a master branch (next minor version) and a version branch, both of which are always stable and ready for release.

So the git part is clear to me; the problem is more with the sqitch plan between versions, and I don't exactly get how to manage that. Maybe we are doing something wrong. Merging database bugfixes from 2.0 into master appends those changes (that way test builds for our master branch can be deployed), but when master is eventually released and users on the 2.0 branch upgrade to 2.1, the database deploy will fail with this kind of error:

$ sqitch deploy
Cannot find 46446ea738171102cd9284d471a7607ee6e5bf1a in the plan

//In the example above the hash would be representing change "Feature A"

This happens because changes from master come first in the master sqitch plan, and bugfixes from 2.0 are at the end of the file.

I hope you understand the problem a bit better now. I guess it's not related to the git flow but more to how the plan file is built. It's quite possible we are using sqitch the wrong way, but I can't find a "right" way, so it seems :(

tprocter commented 10 years ago

I'm interested in suggestions for this as well. We have this problem fairly often and it usually requires manual correction by editing the plan file (swapping changes around) followed by babysitting the actual deployment, taking changes one at a time and correcting the plan and/or the sqitch database as we go.

I'd like to see a more stable way to manage the plan when more than one version of the application needs to be actively maintained. In this scenario, we can't assume that changes are always being appended to the end of the plan. They might need to be inserted in a previous tag, for example adding a hotfix tag between releases. Again - easy to do on the release branch, but tricky on a master or development branch where future changes already exist.

theory commented 10 years ago

I see, the problem is that you can have two versions in production at once. So if you add a change to your v2 branch and release, and then add the same change to the master branch and release, the change will be in different places, which screws with people when they later want to upgrade from a master release to a v2 release.

I'll have to think about this. The current implementation assumes that changes will always be sequential, but maybe we'll have to add some kind of support for multiple parents, like Git commits support. I'm not at all sure how that should work, though. Someone smarter than me might have to give it some think.

acrobat commented 10 years ago

Yes, you are right @theory, that's the case I was trying to explain. I will also try to find some more info on how git commits work and how this could be supported in sqitch. That way we can hopefully find a decent solution for this problem!

theory commented 10 years ago

It would most likely end up requiring a new plan format to support multiple parents.

acrobat commented 10 years ago

I found this stackoverflow link on the sqitch usergroup, http://stackoverflow.com/questions/13369187/can-order-be-better-preserved-when-using-the-tsort-algorithm

Maybe it's a solution to not have a sequential order in the plan file, but just follow the requirements as they are defined by the user?
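To make the "just follow the requirements" idea concrete, here is a minimal sketch of ordering changes by a topological sort of their dependencies. It uses Python's standard `graphlib` module (Python 3.9+); the change names and the `requires` mapping are assumptions made up for this illustration, and none of this is Sqitch code.

```python
from graphlib import TopologicalSorter

# Hypothetical requirements for the changes in the example above:
# each key maps to the list of changes it requires.
requires = {
    "change_a": [],
    "change_b": ["change_a"],
    "change_c": ["change_b"],
    "feature_a": ["change_c"],
    "bugfix_a": ["change_c"],
}


def deploy_order(requires):
    """Return one valid deployment order derived only from requirements."""
    return list(TopologicalSorter(requires).static_order())


def respects_requirements(order, requires):
    """Check that every change appears after everything it requires."""
    position = {name: i for i, name in enumerate(order)}
    return all(
        position[dep] < position[name]
        for name, deps in requires.items()
        for dep in deps
    )


order = deploy_order(requires)
print(order)
```

Note that `feature_a` and `bugfix_a` only depend on `change_c`, so either relative order satisfies the requirements. A requirements-only approach would accept both, which means the order is no longer pinned down by the plan file.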

theory commented 10 years ago

The reason it uses a git history/BitCoin blockchain-like approach is to guarantee that things are always deployed in the correct order. This has saved us from fucking up production deployments a few times. That is something I don't want to give up.
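As a rough illustration of the chained approach described here, the sketch below derives each change's ID from its name and its predecessor's ID. This is a simplified model built on assumptions: the `change_ids` and `check_deployed` helpers are hypothetical, and the hash input is far smaller than whatever Sqitch actually hashes. It only shows why the same change gets a different ID once something is placed ahead of it.

```python
import hashlib


def change_ids(plan):
    """Map each change name to an ID derived from its name and its parent's ID."""
    ids = {}
    parent_id = ""
    for name in plan:
        digest = hashlib.sha1(f"{parent_id}:{name}".encode("utf-8")).hexdigest()
        ids[name] = digest
        parent_id = digest
    return ids


def check_deployed(last_deployed_id, plan_ids):
    """Raise LookupError when the database's latest change ID is unknown to the plan."""
    if last_deployed_id not in plan_ids.values():
        raise LookupError(f"Cannot find {last_deployed_id} in the plan")


# Plans from the example at the top of the thread (names are illustrative).
v2_plan = ["change_a", "change_b", "change_c", "bugfix_a"]
master_plan = ["change_a", "change_b", "change_c", "feature_a", "bugfix_a"]

v2_ids = change_ids(v2_plan)
master_ids = change_ids(master_plan)

# The shared prefix gets identical IDs, but bugfix_a does not, because its
# parent differs between the two plans.
print(v2_ids["change_c"] == master_ids["change_c"])
print(v2_ids["bugfix_a"] == master_ids["bugfix_a"])
```

In this model, a database deployed from the 2.0 plan records an ID for `bugfix_a` that the master plan never produces, so `check_deployed(v2_ids["bugfix_a"], master_ids)` raises an error of the same shape as the one quoted earlier in the thread.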

But if there were a few branches in the chain, that would be okay. I suspect what we'd have to do is add something to the plan to list parents for each change, then update things to look for multiple parents. I don't have a lot of mental bandwidth to give this at the moment, but I suspect it will require quite a bit of re-organization of the plan parser and in-memory representation. It's do-able, though, I have no doubt.

acrobat commented 10 years ago

Yes, I see; in that case it's indeed not a good idea to drop the "sequential" plan type. It would be nice, and hopefully we are able to integrate this change into the 1.0.0 release!

theory commented 9 years ago

I've been thinking about this off and on for the last few weeks. I think that, if we want to get something done in time for v1.0.0, it wouldn't be a plan format with multiple children from a single change. I have no frigging idea how to go about that at this point.

So I think an interim solution would be to add some kind of --force option to deploy and related commands. It could work in one of a couple of ways:

So, here's what I propose to do:

Thoughts?

theory commented 9 years ago

I've now implemented the upgrade command and closed #87.

tprocter commented 9 years ago

So here are my thoughts.

Ovid commented 9 years ago

In production, when dollars and business continuity are on the line, it's not enough to encourage best-practice workflows. With large-scale projects and multiple developers working on multiple versions of the same code simultaneously, it's inevitable that there will be exceptions to the expected behaviour.

Completely agreed. My current client is hiring plenty of devs and not all of them take the time to read the documentation or, if they do, they don't always understand it. While it's good to have well-designed software that offers an affordance to do the right thing, in reality, people will always find ways of doing unexpected things.

theory commented 9 years ago

To survive this, we've developed a helper tool for sqitch that identifies discrepancies between a sqitch filesystem and a target database. It recognizes changes that have been inserted, deleted or modified on the filesystem with respect to what the database expects, and offers steps to manually correct this (mostly in the form of SQL).

I would be very curious to see what this code looks like. This is the purpose behind my plan to add deploy script hashes to the database, essentially as a second identifier that derives from the script itself, rather than the plan.
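The helper tool itself is not shown in this thread. Purely as an illustration of the idea (classifying changes as inserted, deleted, or modified by comparing script hashes), here is a minimal sketch. The function names, the data shapes, and the use of SHA-1 are all assumptions for the example, not the tool's or Sqitch's actual design.

```python
import hashlib


def script_hash(script_text: str) -> str:
    """Hash a deploy script's contents (SHA-1 chosen only for illustration)."""
    return hashlib.sha1(script_text.encode("utf-8")).hexdigest()


def find_discrepancies(filesystem: dict, registry: dict) -> dict:
    """Compare scripts on disk with hashes recorded in the database.

    filesystem maps change name -> deploy script text.
    registry maps change name -> script hash logged at deploy time.
    """
    inserted = [n for n in filesystem if n not in registry]
    deleted = [n for n in registry if n not in filesystem]
    modified = [
        n for n in filesystem
        if n in registry and script_hash(filesystem[n]) != registry[n]
    ]
    return {"inserted": inserted, "deleted": deleted, "modified": modified}


# Hypothetical data: the database was deployed before the users script was
# edited and before widgets was added; legacy has since been removed from disk.
registry = {
    "users": script_hash("CREATE TABLE users (id INT);"),
    "legacy": script_hash("CREATE TABLE legacy (id INT);"),
}
filesystem = {
    "users": "CREATE TABLE users (id BIGINT);",
    "widgets": "CREATE TABLE widgets (id INT);",
}

report = find_discrepancies(filesystem, registry)
print(report)
```

A hash derived from the script rather than from the plan is what lets a comparison like this tell "same change, moved" apart from "same name, different contents".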

theory commented 9 years ago

insert - add a new or reworked change between existing tags. I realize some teams may not want to use this in their respective workflows, so maybe have it disabled by default, but for the rest of us it's critical. The only alternative is to edit the sqitch plan by hand and I'll leave it to your imagination what kind of problems that can cause.

Maybe add options to the add command, like --before foo, or even --before foo^ or --after @bar. I'm not sure how this would affect deployments, though. If you have already deployed foo, then add a change before it, how should the deployment work?

rename - in order to keep the filesystem tidy, and to deal with cases when objects in the database get renamed, there should be a way to do this. Sqitch should be able to track an old and a new name for a given change.

Hrm. That's quite a lot of information for Sqitch to have to track, and the plan format is kept deliberately simple. Right now Sqitch doesn't know any names at all, it's all in the plan.

What about something like sqitch rebase --force --log-only? That would update the registry of a deployed database to reflect exactly what's in the plan, without running any of the deployment scripts.

theory commented 9 years ago

Many of the errors could also be avoided if we continue with the git metaphor and add "staging" support.

  • Changes could be collected in a staging area (like a mini-plan), but only moved into the plan at a chosen time.
  • The final ordering of changes isn't determined until they are moved into the plan (users can select which changes and when to move them from staging to final). This duty might be given to someone with a release manager role.

This is already something that can very much be done with SCM branches, and a single party responsible for merging.

  • This would allow developers to work on feature branches that might be worked on across sqitch tags, which is currently very painful and error-prone to merge. It would also reduce the frequency that we get a git conflict on the plan file itself which currently happens on almost every merge.

Have you tried using union merges? I discussed some patterns with them to simplify merges in the tutorial.
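For reference, a union merge is configured through Git attributes. A minimal sketch, assuming the plan file is named sqitch.plan and sits at the repository root, is a .gitattributes file containing:

```
sqitch.plan merge=union
```

With this driver, Git keeps the lines from both sides instead of reporting a conflict, so the resulting order of changes in the plan is still worth reviewing after a merge.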

  • We might also consider multiple named staging areas if users want to work in fully isolated spaces, but that might be more trouble than it's worth.

Well, it might be interesting to allow a single project to have multiple plan files…

theory commented 9 years ago

Completely agreed. My current client is hiring plenty of devs and not all of them take the time to read the documentation or, if they do, they don't always understand it. While it's good to have well-designed software that offers an affordance to do the right thing, in reality, people will always find ways of doing unexpected things.

Yeah, the point of this issue has become, I think, "How do we add tools to correct for unexpected conflicts?" Examples include inserting changes before tags, changing the plan order, and having multiple releases (and therefore tags) with the same changes. I think we can draw some inspiration from git push --force, git filter-branch, and the like.

theory commented 9 years ago

Okay, with these commits, I've added support for logging the deploy script hash along with each change. With that in place, I propose a first pass at reconciling differences between a target registry and a plan as follows:

Add a new option to deploy: --merge. Without this flag, deploys happen just as they do now. But with it, we try to merge a plan into the registry by taking the following steps:

This approach should solve the original challenge, where multiple releases of the plan can have different tags and use the same changes. Assuming those changes have identical deploy files or have the same name and are un-reworked, we should be able to merge things such that everything is in the proper order.

The one catch I can think of is if the last change in the plan is not the last change in the registry (ordered by commit date). In that case, we would need to update the commit date for that one change.
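The individual steps of the proposal are not reproduced in this text, so the following is only a guess at the core matching idea: treat a plan change as already deployed if the registry holds a change with the same deploy-script hash or, failing that, the same name. The `Change` type, the `pending_changes` helper, and the placeholder hashes are hypothetical, and reworked changes are ignored entirely.

```python
from typing import NamedTuple


class Change(NamedTuple):
    name: str
    script_hash: str  # hash of the deploy script, as logged in the registry


def pending_changes(plan, registry):
    """Return plan changes with no counterpart in the registry, in plan order.

    A counterpart is a registry entry with the same script hash or, as a
    fallback, the same name (this sketch assumes no reworked changes).
    """
    deployed_hashes = {c.script_hash for c in registry}
    deployed_names = {c.name for c in registry}
    return [
        c for c in plan
        if c.script_hash not in deployed_hashes and c.name not in deployed_names
    ]


# A database built from the 2.0 plan (placeholder hashes for illustration).
registry = [
    Change("change_a", "h1"),
    Change("change_b", "h2"),
    Change("change_c", "h3"),
    Change("bugfix_a", "h5"),
]

# The master plan, where feature_a sits before bugfix_a.
plan = [
    Change("change_a", "h1"),
    Change("change_b", "h2"),
    Change("change_c", "h3"),
    Change("feature_a", "h4"),
    Change("bugfix_a", "h5"),
]

pending = pending_changes(plan, registry)
print([c.name for c in pending])
```

Under these assumptions only `feature_a` is left to deploy against the 2.0 database, even though it precedes `bugfix_a` in the master plan, which is the situation described at the start of the thread.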

Thoughts?

acrobat commented 9 years ago

By the looks of how the merge option works, it should fix this issue, or at least the case we are in. The plan file is just a bit out of order, but the content of each file referenced in the plan file is identical across all branches!

So I think we are good to go with this fix!

theory commented 9 years ago

Excellent, thanks for the feedback, @acrobat!

theory commented 9 years ago

In expectation of the proposed work here, in da8f24e5 I went ahead and added support for "merge" events to be logged. None are logged yet, but I wanted to get all the schema changes done before a release, which I expect to do soon.

acrobat commented 9 years ago

Looks good @theory!

decibel commented 8 years ago

I'm not sure if this is still an open item, but here's a few thoughts about it:

All of those sound pretty non-trivial. Maybe they're worth doing; maybe not... but if they're not done then it would be good to have specific documentation on how to handle some of these scenarios. (Maybe some of that already exists; I'm not deep enough into the Koolaid yet to have encountered most of these problems...)

theory commented 8 years ago

FWIW, you can create object-specific templates yourself, as detailed in this blog post.