opendevstack / ods-jenkins-shared-library

Shared Jenkins library which all ODS projects & components use - provisioning, SonarQube code scanning, Nexus publishing, OpenShift template based deployments and repository orchestration

Rework approach to versioning and branching (?) #480

Open · michaelsauter opened this issue 3 years ago

michaelsauter commented 3 years ago

I've been thinking about versioning and branching in the orchestration pipeline lately. Maybe it is possible to make things simpler by default, and make the complexity which sometimes is needed opt-in.

To recap the current situation:

This design was chosen to allow teams to make small fixes to current releases without having to deal with newer changes that might have happened on master in the meantime. Unfortunately, this approach also forces a certain amount of complexity on teams that work "sequentially". Maybe we can improve this, and a few other things in this area. Here's what I've been thinking about:

I believe this to be less complex for the user to understand (as the complexity doesn't exist by default, but is an opt-in when the need arises). Also, it should simplify the implementation a bit because we do not need to create/push branches. We only search for the best matching Git ref.
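
To sketch what "search for the best matching Git ref" could mean (just an illustration; the release/<version> branch naming and the fallback order are assumptions, not decided behaviour):

```groovy
// Hypothetical sketch: resolve which Git ref to build for a given version.
// Branch naming (release/<version>) and the fallback order are assumptions.
String findBestMatchingRef(List<String> existingBranches, String version, String defaultBranch = 'master') {
    // Candidate prefixes, e.g. "1.1.2" -> ["1.1.2", "1.1", "1"]
    List<String> parts = version.tokenize('.')
    List<String> candidates = (parts.size()..1).collect { int n -> parts.take(n).join('.') }
    for (String candidate : candidates) {
        String branch = 'release/' + candidate
        if (existingBranches.contains(branch)) {
            return branch
        }
    }
    return defaultBranch
}

// No matching release branch exists -> fall back to master.
assert findBestMatchingRef(['release/1.0'], '1.1.2') == 'master'
// Once someone opts in and creates release/1.1, it wins for 1.1.x versions.
assert findBestMatchingRef(['release/1.0', 'release/1.1'], '1.1.2') == 'release/1.1'
```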

There are two related thoughts I have in mind, that I also want to describe here:

  1. Avoid having to build a version in DEV. Right now, one cannot deploy to QA without building the release package in DEV first. We could speed up the process by specifying that a pipeline run in DEV is always a WIP build. For QA/PROD, we need a version. The QA build will then build the required docs (currently done by DEV). I'm a bit unsure if this is possible from a docs perspective, but I believe it would again make things simpler and definitely faster.

  2. Avoid committing to the component repos. Currently we create commits that might contain a deployment descriptor and might contain the OpenShift export. Doing this requires us to push the branches during the pipeline. Further, it creates the complication that we have a "built" commit (for which the pipeline is started) and a "created" commit (which the pipeline created). Finally, I dislike that we have humans and computers pushing to the same repo. We could throw away a lot of complexity if we wouldn't commit to the repo at all. To achieve that, we would need to store the "release package" somewhere else. Either a separate repo which just contains releases or Nexus come to my mind. If you combine this thought with the previous idea of having versions only for QA/PROD, then the orchestration pipeline would simply create one release package for every version that goes to QA, and store that somewhere. This release package could contain all files necessary to deploy/promote, so that one does not actually need to checkout the individual repos anymore.
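
To make the release package idea a bit more concrete, here is a rough sketch (paths, descriptor format and the Nexus repository layout are all made up; this is not existing behaviour):

```groovy
import groovy.json.JsonOutput

// Hypothetical sketch: create one release package per version that goes to
// QA/PROD and store it in Nexus, instead of committing artifacts back into
// the component repositories. Assumes a raw Nexus repository and a NEXUS_URL
// environment variable; all names are made up for illustration.
def buildAndStoreReleasePackage(String project, String version, List<Map> components) {
    String pkg = "${project}-${version}"
    sh "mkdir -p ${pkg}"
    components.each { component ->
        // Record the exact commit each component was built from, plus whatever
        // else is needed to deploy/promote without checking out the individual repos.
        writeFile file: "${pkg}/${component.name}.json",
                  text: JsonOutput.prettyPrint(JsonOutput.toJson(component))
    }
    sh "tar czf ${pkg}.tar.gz ${pkg}"
    // Upload to Nexus via its REST API.
    withCredentials([usernamePassword(credentialsId: 'nexus', usernameVariable: 'NU', passwordVariable: 'NP')]) {
        sh "curl -u \$NU:\$NP --upload-file ${pkg}.tar.gz \$NEXUS_URL/repository/releases/${project}/${pkg}.tar.gz"
    }
}

// Example call from the orchestration pipeline:
// buildAndStoreReleasePackage('foo', '1.1.2',
//     [[name: 'foo-backend', gitSha: '0a1b2c3'], [name: 'foo-frontend', gitSha: '4d5e6f7']])
```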

@metmajer @martsec @clemensutschig @oalyman @felipecruz91 Feedback welcome. For now I'm just writing down my thoughts ... these still need a lot of shaping but I want to have this thought process in the open.

metmajer commented 3 years ago

Thanks for sharing thoughts @michaelsauter!

Pass an explicit version (such as 1.0.0, 1.1.0, 1.1.1, 1.1.2 etc. ... but also shorter versions such as 1.0 or simply 1 are allowed). This seems to be closer to what people are used to, as opposed to just using major versions.

Absolutely. In the context of GxP relevant applications, the version name will actually be a Change ID in the form of an alphanumeric string. Anything should work that can fit into a branch name (definitely no whitespaces).

The orchestration pipeline does not create release branches.

This would mean that users would need to manually create release branches across repositories, which might conflict with compliance expectations where an atomic operation that creates an immutable snapshot of the codebase across the entire application is beneficial and helps minimize the risks of undocumented changes when bundling a release.

Avoid having to build a version in DEV.

I see a potential conflict with compliance expectations here. A deployment into QA or PROD has to be traceable to a build and deployment into DEV with the same version (which identifies a change). Possibly, a SHA is good enough. Not generating documentation in DEV is not an option since compliance requires documentation from DEV to be signed before going to QA. Unfortunately, creating documentation in QA and not in DEV is therefore a clear No.

Avoid committing to the component repos.

These commits are essential evidence to mark the point in time in the history of a repository when a release was made, and they contain traceability artifacts important for an audit. This way, we can trace the creation of an image in OpenShift to a commit in the codebase together with sealed audit evidence. I doubt that we should remove this.

Thoughts @clemensutschig?

martsec commented 3 years ago

We could commit the component hashes to the release manager repository instead of the components' repositories. We are already using that repo for storing state, so this would not change much.

michaelsauter commented 3 years ago

Pass an explicit version (such as 1.0.0, 1.1.0, 1.1.1, 1.1.2 etc. ... but also shorter versions such as 1.0 or simply 1 are allowed). This seems to be closer to what people are used to, as opposed to just using major versions.

Absolutely. In the context of GxP relevant applications, the version name will actually be a Change ID in the form of an alphanumeric string. Anything should work that can fit into a branch name (definitely no whitespaces).

Remember that we actually have both right now, a version and a change ID. Could this potentially be simplified to just one identifier? Non-GxP would not need a change ID anyway, and for GxP just change ID might be enough?

The orchestration pipeline does not create release branches.

This would mean that users would need to manually create release branches across repositories, which might conflict with compliance expectations where an atomic operation that creates an immutable snapshot of the codebase across the entire application is beneficial and helps minimize the risks of undocumented changes when bundling a release.

Please note that my suggestion makes release branches optional. People would not need to create the branch at all, if they don't need parallel development. On the other hand, if they want to have branches, it would require manual effort (a downside of my suggestion - however I guess this could be automated with a script). Regarding compliance expectations, I'm not worried. Every version still is a snapshot, and a snapshot is always a set of Git commits - branches do not play a role here.

Avoid having to build a version in DEV.

I see a potential conflict with compliance expectations here. A deployment into QA or PROD has to be traceable to a build and deployment into DEV with the same version (which identifies a change). Possibly, a SHA is good enough. Not generating documentation in DEV is not an option since compliance requires documentation from DEV to be signed before going to QA. Unfortunately, creating documentation in QA and not in DEV is therefore a clear No.

Thanks for the explanation, I feared that :( This means we will always run deployment in DEV from the orchestration pipeline, fine.

Avoid committing to the component repos.

These commits are essential evidence to mark the point in time in the history of a repository when a release was made, and they contain traceability artifacts important for an audit. This way, we can trace the creation of an image in OpenShift to a commit in the codebase together with sealed audit evidence. I doubt that we should remove this.

Actually, my suggestion would increase the traceability. At the moment, the component pipeline is triggered for commit A, which is the base of the container image. However, the orchestration pipeline now creates a commit B which contains the export (if required) and the deployment descriptor file. As @martsec commented, the deployment descriptor could be committed into the release manager repository instead, avoiding a commit into the component repo. The only thing that screws this idea up is the export (which a) I don't like at all and b) could also be stored in the release manager).

metmajer commented 3 years ago

Remember that we actually have both right now, a version and a change ID. Could this potentially be simplified to just one identifier? Non-GxP would not need a change ID anyway, and for GxP just change ID might be enough?

This is how we handle it at the moment. GxP = change ID, non-GxP = whatever. The change ID field parameter is still in use but points to the same value. Technically, it points to the branch name field's value in the Release Status issue, which is now initialized with the version in Jira but non-editable for the user and will be removed in a future version.

Please note that my suggestion makes release branches optional. People would not need to create the branch at all, if they don't need parallel development. On the other hand, if they want to have branches, it would require manual effort (a downside of my suggestion - however I guess this could be automated with a script). Regarding compliance expectations, I'm not worried. Every version still is a snapshot, and a snapshot is always a set of Git commits - branches do not play a role here.

I see parallel development likely to happen considering teams could work on a product increment and an emergency change to fix an incident. I would rather go with more convenience for the end-users as opposed to technical simplicity. Honestly, I need to think a bit more about this. Simple is always better, but I'd always favor simplicity on the end-user side.

Actually, my suggestion would increase the traceability. At the moment, the component pipeline is triggered for commit A, which is the base of the container image. However, the orchestration pipeline now creates a commit B which contains the export (if required) and the deployment descriptor file. As @martsec commented, the deployment descriptor could be committed into the release manager repository instead, avoiding a commit into the component repo. The only thing that screws this idea up is the export (which a) I don't like at all and b) could also be stored in the release manager).

This would mean that traceability data does not exist at the technical component level but at the central Release Manager repository. As an example, I can now look at a container image in OpenShift and identify the Git commit SHA the image is based on for a specific component. Then, when I look at that Git commit in Bitbucket, I can see the release commit following immediately after it. This relation would not be present anymore when storing release commits in the Release Manager, to my understanding.

michaelsauter commented 3 years ago

This is how we handle it at the moment. GxP = change ID, non-GxP = whatever. The change ID field parameter is still in use but points to the same value. Technically, it points to the branch name field's value in the Release Status issue, which is now initialized with the version in Jira but non-editable for the user and will be removed in a future version.

I'm sorry, I think I don't get it. Could you give an example to help me understand?

I see parallel development likely to happen considering teams could work on a product increment and an emergency change to fix an incident. I would rather go with more convenience for the end-users as opposed to technical simplicity. Honestly, I need to think a bit more about this. Simple is always better, but I'd always favor simplicity on the end-user side.

I agree that parallel development is something we should support - that's why we built it like it is in the first place. I'm just not too happy with the "upfront complexity" this brings, which has triggered my idea to simplify the branching model. I need to think more about this.

However, I want to mention another aspect which I came across in the last couple of days. At the moment, we use the latest image tag for each container image in the pod template, and rely on either the image trigger of the DeploymentConfig, or a manual rollout if the trigger does not exist, to roll out a new deployment. This is not ideal as OpenShift recommends using the standard Kubernetes Deployment resources going forward. Deployment resources cannot have image triggers, which means that to roll out a new deployment, you have to update the pod template, which means you cannot really use the latest image tag. Yes, there is functionality in Kubernetes 1.15+ to restart an existing rollout, which has the desired effect, but that functionality is meant to restart a deployment (e.g. to fix memory issues), not to create a new one. So we are kind of misusing the restart functionality.
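
To illustrate the difference (a minimal sketch with placeholder names):

```groovy
// Illustrative only: component and namespace names are placeholders.
def component = 'backend'
def namespace = 'foo-dev'

// With a DeploymentConfig, pushing/tagging a new ":latest" image is enough:
// the ImageChange trigger rolls out a new deployment automatically.
// A Kubernetes Deployment has no such trigger, so as long as the pod
// template keeps pointing at ":latest", the only way to pick up the new
// image is to restart the rollout (available since Kubernetes 1.15):
sh "oc -n ${namespace} rollout restart deployment/${component}"
// ...which re-purposes a restart mechanism to perform a release.
```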

Now, if it were just for the component pipeline, I would change our approach to using the Git SHA as the image tag, and patch the deployment resources to get them to rollout a new deployment. This sounds like a good fit for Deployment resources, and I also like how explicit it is in that you can easily see which image (=commit) we are running. However, the problem is that the orchestration pipeline creates new commits in the repo, which would lead to a situation where we need to record the actual SHA which should be used as the image tag, and then use that in the orchestration deployment as opposed to using the currently checked out repo SHA (which is the commit SHA after the actual image tag SHA).
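
A rough sketch of what that could look like (placeholder names and registry URL, not current behaviour):

```groovy
// Illustrative only: tag the image with the commit that built it and update
// the Deployment's pod template, so the spec change itself triggers a proper
// new rollout. Registry URL and names are placeholders.
def component = 'backend'
def namespace = 'foo-dev'
def gitSha = sh(returnStdout: true, script: 'git rev-parse --short HEAD').trim()

// Tag the freshly built image with the Git SHA...
sh "oc -n ${namespace} tag ${component}:latest ${component}:${gitSha}"
// ...and point the Deployment at it. The running image tag then tells you
// exactly which commit is deployed.
String image = "image-registry.openshift-image-registry.svc:5000/${namespace}/${component}:${gitSha}"
sh "oc -n ${namespace} set image deployment/${component} ${component}=${image}"
```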

Just wanted to bring this up as food for thought - I think it would be nice if the image tag is equal to the commit that built the image, and is also equal to the commit that deployed the image. To achieve this, we would need to give up committing into the component repository.

metmajer commented 3 years ago

This is how we handle it at the moment. GxP = change ID, non-GxP = whatever. The change ID field parameter is still in use but points to the same value. Technically, it points to the branch name field's value in the Release Status issue, which is now initialized with the version in Jira but non-editable for the user and will be removed in a future version.

I'm sorry, I think I don't get it. Could you give an example to help me understand?

So, for GxP use-cases, our change management control system defines a Change ID that will propagate from there into Jira as the version for a specific release. Therefore, at least for GxP use-cases, a version will not look like 1.0 but rather as FOO123. Now, I made a mistake in my previous posting, where I stated that the change ID field parameter is still in use. Here, I intended to refer to the branch ID field on a Jira Release Status issue. We made this field read-only and populate it with the Jira release version. The Jenkins pipeline knows two parameters version and changeID, which are derived from the Jira version and the branch ID field value and will always hold the same value.
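
Just to illustrate how the two parameters end up holding the same value (a sketch only, not the actual wiring in the shared library):

```groovy
// Sketch only: how "version" and "changeId" could appear as build parameters.
// For GxP use-cases both carry the Change ID (e.g. "FOO123"); for non-GxP
// use-cases any version string (e.g. "1.1.2") works.
pipeline {
    agent any
    parameters {
        string(name: 'version', defaultValue: 'WIP', description: 'Release version / Change ID')
        string(name: 'changeId', defaultValue: '', description: 'Change ID; falls back to version')
    }
    stages {
        stage('Resolve identifiers') {
            steps {
                script {
                    String changeId = params.changeId ?: params.version
                    echo "version=${params.version}, changeId=${changeId}"
                }
            }
        }
    }
}
```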

I see parallel development likely to happen considering teams could work on a product increment and an emergency change to fix an incident. I would rather go with more convenience for the end-users as opposed to technical simplicity. Honestly, I need to think a bit more about this. Simple is always better, but I'd always favor simplicity on the end-user side.

I agree that parallel development is something we should support - that's why we built it like it is in the first place. I'm just not too happy with the "upfront complexity" this brings, which has triggered my idea to simplify the branching model. I need to think more about this.

Thanks, Michael.

However, I want to mention another aspect which I came across in the last couple of days. At the moment, we use the latest image tag for each container image in the pod template, and rely on either the image trigger of the DeploymentConfig, or a manual rollout if the trigger does not exist, to roll out a new deployment. This is not ideal as OpenShift recommends using the standard Kubernetes Deployment resources going forward. Deployment resources cannot have image triggers, which means that to roll out a new deployment, you have to update the pod template, which means you cannot really use the latest image tag. Yes, there is functionality in Kubernetes 1.15+ to restart an existing rollout, which has the desired effect, but that functionality is meant to restart a deployment (e.g. to fix memory issues), not to create a new one. So we are kind of misusing the restart functionality.

Now, if it were just for the component pipeline, I would change our approach to using the Git SHA as the image tag, and patch the deployment resources to get them to rollout a new deployment. This sounds like a good fit for Deployment resources, and I also like how explicit it is in that you can easily see which image (=commit) we are running. However, the problem is that the orchestration pipeline creates new commits in the repo, which would lead to a situation where we need to record the actual SHA which should be used as the image tag, and then use that in the orchestration deployment as opposed to using the currently checked out repo SHA (which is the commit SHA after the actual image tag SHA).

Just wanted to bring this up as food for thought - I think it would be nice if the image tag is equal to the commit that built the image, and is also equal to the commit that deployed the image. To achieve this, we would need to give up committing into the component repository.

I see. I like the approach with using the Git SHA from a traceability point of view as well. Question is: how could we trace from there to the information in the Release Manager repo? Also need to think about this a bit. Scenarios could range from the entire application being rebuilt to only a single component when using the Release Manager.

What I can also see is that the component pipeline currently does not restrict direct deployments into -test and -prod namespaces, although this should not happen. Meanwhile, even non-GxP use-cases need to create and approve documents that only the Release Manager can create in conjunction with Jira.

michaelsauter commented 3 years ago

Question is: how could we trace from there to the information in the Release Manager repo?

I think this would be impossible if we do not alter the component repo. Only the other way works: in the release manager repo, all Git SHAs of the components would be tracked. I think this should be sufficient: you have one repo that defines your application state for any given version.
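
For illustration, the per-version record in the release manager repo could look roughly like this (file name and format are made up; writeYaml comes from the Pipeline Utility Steps plugin):

```groovy
// Illustrative only: record the application state for one version in the
// release manager repository, mapping each component to the Git SHA it was
// built (and deployed) from.
def releaseRecord = [
    version   : '1.1.2',
    components: [
        [repo: 'foo-backend',  gitSha: '0a1b2c3', imageTag: '0a1b2c3'],
        [repo: 'foo-frontend', gitSha: '4d5e6f7', imageTag: '4d5e6f7'],
    ],
]
writeYaml file: 'releases/1.1.2.yml', data: releaseRecord
```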

What I can also see is that the component pipeline currently does not restrict direct deployments into -test and -prod namespaces, although this should not happen

I believe this to be very hard to rule out completely. However, if you deploy to an external cluster (e.g. for *-prod) then you need the credentials of a serviceaccount with at least edit permissions there. Typically, the component pipeline does not know those credentials so it can't deploy there (and this restriction could also be achieved on the same cluster). That said, it would be easy for the component pipeline to figure those credentials out as it is running on the same Jenkins. I think to ensure this cannot happen, one would need to constrain access to the targets, which would mean not exposing the credentials to Jenkins, which would mean we'd need to implement some sort of pulling from the target cluster.
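
To illustrate the credentials aspect (a sketch with placeholder credential ID, server URL and namespace): only a pipeline that is actually given such a serviceaccount token can deploy to the external cluster at all:

```groovy
// Illustrative only: the token of a serviceaccount with (at least) edit
// permissions on the target cluster/namespace is needed; the component
// pipeline would simply not have access to this credential.
withCredentials([string(credentialsId: 'prod-cluster-api-token', variable: 'PROD_TOKEN')]) {
    // $PROD_TOKEN is expanded by the shell, not by Groovy (single quotes).
    sh 'oc --server=https://api.prod-cluster.example.com:6443 --token=$PROD_TOKEN -n foo-prod apply -f deployment.yml'
}
```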

SimonGolms commented 3 years ago

My2Cent: Try to stick to semantic versioning: https://semver.org/

michaelsauter commented 3 years ago

I think the issue is wider than simply "semver". Further, I believe SemVer has no meaning for "applications". SemVer is a great approach for libraries or services consumed by other things, but I don't really see how it applies to applications.

That said, ODS does not prescribe the version identifier, so you may use SemVer. The question discussed in this issue is more: To which source code reference do you look for a certain (opaque) version identifier?