I agree that one way to ease the consumption of releases is a GitHub release, which can also act as a notification to consumers.
Tried deploying YW on a fresh cluster today using the same steps I did back in Feb/March-ish, which were:

https://downloads.yugabyte.com/kubernetes/yugaware-1.0.0.tgz

```sh
helm install yugaware-1.0.0.tgz --name yb --set=image.tag=1.1.10.0-b3 --wait
```
I wanted to do an exact 1-for-1 replication of what we did on March 12th vs. now. But when the page came up, we got 404's. Prometheus was responding just fine on `:9000`, but yugaware itself on `:80` had the page rendering but throwing a 404. The nginx container was showing the requests coming in, but still 404ing.
I tried bumping the `image.tag` to `1.2.0.0-b7`, and it still had the 404 problem. Then I bumped the `image.tag` to what I believe is latest, which is `1.2.8.0-b1`, and things started working.
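For reference, bumping the tag looks roughly like this (assuming the `yb` release name from the install command above; the release could also be deleted and reinstalled with the new `--set=image.tag`):

```sh
# Re-apply the same chart, overriding only the yugaware image tag.
helm upgrade yb yugaware-1.0.0.tgz --set=image.tag=1.2.8.0-b1 --wait
```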
It doesn't make sense that the `yugaware-1.0.0.tgz` with `image.tag=1.1.10.0-b3` would suddenly stop working. I did a `diff` on the `yugaware-1.0.0.tgz` chart I had saved from Feb/March-ish and the one I downloaded today, and there were differences between them:
```diff
> proxy_http_version 1.1;
88,91d88
< root /yugaware-ui/public/;
< try_files $uri /index.html;
< }
< location /api {
diff yugaware/templates/rbac.yaml ../../../cernercf-k8s-cluster-monorepo/clusters/aws/spinnaker_sandbox_us-west-2_aws/config/yugaware/yugaware/templates/rbac.yaml
25a26
> - pods/exec
39a41,45
> - apiGroups: ["", "extensions"]
> resources:
> - deployments
> - services
> verbs: ["create", "get", "list", "watch", "update", "delete"]
166,173d169
< lifecycle:
< postStart:
< exec:
< command:
< - 'cp'
< - '-R'
< - '/opt/yugabyte/yugaware/public'
< - '/opt/yugaware-ui'
188,189d183
< - name: yugaware-ui
< mountPath: /opt/yugaware-ui
200,201d193
< - mountPath: /yugaware-ui
< name: yugaware-ui
```
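For what it's worth, the comparison above can be reproduced roughly like this (the second tarball path is a placeholder for wherever the Feb/March copy was saved):

```sh
# Unpack today's download and the saved copy side by side, then diff the trees.
mkdir -p new-chart old-chart
tar -xzf yugaware-1.0.0.tgz -C new-chart
tar -xzf /path/to/feb-march-copy/yugaware-1.0.0.tgz -C old-chart
diff -r new-chart old-chart
```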
Problems we found / recommended solutions:

- `yugaware-1.0.0.tgz` from Feb/March and `yugaware-1.0.0.tgz` from today (May 9th) appear to be different. Could this be a fluke?
- The `image.tag` associated with the chart made an impact on whether the chart worked. The image version of yugaware used should be versioned, tested, and explicitly packaged with the version of the yugaware chart.
- We couldn't get the `provider` configured to use internal LBs because the annotations weren't being respected [#866]; this is still an ongoing issue. I'm not sure what the annotations are supposed to be, because there's no way to know the exact version of the yugabyte helm chart yugaware is using under the hood without `ssh`'ing onto the yugaware pod/container, untarring the `yugaware-latest.tgz`, and looking around in it (see the sketch after this list). And even then, unfortunately, you can't tell with 100% certainty what the values to override are supposed to be.
- We can modify the `yugaware-1.0.0.tgz` chart and add annotations to allow yugaware's UI LB to have the `internal` annotations, but I'll log that as a separate issue.
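For context on the "untarring and looking around" point above, it amounts to something like this; the pod name, container name, and tarball path are all placeholders, since that is exactly the information that isn't documented anywhere:

```sh
# Exec into the yugaware pod and list the contents of the bundled chart tarball.
kubectl exec <yugaware-pod> -c yugaware -- \
  tar -tzf /path/inside/container/yugaware-latest.tgz
```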
Thanks for your time & consideration.

/cc @ajcaldera1 @xyloman @dashaun
@aegershman Hey, totally understand all the stuff you mentioned in the GitHub issue, and we will try to address it.

Let me give a quick background on both of the helm charts:
Yugaware helm chart: I agree the versioning on this is not done properly, but our plan for PCF users is to use the versioning we use in the Pivnet tile. I have a PR for a newer version of that, but it is waiting on some open-source licensing documents that we have to provide. In an ideal world you would upgrade the PCF tile and also download the helm chart, which would be shipped as part of that tile.
YugaByte helm chart: the reason we kept it at latest is that we didn't want to introduce a new versioning scheme for it; we were relying on the version of YugaWare to dictate which version of the helm chart to use. Given that they are tightly coupled, we felt a separate version on that front would cause more confusion, since YugaWare goes through more versioning iterations while the YugaByte helm chart wouldn't necessarily go through the same amount of version changes.
On the issues you faced when you upgraded the YugaWare image without the helm upgrade: it was a legitimate miscommunication on our front that it was a breaking change, and ideally we should have bumped the version tag on YugaWare. I take full responsibility on this front that I didn't communicate well with @ajcaldera1 beforehand about this breaking change I made.
Next steps: even before this GitHub issue we had internally given thought to the versioning scheme for the helm charts, and we will try to address this issue as quickly as possible.

Versioning of YugaWare and YugaByte has been done, and we have a charts repository as well, which folks can use. It is also documented in our docs: https://docs.yugabyte.com/latest/deploy/kubernetes/helm-chart/#add-charts-repository.
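For anyone else landing here, consuming charts from that repository looks roughly like this (helm 2 syntax to match the commands earlier in the thread; the repository URL is my reading of the linked docs, so treat the docs page as authoritative):

```sh
# Add the YugaByte charts repository and search it for versioned charts.
helm repo add yugabytedb https://charts.yugabyte.com
helm repo update
helm search yugabytedb
```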
We will set up a quick call to go over our internal release process around this.
In order to operationalize k8s deployments, the yugabyte and yugaware charts should be released with specific versions. Currently the `Chart.yaml`'s version is pinned to `latest`. You cannot discern one change from another; there is no way to determine which version of code is present in any given environment.

### problems introduced
- Auditing and compliance. You cannot trace the set of changes going through `sandbox` to `dev` to `production`, because the `latest` version can change between applying the chart in different environments. There isn't a way to determine the "real" version deployed for each service or in each environment (except by the chart's `git sha`), because all deployments will be listed as "latest" (see the `helm ls` sketch after this list). There's no clear way to articulate breaking changes, bugfixes, caveats, `values.yml` changes, etc. between versions. This makes leveraging the YB/YW charts a compliance difficulty, especially in the face of auditing standards like SOC2.
- General due diligence. Surely problems are going to happen, which is completely understandable and to be expected. But if I had to explain to my org's VP (or higher) that millions of dollars of customer data was lost (or we took extended downtime) because a YB universe was wiped out, or YW was completely fried and caused the loss of multiple YB universes, after taking a `latest` chart upgrade, and they asked me "what version of code introduced this bug?" and I said "I couldn't tell you", it would probably get me fired 😉
- More difficulty when articulating changes which introduce problems. Without semver'd charts, the only way to articulate changes which caused problems is to search through the git history & go to the exact git sha that introduced those changes. If something goes wrong, not having any versioning will make recovery harder && articulating the problems back to YB support harder.
- Difficulty when triggering automation. Without a versioned remote helm repo or GitHub releases being cut, the only other foreseeable way to set up automation is triggering off every `git sha` change in the `yugabyte-db/cloud/kubernetes/helm/` directory of this repository.
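To make the "everything is latest" point above concrete, this is roughly all you can do today (a sketch, not output from a real cluster):

```sh
# With the chart version pinned to "latest", the CHART column of `helm ls`
# reads the same for every environment, so sandbox/dev/production releases are
# indistinguishable; the only remaining breadcrumb is the git sha of the chart
# source that was used to install.
helm ls yb
```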
### solutions

Cut semver'd releases whenever the `Chart.yaml` changes. Or, if there are still way too many changes happening and you don't want to tightly adhere to semver yet, cut `0.x.y` releases. E.g., every release increases either the `minor` or the `patch` version, even if it's not backwards compatible; as long as `major` is `0` it's flexible.
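Concretely, that would mean the chart ships with something like this in its `Chart.yaml` (the version numbers here are made up for illustration):

```yaml
apiVersion: v1
name: yugaware
version: 0.3.1          # bumped on every chart change, instead of "latest"
appVersion: 1.2.8.0-b1  # the yugaware image tag this chart version was tested/packaged with
```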
### closing

Having to version things & follow semver spec kind of sucks. But it's hard to imagine a world where the YB/YW charts don't eventually start using some kind of semver for releases.

thanks for your time and consideration 👍

thoughts?