I agree that one way to ease the consumption of releases is a GitHub release, which can also act as a notification to consumers.
Tried deploying YW on a fresh cluster today using the same steps I did back in Feb/March-ish, which were:

https://downloads.yugabyte.com/kubernetes/yugaware-1.0.0.tgz

```sh
helm install yugaware-1.0.0.tgz --name yb --set=image.tag=1.1.10.0-b3 --wait
```
I wanted to do an exact 1-for-1 replication of what we did on March 12th vs. now. But when the page came up, we got 404's. Prometheus was responding just fine on `:9000`, but yugaware itself on `:80` had the page rendering but throwing a 404. The nginx container was showing the requests coming in, but still 404ing.
I tried bumping the `image.tag` to `1.2.0.0-b7`, and it still had the 404 problem. Then I bumped the `image.tag` to what I believe is latest, which is `1.2.8.0-b1`, and things started working.
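For reference, bumping the tag looks roughly like this (assuming the `yb` release name from the install command above; the release could also be deleted and reinstalled with the new `--set=image.tag`):

```sh
# Re-apply the same chart, overriding only the yugaware image tag.
helm upgrade yb yugaware-1.0.0.tgz --set=image.tag=1.2.8.0-b1 --wait
```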
It doesn't make sense that the `yugaware-1.0.0.tgz` with `image.tag=1.1.10.0-b3` would suddenly stop working. I did a `diff` on the `yugaware-1.0.0.tgz` chart I had saved from Feb/March-ish and the one I downloaded today, and there were differences between them:
```diff
> proxy_http_version 1.1;
88,91d88
< root /yugaware-ui/public/;
< try_files $uri /index.html;
< }
< location /api {
diff yugaware/templates/rbac.yaml ../../../cernercf-k8s-cluster-monorepo/clusters/aws/spinnaker_sandbox_us-west-2_aws/config/yugaware/yugaware/templates/rbac.yaml
25a26
> - pods/exec
39a41,45
> - apiGroups: ["", "extensions"]
> resources:
> - deployments
> - services
> verbs: ["create", "get", "list", "watch", "update", "delete"]
166,173d169
< lifecycle:
< postStart:
< exec:
< command:
< - 'cp'
< - '-R'
< - '/opt/yugabyte/yugaware/public'
< - '/opt/yugaware-ui'
188,189d183
< - name: yugaware-ui
< mountPath: /opt/yugaware-ui
200,201d193
< - mountPath: /yugaware-ui
< name: yugaware-ui
```
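For what it's worth, the comparison above can be reproduced roughly like this (the second tarball path is a placeholder for wherever the Feb/March copy was saved):

```sh
# Unpack today's download and the saved copy side by side, then diff the trees.
mkdir -p new-chart old-chart
tar -xzf yugaware-1.0.0.tgz -C new-chart
tar -xzf /path/to/feb-march-copy/yugaware-1.0.0.tgz -C old-chart
diff -r new-chart old-chart
```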
Problems we found / recommended solutions:

- `yugaware-1.0.0.tgz` from Feb/March and `yugaware-1.0.0.tgz` from today (May 9th) appear to be different. Could this be a fluke?
- The `image.tag` associated with the chart made an impact on whether the chart worked. The image version of yugaware used should be versioned, tested, and explicitly packaged with the version of the yugaware chart.
- We couldn't get the `provider` configured to use internal LBs because the annotations weren't being respected [#866]; this is still an ongoing issue. I'm not sure what the annotations are supposed to be, because there's no way to know the exact version of the yugabyte helm chart yugaware is using under the hood without `ssh`'ing onto the yugaware pod/container, untarring the `yugaware-latest.tgz`, and looking around in it (see the sketch after this list). And even then, unfortunately, you can't tell with 100% certainty what the values to override are supposed to be.
- We can modify the `yugaware-1.0.0.tgz` chart and add annotations to allow yugaware's UI LB to have the `internal` annotations, but I'll log that as a separate issue.
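For context on the "untarring and looking around" point above, it amounts to something like this; the pod name, container name, and tarball path are all placeholders, since that is exactly the information that isn't documented anywhere:

```sh
# Exec into the yugaware pod and list the contents of the bundled chart tarball.
kubectl exec <yugaware-pod> -c yugaware -- \
  tar -tzf /path/inside/container/yugaware-latest.tgz
```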
Thanks for your time & consideration.

/cc @ajcaldera1 @xyloman @dashaun
@aegershman Hey, totally understand all the stuff you mentioned in the GitHub issue, and we will try to address it.

Let me give a quick background on both of the helm charts:
Yugaware helm chart: I agree the versioning on this is not done properly, but our plan for PCF users is to use the versioning we use in the Pivnet tile. I have a PR for a newer version of that, but it is waiting on some open-source licensing documents that we have to provide. In an ideal world you would upgrade the PCF tile and also download the helm chart, which would be shipped as part of that tile.
YugaByte helm chart: the reason we kept it at latest is that we didn't want to introduce a new versioning scheme for it; we were relying on the version of YugaWare to dictate which version of the helm chart to use. Given that they are tightly coupled, we felt a separate version on that front would cause more confusion, since YugaWare goes through more versioning iterations while the YugaByte helm chart wouldn't necessarily go through the same amount of version changes.
On the issues you faced when you upgraded the YugaWare image without the helm upgrade: it was a legitimate miscommunication on our front that it was a breaking change, and ideally we should have bumped the version tag on YugaWare. I take full responsibility on this front that I didn't communicate well with @ajcaldera1 beforehand about this breaking change I made.
Next steps: even before this GitHub issue we had internally given thought to the versioning scheme for the helm charts, and we will try to address this issue as quickly as possible.

Versioning of YugaWare and YugaByte has been done, and we have a charts repository as well, which folks can use. It is also documented in our docs: https://docs.yugabyte.com/latest/deploy/kubernetes/helm-chart/#add-charts-repository.
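For anyone else landing here, consuming charts from that repository looks roughly like this (helm 2 syntax to match the commands earlier in the thread; the repository URL is my reading of the linked docs, so treat the docs page as authoritative):

```sh
# Add the YugaByte charts repository and search it for versioned charts.
helm repo add yugabytedb https://charts.yugabyte.com
helm repo update
helm search yugabytedb
```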
We will set up a quick call to go over our internal release process around this.
In order to operationalize k8s deployments, the yugabyte and yugaware charts should be released with specific versions. Currently the `Chart.yaml`'s version is pinned to `latest`. You cannot discern one change from another; there is no way to determine which version of code is present in any given environment.

### problems introduced
- Auditing and compliance. You cannot trace the set of changes going through `sandbox` to `dev` to `production`, because the `latest` version can change between applying the chart in different environments. There isn't a way to determine the "real" version deployed for each service or in each environment (except by the chart's `git sha`), because all deployments will be listed as "latest" (see the `helm ls` sketch after this list). There's no clear way to articulate breaking changes, bugfixes, caveats, `values.yml` changes, etc. between versions. This makes leveraging the YB/YW charts a compliance difficulty, especially in the face of auditing standards like SOC2.
- General due diligence. Surely problems are going to happen, which is completely understandable and to be expected. But if I had to explain to my org's VP (or higher) that millions of dollars of customer data was lost (or we took extended downtime) because a YB universe was wiped out, or YW was completely fried and caused the loss of multiple YB universes, after taking a `latest` chart upgrade, and they asked me "what version of code introduced this bug?" and I said "I couldn't tell you", it would probably get me fired 😉
- More difficulty when articulating changes which introduce problems. Without semver'd charts, the only way to articulate changes which caused problems is to search through the git history & go to the exact git sha that introduced those changes. If something goes wrong, not having any versioning will make recovery harder && articulating the problems back to YB support harder.
- Difficulty when triggering automation. Without a versioned remote helm repo or GitHub releases being cut, the only other foreseeable way to set up automation is triggering off every `git sha` change in the `yugabyte-db/cloud/kubernetes/helm/` directory of this repository.
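To make the "everything is latest" point above concrete, this is roughly all you can do today (a sketch, not output from a real cluster):

```sh
# With the chart version pinned to "latest", the CHART column of `helm ls`
# reads the same for every environment, so sandbox/dev/production releases are
# indistinguishable; the only remaining breadcrumb is the git sha of the chart
# source that was used to install.
helm ls yb
```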
### solutions

Cut semver'd releases whenever the `Chart.yaml` changes. Or, if there are still way too many changes happening and you don't want to tightly adhere to semver yet, cut `0.x.y` releases. E.g., every release increases either the `minor` or the `patch` version, even if it's not backwards compatible; as long as `major` is `0` it's flexible.
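Concretely, that would mean the chart ships with something like this in its `Chart.yaml` (the version numbers here are made up for illustration):

```yaml
apiVersion: v1
name: yugaware
version: 0.3.1          # bumped on every chart change, instead of "latest"
appVersion: 1.2.8.0-b1  # the yugaware image tag this chart version was tested/packaged with
```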
### closing

Having to version things & follow semver spec kind of sucks. But it's hard to imagine a world where the YB/YW charts don't eventually start using some kind of semver for releases.

thanks for your time and consideration 👍

thoughts?