Closed kingdonb closed 5 years ago
I rebased this with some --onto
and --rebase-merges
magic, so it wouldn't include any reverse-merges back from master into the wip branch.
We can probably also take this to a new branch and do some commit squashing to improve the history, for the changelog.
Tested this on an EKS cluster running 1.13, verified as working.
@kingdonb , I would really like to improve the commit messages. I'm going to take your commits and put them on top of https://github.com/teamhephy/controller/pull/106 and merge them there. I will test again and merge everything once the tests pass. We probably want to squash some of these commits as well, the ones in the beginning (reverts).
This PR is a WIP
We've been testing changes to make the Autoscale functionality via HPA work again.
Right now it looks like this image (
quay.io/kingdonb/controller:kb-fix-autoscale-take-5
) is successfully permitting a manual scale as well as an HPA to be created, like:The concession we had to make is that we don't know what has changed with manual scaling in
apps/v1
deployment, so we have left it atextensions/v1beta1
API for deployments.The
autoscaler/v1
HorizontalPodAutoscaler will only find the Deployment as a match if it is also provided with ascaleTargetRef.apiVersion
selector that matches. I added that here, and HPA now succeeds, assuming you have metrics-server properly configured so the HPA can get pod metrics to make scale decisions.I also tested the manual
deis ps:scale -a testing-hephy cmd=1
behavior, which was broken by the last attempt we made to upgrade toautoscaler/v1
. We should make any reasonable efforts to upgrade the deployment toapps/v1
, but that is not necessary in order to fix the autoscale feature (without unfortunately also breaking manual scaling.) This PR seems to do both 👍!The other frankly unacceptable concession I had to make was in
rootfs/api/models/app.py
where the HPA is created, I have hardcoded'apiVersion': 'extensions/v1beta1'
as I'm not sure how the App resource is supposed to find out what the Deployment's api_version ought to be.We discussed maybe adding an API version service that you can use from any module to determine what API versions are in use for what objects, but there is more discussion that needs to happen before it's clear what that piece ought to look like.
Also, since upgrading a resource's API version means necessarily destroying the old resource, we are going to need a bigger plan when it comes to upgrading deployments to
apps/v1
, since that step appears to be unavoidable, we might need to write code so that a cluster can recognize when these deployment upgrades are needed, and to handle them in a reasonable way. I'm not sure how this can be done without potentially causing apps to have downtime.(It should be perfectly safe to delete and recreate HPAs though, so for the upgrade guide, maybe we give that a little mention and this can go to press relatively as-is in the next release.)