teamhephy / controller

Hephy Workflow Controller (API)
https://teamhephy.com
MIT License
14 stars 26 forks source link

chore(controller): permit umbrella override api group #111

Closed kingdonb closed 3 years ago

kingdonb commented 4 years ago

This is the last PR to submit in the series upgrading the charts for K8s 1.16 in Hephy Workflow v2.23, unless I missed one.

This probably should not be merged without validating the e2e suite, I think that does not need any upgrade though because it does not deploy, only creates a Pod which was already at v1 quite some time ago.

I believe that this will not work without further RBAC changes that must be made in a later commit. I have not done enough research to say what must change exactly. There are also further changes needed in router based on my testing, but again I can't say exactly what is needed just yet.

See teamhephy/monitor#14 for more details about the change, which is the first PR opened in the series. The full list of associated PRs is as follows:

requires teamhephy/monitor#14 requires teamhephy/minio#8 requires teamhephy/logger#8 requires teamhephy/fluentd#14 requires teamhephy/postgres#14 requires teamhephy/builder#52 requires teamhephy/nsq#9 requires teamhephy/redis#3 requires teamhephy/registry-proxy#2 requires teamhephy/registry-token-refresher#4 requires teamhephy/registry#5 requires teamhephy/router#44 requires teamhephy/workflow-manager#5 ~requires teamhephy/workflow#105~

kingdonb commented 4 years ago

So, when we talked about this PR, we discussed that maybe it shouldn't use a global, but a template instead (like we have done for rbacAPIVersion) and that maybe it shouldn't require the user to set a parameter, but autodetected (like we did with .Capabilities.APIVersions.Has for rbacAPIVersion).

I am going to take a stab at updating these PRs this morning so that it does that instead. I will submit it as a separate branch/separate series of PRs, so we can choose between them. This one at least apparently functions, as a reference, for what needs to change in the charts across all of the repos.

I would like to explore using Atomist.io to make changes like this that require a similar patch across all of the repos, but I don't have time to explore it right now. But if we find there is a need to make more patches that affect every component, it might be a major time saver to use it.

kingdonb commented 4 years ago

Well, I changed my mind and just updated these PRs in place, since it was a lot less work.

I am going to try to do a test of this now. Once it is tested and I can certify that I haven't made any errors, we should think about publishing a workflow-beta release anyway even if we can't isolate the remaining issues, it will be easier to debug them from a published release that doesn't work than try to cobble together all of the changed pieces.

This is really the problem that e2e was created to solve I think, but I don't know how to supply it with more than a handful (this series of PRs) without the elaborate Jenkins system that I think it's safe to say we have left behind at this point.

kingdonb commented 4 years ago

This chart update should work without teamhephy/workflow#105 now, btw.

Cryptophobia commented 4 years ago

Looks good! I prefer this way much better without a global parameter. Each component should be able to be deployed individually if possible as well.

felixbuenemann commented 4 years ago

I've not been following the 1.16 update story to closely lately, 'cause I was busy with other stuff.

Can someone bring me up to speed on how I would deploy this and the updated dependencies to a cluster for testing?

Would it make sense to have some kind of pre-release for that?

kingdonb commented 4 years ago

Yes, definitely. We should push out a beta release as soon as we can be sure which components need a new tag.

To my knowledge the only image that has been updated for 1.16 is Controller, and I am not certain that it is finished. I'm not 100% sure whether the error I saw came from our updated Python code in controller or another source, as I am not really competent with a Python debugger.

I'm also not 100% sure how to roll the charts as for a new release, to make it easy to test. Usually what we would do here, is publish a chart release to the "workflow-beta" repo. I have actually done the test of all these chart patches manually instead, as in: taken the previous release and fetch --untar followed by applying patches from the baseline of each component manually, to perform my test

(Which failed, although all of the components installed successfully as apps/v1 resources, and all came online, I could not deploy an app. I have not tested again since I modified the chart to use a template instead of a global variable from the umbrella chart. If it is easy to publish a "workflow-beta" release, or if someone knows a less error-prone way to put one together, it would probably be Anton, either that or I'm digging through that Jenkins jobs repo again.)

The process I followed was to take my updated charts dir, and simply add the controller image to it. Then create an app from "example-go" and try to push a revision out. The issues I had are documented at teamhephy/controller#110, comment here: issuecomment-546707022 Like I may have said already, there may be other components that needed updates within the image besides controller, but off the top of my head I wouldn't be sure which (or why.)

Cryptophobia commented 4 years ago

@felixbuenemann

Can someone bring me up to speed on how I would deploy this and the updated dependencies to a cluster for testing?

We need to make individual changes to the component charts and deploy. Hopefully I can get some free time soon to make a workflow-beta umbrella chart with all of the chart changes and a new test image version for the controller.

Would it make sense to have some kind of pre-release for that?

@kingdonb :

If it is easy to publish a "workflow-beta" release, or if someone knows a less error-prone way to put one together, it would probably be Anton, either that or I'm digging through that Jenkins jobs repo again.)

I plan to do this soon. I'll msg you guys when it is ready to go and your PRs are merged in @kingdonb .

...here may be other components that needed updates within the image besides controller, but off the top of my head I wouldn't be sure which (or why.)

I am guessing it may be the deis-router but that is only a guess right now.

kingdonb commented 4 years ago

Great! Thursday is "First Thursday" and we can have the regular meeting, if you guys want to attend we can talk about it then.

I am almost certain that Router is not the only problem left on the table, as I was testing this in Experimental Native Ingress mode, I am virtually certain I did not have the router component running at all.

kingdonb commented 4 years ago

I will get ahold of the controller logs before Thursday, since they should contain a backtrace with more information

Cryptophobia commented 4 years ago

@kingdonb , I merged all the PRs for the chart changes. Should we go ahead and assemble a workflow-beta master chart but with all of the old images for the components. The controller we will change to to the new tag for this commit with these changes inside the workflow-beta master chart. Then we can deploy to k8s 1.16+ cluster and run workflow e2e?

Or do you think we should proceed differently?

kingdonb commented 4 years ago

Sounds good. If you can throw up a beta chart with or without the controller in it, I will begin testing and upgrade the components one by one (starting with the controller, which is most likely I think to still have issues that are undocumented)

kingdonb commented 4 years ago

I just noticed this was not going to merge cleanly without a rebase first. (Rebased!)

I'm stitching things together by hand to perform the test, so my results are preliminary only right now, but so far so good.

Cryptophobia commented 3 years ago

@kingdonb , can you rebase with master branch to see if we need to include something from this PR now that we merged https://github.com/teamhephy/controller/pull/110 ?