kkaempf closed this issue 2 years ago.
So I'm guessing this is about consolidating everything to use either GitHub runners or cloud runners when needed. This can be done easily for toolkit, but I'm not sure about the elemental end2end tests, as those use VMs to test everything...
- but I think the release job should NOT use the build-host.
It was to speed up the build process, but yes, it can use a GH runner instead of a self-hosted one.
I think we first need to check what we are going to release as part of the elemental releases. If it's just the OCI artifacts, then we can just use GitHub workers, as that would take about 5 minutes.
arm64 workers are available in GCE. I created an instance template called elemental-ci-runner-arm64-v2
which contains the bare minimum to support creating VMs that can run the runner. The template has a script attached to install dependencies, and my, @davidcassany's and @fgiudici's keys are also injected on machines created via the template.
The only thing needed after creating an instance from that template is to ssh in and download+run the runner service (steps available on GitHub -> runners -> add runner). I tested one instance with those steps and it results in a worker that runs the build jobs properly.
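For reference, the whole flow on a fresh instance looks roughly like this. The instance name, zone, runner version and registration token below are placeholders (copy the exact download/config commands GitHub shows under Settings -> Actions -> Runners -> New self-hosted runner); only the template name comes from this thread:

```shell
# Create a VM from the existing template (name/zone are example values)
gcloud compute instances create elemental-runner-arm64-1 \
  --source-instance-template=elemental-ci-runner-arm64-v2 \
  --zone=us-central1-a

# Then, on the instance (via ssh): download and register the runner.
# Version and URL are placeholders; use the exact ones from the "Add runner" page.
mkdir actions-runner && cd actions-runner
curl -o actions-runner.tar.gz -L \
  "https://github.com/actions/runner/releases/download/v2.300.0/actions-runner-linux-arm64-2.300.0.tar.gz"
tar xzf actions-runner.tar.gz
# <REGISTRATION_TOKEN> is the short-lived token from the "Add runner" page
./config.sh --url https://github.com/rancher/elemental --token <REGISTRATION_TOKEN>
# Install and start it as a service so it survives reboots
sudo ./svc.sh install
sudo ./svc.sh start
```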
From my point of view, GCE supports our use case for the arm64 workers should we decide to move there, which I know @ldevulder was interested in.
Price of the machine would be $108 per month.
Looks like GKE clusters are also available, which could be a good way of deploying workers and saving money, as the price is per pod per hour, which seems to be much cheaper than a full VM.
The problem is, as usual, that we need to set a TOKEN_ID for the GitHub runner, and we either add it manually or create automation in a custom image to auto-fetch the TOKEN_ID. That requires a GitHub PAT in the cluster config, but it has the potential to allow us to autoscale when there is a lot of traffic to the workers and scale down when there is none...
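A rough sketch of that auto-fetch step, assuming the PAT is exposed to the pod/VM as `GITHUB_PAT` and the repo is rancher/elemental (the endpoint is GitHub's standard REST API for runner registration tokens; `jq` is assumed to be installed in the image):

```shell
# Exchange the PAT for a short-lived runner registration token at boot
TOKEN=$(curl -s -X POST \
  -H "Authorization: token ${GITHUB_PAT}" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/rancher/elemental/actions/runners/registration-token" \
  | jq -r '.token')

# Each worker coming up would then register itself with that token:
# ./config.sh --url https://github.com/rancher/elemental --token "$TOKEN"
```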
Azure containers seem to work the same with AKS.
This option seems to be more expensive (it looks like they are more suited to being brought up and down on demand, i.e. not sustained use), and it requires development on our side to set it up right for bringing those pods up on demand.
Azure Arm instances are also available, so it's mostly up to us to decide where to move everything. I have no preference one way or another.
@ldevulder could you comment on your preferred cloud provider, in case the end2end tests need to move down the line? Same with @juadk for the UI tests.
I need to create a new arm64 runner and would like to know on which provider it should go :)
> @ldevulder could you comment on your preferred cloud provider, in case the end2end tests need to move down the line? Same with @juadk for the UI tests.
I personally prefer GCP over Azure. I saw a lot of sporadic issues on Azure compared to GCP.
Same for me, I'm not in love with Azure... I would go with GCP as well.
Nice, that settles it, GCE it is. Thanks folks!
The AWS runner has been torn down and the GCE runner has been set up. Several jobs have been triggered and all of them passed correctly.
@juadk @ldevulder I'm wondering if you folks are going to deploy the needed VMs for the e2e/UI jobs, or am I supposed to do so?
In case you want me to do it, I would need some specs here, like OS, vCPU, MEM, disk space and speed. Cheers!
> @juadk @ldevulder I'm wondering if you folks are going to deploy the needed VMs for the e2e/UI jobs, or am I supposed to do so?
No, we will take care of this. But as I said to @davidcassany yesterday, it's not high priority for me; we still have some E2E tests to (re)add and we have a deadline ;-). I will try to do this in maybe 2 weeks.
ok cool!
FYI I will work on this for the E2E tests in week 38.
This will be followed up in issue https://github.com/rancher/elemental/issues/336.
Our current CI / workers / runners setup is somewhat 'spread' across internal and AWS machines. We should try to have it all in one place and properly documented.