zenml-io / mlstacks

A series of Terraform based recipes to provision popular MLOps stacks on the cloud.
https://mlstacks.zenml.io/
Apache License 2.0
247 stars 32 forks

Create self-hosted runner for integration(-ish) CI tests #75

Closed. safoinme closed this 8 months ago

safoinme commented 1 year ago

Introduction

This pull request (PR) addresses a long-standing challenge we've encountered with the K3d stack recipe. Specifically, our previous testing process on GitHub Actions fell short due to the resource-intensive nature of provisioning a K3d cluster and installing various applications.

To overcome this hurdle, we've introduced a solution that uses GitHub's self-hosted runners. These runners let us execute GitHub Actions workloads in our own custom environments, giving us more control over the resources available to each job.

However, we are mindful of cost considerations and the environmental impact of maintaining VMs that run continuously. To address this, we've integrated Terraform into our workflow. With Terraform, we can dynamically provision VMs only when needed for testing purposes and efficiently de-provision them once testing is complete.
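A minimal sketch of how such a provision, test, and de-provision cycle might look as a single workflow. This is an illustration under stated assumptions, not the workflow from this PR: the `runner-infra` directory, the job names, and the placeholder test step are hypothetical, cloud credentials are omitted, and the Terraform configuration is assumed to use a remote backend so that the teardown job sees the same state as the provisioning job.

```yaml
# Hypothetical sketch: names and paths below are assumptions, not taken from this PR.
name: k3d-integration-sketch

on:
  workflow_dispatch:

jobs:
  provision-runner:
    # Runs on a GitHub-hosted runner and creates the VM that hosts the self-hosted runner.
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: runner-infra   # hypothetical directory
      - run: terraform apply -auto-approve
        working-directory: runner-infra

  k3d-tests:
    # Waits for the VM, then runs on the self-hosted runner it provides.
    needs: provision-runner
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - run: echo "placeholder for the K3d stack recipe tests"

  destroy-runner:
    # always() makes sure the VM is torn down even when the tests fail.
    needs: [provision-runner, k3d-tests]
    if: always()
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
        working-directory: runner-infra
      - run: terraform destroy -auto-approve
        working-directory: runner-infra
```

Keeping the teardown in its own job with `if: always()` is one way to avoid leaving a VM running after a failed test run, which is the cost concern described above.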

This PR represents a significant improvement in our testing infrastructure, allowing us to ensure the reliability and performance of the K3d stack recipe without incurring unnecessary costs or resource wastage. We look forward to your feedback and collaboration to further enhance our development process.

A full detailed document about this can be found here

safoinme commented 1 year ago

@strickvl Regarding the questions:

strickvl commented 1 year ago

@strickvl Regarding the questions:

  • What tests exactly? If we are talking about calling the provisioning and destruction of resources, they were not called because I didn't know exactly which tests we would want to run on the environment.

I'd suggest you add one way to indicate how you think this should be used.

  • We can have them all in one workflow. However, the job that runs the tests must be changed to runs-on: self-hosted.

Yeah it just felt a bit weird to have them running in separate workflows.

Also, follow-up questions:

safoinme commented 1 year ago

@strickvl To address the questions:

strickvl commented 1 year ago

@safoinme the runner doesn't seem to run, however. Is something missing? I'm not sure what's going on.

safoinme commented 1 year ago

@strickvl Yes, I was looking into the reason this morning. It turns out that our token was invalidated because it hadn't been used for so long, so now we need to generate a new one. This is a big problem that, unfortunately, I don't think we have a solution for: there is no API for token generation, so if this happens again we need to generate the token manually and set it in the VM config.

strickvl commented 8 months ago

Now that we know how to do the self-hosted runners, should we close this branch? We have a ticket to implement integration tests, which we can do separately. @safoinme WDYT?

safoinme commented 8 months ago

I agree, let's close this.

safoinme commented 8 months ago

We now have self-hosted runners implemented with ARC at the organization level.