Open amosomokpo opened 5 days ago
Q: Is Terraform our best option, or is there a solution with GitHub Actions? (And if Terraform is best, can we use OpenTofu instead?)
A: If we are targeting multiple clouds, yes, Terraform is the best option. Not every cloud provider will have a GitHub Action, but nearly all of them are likely to have a Terraform provider. OpenTofu sounds great; it would be the best option since it is completely open source. I'll look into its support for embedding in GitHub Actions.
@yonch Is multiple clouds a requirement? It seems that only support for Intel and AMD is a requirement at this point?
Yes. 👍
I think we can have a lighter-weight check to verify support in different cloud providers (as in issue #3), and create a more extensive test suite in a single provider.
The title of this issue currently says "GitHub actions pipeline to build, test and deploy collector".
This seems like the most important issue at this point. I think we might want to limit the scope here; otherwise this is a big bite to take on.
To understand what we want to build, we were thinking of first trying out the Telegraf Intel RDT plugin. I think we could use whatever Docker image is publicly available, or semi-manually build one and push it to, e.g., Docker Hub. So we can take that off our plate here.
Issue #2 deals with benchmark workloads.
So what remains is that we need a way to trigger tests on a bare-metal machine, or a large VM, that supports resctrl.
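As a rough illustration of what "supports resctrl" could mean in a pre-flight check on a candidate node, here is a minimal Python sketch. It assumes a Linux x86 host, where the kernel lists RDT-related CPU feature flags in /proc/cpuinfo and lists `resctrl` in /proc/filesystems when the filesystem is available. The function names are hypothetical and not part of any existing tool in this repo.

```python
from pathlib import Path

# RDT-related CPU feature flags as the Linux kernel reports them in /proc/cpuinfo.
RDT_FLAGS = {
    "rdt_a", "cat_l3", "cdp_l3", "cat_l2", "cdp_l2", "mba",
    "cqm", "cqm_llc", "cqm_occup_llc", "cqm_mbm_total", "cqm_mbm_local",
}


def rdt_flags_from_cpuinfo(cpuinfo_text):
    """Return the RDT flags found on the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return RDT_FLAGS & set(value.split())
    return set()


def kernel_has_resctrl(filesystems_text):
    """Return True if 'resctrl' is listed in /proc/filesystems text."""
    # Each non-empty line looks like "nodev<TAB>name" or "<TAB>name".
    return any(
        line.split()[-1] == "resctrl"
        for line in filesystems_text.splitlines()
        if line.strip()
    )


def resctrl_report(proc_root="/proc"):
    """Read the two proc files and summarize support; missing files count as 'no'."""
    def read(name):
        try:
            return (Path(proc_root) / name).read_text()
        except OSError:
            return ""

    return {
        "cpu_flags": sorted(rdt_flags_from_cpuinfo(read("cpuinfo"))),
        "kernel_resctrl": kernel_has_resctrl(read("filesystems")),
    }
```

Calling `resctrl_report()` on a test node returns a small dict that a CI step could print or assert on before running the heavier workload. Many VMs do not expose these flags to the guest, which is one reason bare metal may be needed.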
Here is an idea, following GitHub's "Autoscaling with self-hosted runners":
I believe we can add a nodeSelector in the AutoscalingRunnerSet from the values.yaml when deploying the controller (under template.spec). So this might require a controller deployment per node type.
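To make the nodeSelector idea concrete, here is a minimal Python sketch that generates one values override per node type. It assumes the field names used by the `gha-runner-scale-set` Helm chart (`githubConfigUrl`, `githubConfigSecret`, `runnerScaleSetName`, `minRunners`, `maxRunners`, `template.spec`); these should be checked against the chart version we deploy. The node label key `example.com/cpu-vendor`, the secret name, and the scale set names are hypothetical placeholders.

```python
import json

# Hypothetical node labels, one entry per node type we want runners on.
NODE_TYPES = {
    "intel": {"example.com/cpu-vendor": "intel"},
    "amd": {"example.com/cpu-vendor": "amd"},
}


def runner_set_values(config_url, secret_name, scale_set_name, node_selector, max_runners=1):
    """Build a values override that pins runner pods to nodes matching node_selector."""
    return {
        "githubConfigUrl": config_url,
        "githubConfigSecret": secret_name,
        "runnerScaleSetName": scale_set_name,
        "minRunners": 0,  # scale to zero when no jobs are queued
        "maxRunners": max_runners,
        "template": {
            "spec": {
                # Standard Kubernetes PodSpec field: schedule only on matching nodes.
                "nodeSelector": dict(node_selector),
                "containers": [
                    {
                        "name": "runner",
                        "image": "ghcr.io/actions/actions-runner:latest",
                        "command": ["/home/runner/run.sh"],
                    }
                ],
            }
        },
    }


def values_per_node_type(config_url, secret_name):
    """Return {node_type: values dict}, one runner scale set per node type."""
    return {
        name: runner_set_values(config_url, secret_name, f"rdt-{name}", selector)
        for name, selector in NODE_TYPES.items()
    }


def to_values_file_text(values):
    """Serialize as JSON; JSON is a subset of YAML, so it should work as a values file."""
    return json.dumps(values, indent=2)
```

A workflow job would then target a specific node type with `runs-on: rdt-intel` or `runs-on: rdt-amd`, matching the scale set names generated above.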
This does not describe how to run the test workload on the test node, only how to provision and autoscale nodes.
Brainstorming ideas:
Depending on the target cloud, we should look into a couple of GitHub Actions: an action for Terraform that can both provision and reclaim the test nodes, or more cloud-specific actions (https://github.com/google-github-actions/setup-gcloud). We also need an event or webhook trigger to kick off the VM/bare-metal reclamation workflow. See https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/events-that-trigger-workflows
Either way, the CI/CD pipeline deserves its own issue, too. I will create one.
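One possible shape for the reclamation trigger, sketched in Python: a small handler that receives a GitHub webhook delivery and decides whether to tear down the test node. It assumes the `workflow_run` event, whose payload carries `action` and a `workflow_run` object with a `name` field. The workflow name `test-collector` is a hypothetical placeholder, and the actual teardown (for example, a Terraform or OpenTofu destroy) is left out.

```python
def should_reclaim(event_name, payload, workflow_name="test-collector"):
    """Return True when a delivery signals that the test workflow has finished.

    Reclaims on any conclusion (success, failure, cancelled) so that
    nodes are not leaked when a test run fails.
    """
    if event_name != "workflow_run":
        return False
    if payload.get("action") != "completed":
        return False
    run = payload.get("workflow_run") or {}
    return run.get("name") == workflow_name
```

The same condition can also be expressed natively in a workflow with an `on: workflow_run` trigger and `types: [completed]`, which avoids hosting a webhook receiver at all.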
Originally posted by @amosomokpo in https://github.com/perfpod/memory-collector/issues/2#issuecomment-2460284150