For Promitor I have an Azure infrastructure repo to which contributors can PR new resources required for automated testing, and it is automatically deployed with GitHub Actions.
Is it for all infra, including AKS, or only for resources like Event Hubs, Service Bus, etc.?
That's up to us to decide, we can just start with the Azure resources without the cluster if you prefer
My only concern is the time to create/delete an AKS cluster; if we need to do that, we will make the tests even longer
I'd start with upstreams
~~and maybe we should take a look at Crossplane; we could use it for more cloud providers if it works with the infra we use. Another point in favor of Crossplane is that we can spawn the infra as test code :)~~
sadly, they don't support queues and other resources we need for the moment 😢
This would not run every test run; only when there are changes to the infrastructure definition
Aaaah, your idea is to have the infra there all the time and update it on the fly only when needed. I thought you meant deploying/destroying it during the tests
Yes, correct.
Doing the latter is more intensive and harder to get right. I think we can avoid that as we don't have the capacity for it.
For this scenario you were right: we can use Terraform and manage all the infra from the same place. It requires storing the tfstate in a storage account, but since these are stable environments that won't be a problem.
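A minimal sketch of what that remote state setup could look like (the resource group, storage account, and container names here are hypothetical placeholders, to be decided):

```hcl
terraform {
  # Keep the tfstate in an Azure Storage account so every run shares the same state.
  backend "azurerm" {
    resource_group_name  = "keda-tf-state"   # hypothetical name
    storage_account_name = "kedatfstate"     # hypothetical name
    container_name       = "tfstate"
    key                  = "testing-infrastructure.tfstate"
  }
}
```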
I can start on this during the week if we agree to use Terraform for everything (I don't know Bicep, sorry xD)
I'd create a repo to manage the infra, something like 'keda-infrastructure' or just 'infrastructure'.
Wdyt?
Bicep works fine, but if you want to use this cross-cloud then Terraform is OK. If it's just Azure, just use Bicep IMO.
I'd introduce `kedacore/testing-infrastructure` for this. I'm happy to help if it's Bicep, but I haven't used Terraform before, so I'd have to wait until the initial file is there, unfortunately.
I have expertise with Terraform, so I can create the scaffolding and the initial infrastructure; that's not a problem.
I'm thinking about what infra we have, and IDK if we need to cover AWS now because we create that infra during the e2e tests and delete it afterwards, so maybe we can go with Bicep. But GCP has infra I need to review to check whether we should cover it.
I said Terraform because it's a single language to manage all the infra, so it's easier for people who don't know the cloud-provider-specific languages. There is also a bot for Terraform that we could use to improve the experience, giving the plan outputs and other stuff: https://github.com/runatlantis/atlantis
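For example, both clouds can live in the same configuration. A rough sketch (the provider versions and region are illustrative):

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # illustrative version constraint
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # illustrative version constraint
    }
  }
}

# One language for every cloud provider we need to cover.
provider "azurerm" {
  features {}
}

provider "aws" {
  region = "us-east-1" # placeholder region
}
```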
Let's use Terraform in that case, we don't want to do a migration later on
I have checked and we can update the secrets via the Secrets API, so we can take the Terraform outputs and update the secrets directly in the org, meaning they can be managed automatically on every Terraform execution. OFC this is a draft and we need to go deeper, but it's promising and could improve new infra creation
In theory, it's just going to spin up new resources and a manual action for secrets is fine IMO; at least for starters.
I don't want that process to mess up our GH secrets :)
The problem here is that the secrets have to be taken from somewhere in order to store them as secrets. If we go to the cloud provider and take them from there, we still need access to the Azure subscription, so the blocker would be there. I won't publish secrets as output in GitHub, so the options are: push them somewhere like a vault all of us can access, or push them directly to GH (or any vault) and pull them from there in the workflow.
I have checked and there is an Azure Key Vault integration for GH Actions, so we could put all the secrets from Terraform directly in the vault and fetch them in the workflow, but in that case I prefer to use GH Secrets
BTW, we can name them like `TF_CURRENT_ENV_NAME` to know which of them are self-generated and which are manually generated. Once we have all of them working, we can just modify the secrets we use in the workflows so we don't touch the current secrets
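As a rough sketch of how that secret push could look with the Terraform GitHub provider (the secret name and input variable are hypothetical; this is a draft, not the final design):

```hcl
terraform {
  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}

provider "github" {
  owner = "kedacore" # manage secrets at the org level
}

# Hypothetical input; in practice this would reference a real attribute
# exported by the Azure resources Terraform creates.
variable "eventhub_connection_string" {
  type      = string
  sensitive = true
}

resource "github_actions_organization_secret" "eventhub" {
  # TF_ prefix marks the secret as Terraform-managed, per the naming idea above
  secret_name     = "TF_AZURE_EVENTHUB_CONNECTION_STRING"
  visibility      = "all"
  plaintext_value = var.eventhub_connection_string
}
```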
FYI - I opened a ticket with CNCF for access (owner) to an Azure subscription so we can run these kinds of automated workloads where we want. My thinking is we could start small (just spinning up Azure Event Hubs for E2E tests) and move more of the workloads over time as we want: https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-1422
You are right. For the moment, I'll start creating the scaffolding with a simple resource but with all the elements ready (Terraform code/modules with a backend, secret management, docs, etc.), and then we can move the services one by one.
To start, I have my MVP subscription, and once the scaffolding is ready, we can change the SP and use another account for this (MSFT or CNCF account, not to worry).
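That first simple resource could be something like the Event Hubs piece mentioned above; a sketch under the assumption we use the azurerm provider (all names and the region are placeholders):

```hcl
resource "azurerm_resource_group" "e2e" {
  name     = "keda-e2e-tests" # placeholder name
  location = "westeurope"     # placeholder region
}

resource "azurerm_eventhub_namespace" "e2e" {
  name                = "keda-e2e-eventhubs" # placeholder name
  location            = azurerm_resource_group.e2e.location
  resource_group_name = azurerm_resource_group.e2e.name
  sku                 = "Standard"
  capacity            = 1
}

resource "azurerm_eventhub" "e2e" {
  name                = "e2e-tests"
  namespace_name      = azurerm_eventhub_namespace.e2e.name
  resource_group_name = azurerm_resource_group.e2e.name
  partition_count     = 2
  message_retention   = 1 # days
}
```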
@jeffhollan I can already tell you that they will not be able to help you :) I already looked into this.
Please don't introduce yet-another subscription @JorTurFer and just use the existing one :)
Okay, I meant only during the scaffolding; once things are working I want to swap from mine to the current one (because I have access to the UI to check how it's going, and in case we need to delete something)
I'm naively going through the motions to see where this ends https://github.com/cncf/credits/issues/23
I have one question here: are we going to make the infra repo public, or will it be internal only? I ask because I'm working on it and, depending on this, we need to think about the CI checks for PRs (Terraform checks require secrets, and PRs from forks can't access secrets directly)
Yes, it should be public so that every contributor can open a PR imo
I think this is already done, as we have moved the infrastructure management to https://github.com/kedacore/testing-infrastructure and it's already public, so any contributor can just open a PR there to create the needed resources on Azure, but also AWS and GCP (GCP is still in progress)
Job well done, thanks! 🎉
Can we add this new addition to the contribution guide please?
The e2e README in KEDA has a section about e2e infrastructure, and that repo has a README with a brief description. Do you think the contribution guide is a better place for it? I can move/duplicate it there
I have created an issue in the test-tools repo to add documentation there, because we don't have any guide or help
Thanks a ton! I've noticed the contribution guide has a link to the test folder as well, so we're good to go; thanks!
Provide automated deployment of the Azure resources used in end-to-end tests with Bicep, so that things are automated and I'm not the bottleneck (or at least less of one).
This is because our Azure subscription is not accessible to everyone, and adding resources should be just a PR away.