For Promitor I have an Azure infrastructure repo to which contributors can PR new resources required for automated testing, and it is automatically deployed with GitHub Actions.
Is it for all infra, including AKS, or only for resources like Event Hubs, Service Bus, etc.?
That's up to us to decide, we can just start with the Azure resources without the cluster if you prefer
My only concern is the time to create/delete an AKS cluster; if we need to do that, we will make the tests even longer
I'd start with upstreams
~~and maybe we should take a look at Crossplane; we could use it for more cloud providers if it works with the infra we use. Another point in favor of Crossplane is that we can spawn the infra as test code :)~~
sadly, they don't support queues and other resources we need for the moment 😢
This would not run every test run; only when there are changes to the infrastructure definition
Aaaah, your idea is to have the infra there all the time and update it on the fly only when needed. I thought you meant deploying/destroying it during the tests
Yes, correct.
Doing the latter is more intensive and harder to get right. I think we can avoid that as we don't have the capacity for it.
For this scenario you were right: we can use Terraform and manage all the infra from the same place. It requires storing the tfstate in a storage account, but since these are stable environments that won't be a problem.
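A minimal sketch of what that remote state setup could look like (the resource group, storage account, and container names here are hypothetical placeholders, to be decided):

```hcl
terraform {
  # Keep the tfstate in an Azure Storage account so every run shares the same state.
  backend "azurerm" {
    resource_group_name  = "keda-tf-state"   # hypothetical name
    storage_account_name = "kedatfstate"     # hypothetical name
    container_name       = "tfstate"
    key                  = "testing-infrastructure.tfstate"
  }
}
```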
I can start on this during the week if we agree to use Terraform for everything (I don't know Bicep, sorry xD)
I'd create a repo to manage the infra, something like 'keda-infrastructure' or just 'infrastructure'.
Wdyt?
Bicep works fine, but if you want to use this cross-cloud then Terraform is OK. If it's just Azure, just use Bicep IMO.
I'd introduce `kedacore/testing-infrastructure` for this. I'm happy to help if it's Bicep, but I haven't used Terraform before, so I'd have to wait until the initial file is there, unfortunately.
I have expertise with Terraform, so I can create the scaffolding and the initial infrastructure; that's not a problem.
I'm thinking about what infra we have, and IDK if we need to cover AWS now because we create that infra during the e2e tests and delete it afterwards, so maybe we can go with Bicep. But GCP has infra I need to review to check whether we should cover it.
I said Terraform because it's a single language to manage all the infra, so it's easier for people who don't know the cloud-provider-specific languages. There is also a bot for Terraform that we could use to improve the experience, giving the plan outputs and other stuff: https://github.com/runatlantis/atlantis
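For example, both clouds can live in the same configuration. A rough sketch (the provider versions and region are illustrative):

```hcl
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # illustrative version constraint
    }
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # illustrative version constraint
    }
  }
}

# One language for every cloud provider we need to cover.
provider "azurerm" {
  features {}
}

provider "aws" {
  region = "us-east-1" # placeholder region
}
```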
Let's use Terraform in that case, we don't want to do a migration later on
I have checked and we can update the secrets via the Secrets API, so we can take the Terraform outputs and update the secrets directly in the org, meaning they can be managed automatically on every Terraform execution. OFC this is a draft and we need to go deeper, but it's promising and could improve new infra creation
In theory, it's just going to spin up new resources and a manual action for secrets is fine IMO; at least for starters.
I don't want that process to mess up our GH secrets :)
The problem here is that the secrets have to be taken from somewhere in order to store them as secrets. If we go to the cloud provider and take them from there, we still need access to the Azure subscription, so the blocker would be there. I won't publish secrets as output in GitHub, so the options are: push them somewhere like a vault all of us can access, or push them directly to GH (or any vault) and pull them from there in the workflow.
I have checked and there is an Azure Key Vault integration for GH Actions, so we could put all the secrets from Terraform directly in the vault and fetch them in the workflow, but in that case I prefer to use GH Secrets
BTW, we can name them like `TF_CURRENT_ENV_NAME` to know which of them are self-generated and which are manually generated. Once we have all of them working, we can just modify the secrets we use in the workflows so we don't touch the current secrets
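As a rough sketch of how that secret push could look with the Terraform GitHub provider (the secret name and input variable are hypothetical; this is a draft, not the final design):

```hcl
terraform {
  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}

provider "github" {
  owner = "kedacore" # manage secrets at the org level
}

# Hypothetical input; in practice this would reference a real attribute
# exported by the Azure resources Terraform creates.
variable "eventhub_connection_string" {
  type      = string
  sensitive = true
}

resource "github_actions_organization_secret" "eventhub" {
  # TF_ prefix marks the secret as Terraform-managed, per the naming idea above
  secret_name     = "TF_AZURE_EVENTHUB_CONNECTION_STRING"
  visibility      = "all"
  plaintext_value = var.eventhub_connection_string
}
```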
FYI - I opened a ticket with CNCF for access (owner) to an Azure subscription so we can run these kinds of automated workloads where we want. My thinking is we could start small (just spinning up Azure Event Hubs for E2E tests) and move more of the workloads over time as we want: https://cncfservicedesk.atlassian.net/servicedesk/customer/portal/1/CNCFSD-1422
You are right. For the moment, I'll start creating the scaffolding with a simple resource but with all the elements ready (Terraform code/modules with a backend, secret management, docs, etc.), and then we can move the services one by one.
To start, I have my MVP subscription, and once the scaffolding is ready, we can change the SP and use another account for this (MSFT or CNCF account, not to worry).
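That first simple resource could be something like the Event Hubs piece mentioned above; a sketch under the assumption we use the azurerm provider (all names and the region are placeholders):

```hcl
resource "azurerm_resource_group" "e2e" {
  name     = "keda-e2e-tests" # placeholder name
  location = "westeurope"     # placeholder region
}

resource "azurerm_eventhub_namespace" "e2e" {
  name                = "keda-e2e-eventhubs" # placeholder name
  location            = azurerm_resource_group.e2e.location
  resource_group_name = azurerm_resource_group.e2e.name
  sku                 = "Standard"
  capacity            = 1
}

resource "azurerm_eventhub" "e2e" {
  name                = "e2e-tests"
  namespace_name      = azurerm_eventhub_namespace.e2e.name
  resource_group_name = azurerm_resource_group.e2e.name
  partition_count     = 2
  message_retention   = 1 # days
}
```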
@jeffhollan I can already tell you that they will not be able to help you :) I already looked into this.
Please don't introduce yet-another subscription @JorTurFer and just use the existing one :)
Okay, I meant only during the scaffolding; once things are working I want to swap from mine to the current one (because I have access to the UI to check how it's going, and in case we need to delete something)
I'm naively going through the motions to see where this ends https://github.com/cncf/credits/issues/23
I have one question here: are we going to make the infra repo public, or will it be internal only? I ask because I'm working on it and, depending on this, we need to think about the CI checks for PRs (Terraform checks require secrets, and PRs from forks can't access secrets directly)
Yes, it should be public so that every contributor can open a PR imo
I think this is already done, as we have moved the infrastructure management to https://github.com/kedacore/testing-infrastructure and it's already public, so any contributor can just open a PR there to create the needed resources on Azure, but also AWS and GCP (GCP is still in progress)
Job well done, thanks! 🎉
Can we add this new addition to the contribution guide please?
The e2e README in KEDA has a section about e2e infrastructure, and that repo has a README with a brief description. Do you think the contribution guide is a better place for it? I can move/duplicate it there
I have created an issue in the test-tools repo to add documentation there, because we don't have any guide or help
Thanks a ton! I've noticed the contribution guide has a link to the test folder as well, so we're good to go; thanks!
Provide automated deployment of the Azure resources used in end-to-end tests with Bicep, so that things are automated and I'm not the bottleneck (or at least less of one).
This is because our Azure subscription is not accessible to everyone, and adding resources should be just a PR away.