runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Multiple atlantis instances - Installation using Helm chart #3795

Open bsvartz opened 11 months ago

bsvartz commented 11 months ago

Hey,

We have 30 cloud clusters (different environments) with ~700 resources. Managing these environments in parallel (without execution_order_group) with only one Atlantis pod causes timeouts when Terraform calls external APIs (e.g. Datadog, Timescale). In addition, plans and applies get slower with each new environment we add.

We are installing Atlantis on a GKE cluster using this chart: https://github.com/runatlantis/helm-charts

Based on this closed issue, https://github.com/runatlantis/atlantis/issues/1155, I thought I would be able to configure the chart to run multiple instances using Redis and shared disks.

I configured the chart to use Redis (lockingDbType, redis.db, redis.host), but the chart creates a StatefulSet whose volumeClaimTemplate creates one PVC per pod, so I can't share volumes between the pods in the StatefulSet. Each pod works with its own PVC without syncing data, so the .tfstate files, pull request data, etc. are not known to the other pods. The pods are not really working together.
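The one-PVC-per-pod behaviour comes from how Kubernetes names claims created from a StatefulSet's volumeClaimTemplates: each pod gets its own claim named `<template>-<statefulset>-<ordinal>`. A minimal Python sketch of that naming rule; the template name `atlantis-data` and StatefulSet name `atlantis` are hypothetical placeholders, not necessarily what the chart renders:

```python
def statefulset_pvc_names(template_name: str, statefulset_name: str, replicas: int) -> list[str]:
    """Return the PVC name Kubernetes derives for each StatefulSet pod.

    Claims created from a volumeClaimTemplate are named
    <template>-<statefulset>-<ordinal>, so every pod ends up with its own PVC
    rather than a volume shared across replicas.
    """
    return [f"{template_name}-{statefulset_name}-{ordinal}" for ordinal in range(replicas)]


# Hypothetical names, used only for illustration.
for name in statefulset_pvc_names("atlantis-data", "atlantis", 3):
    print(name)
```

Sharing data between replicas would therefore need a separately declared volume (for example a ReadWriteMany claim) rather than the per-pod template.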

I checked the chart for more options and couldn't find any way to share disks between the pods and run multiple Atlantis instances.

Is this even possible? If not, could you please add it to the chart?

Thanks!!!

jamengual commented 11 months ago

@GMartinez-Sisti do you know if this is possible?

GMartinez-Sisti commented 11 months ago

This might be possible to achieve if locking is already supported by Atlantis.

There are a few requirements:

Hope it helps!

bsvartz commented 11 months ago

@GMartinez-Sisti, it seems like a great first step! I added some comments in https://github.com/runatlantis/helm-charts/pull/304. Also, when is it expected to be merged?

@jamengual, are you sure that, once the disk is shared, multiple Atlantis instances will be able to work in parallel? Does Atlantis itself support this?

Let's say I have an external load balancer that points to 5 Atlantis pods. If we share the disk, will each pod see the plan / apply?
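The load-balancer scenario can be modelled in a few lines: with a naive round-robin balancer, the `atlantis plan` webhook and the later `atlantis apply` webhook can land on different pods. A minimal Python sketch, where the `Pod` class, pod names and PR number are hypothetical; it models only where the plan file lives and ignores locking and Terraform's own concurrency limits, which is why a shared disk alone does not make multiple instances safe:

```python
from itertools import cycle


class Pod:
    """A hypothetical Atlantis pod whose data directory is modelled as a dict."""

    def __init__(self, name: str, storage: dict):
        self.name = name
        self.storage = storage  # maps PR number -> plan file contents

    def plan(self, pr: int) -> None:
        self.storage[pr] = f"plan for PR {pr} (made on {self.name})"

    def apply(self, pr: int) -> str:
        if pr not in self.storage:
            return f"{self.name}: no plan found for PR {pr}"
        return f"{self.name}: applying {self.storage[pr]}"


def simulate(shared_disk: bool, replicas: int = 5) -> str:
    shared: dict = {}
    # Each pod gets either the one shared dict or its own fresh dict (per-pod PVC).
    pods = [Pod(f"atlantis-{i}", shared if shared_disk else {}) for i in range(replicas)]
    balancer = cycle(pods)  # naive round-robin load balancer
    next(balancer).plan(42)  # "atlantis plan" webhook lands on pod 0
    return next(balancer).apply(42)  # "atlantis apply" webhook lands on pod 1


print(simulate(shared_disk=False))  # atlantis-1: no plan found for PR 42
print(simulate(shared_disk=True))   # atlantis-1: applying plan for PR 42 (made on atlantis-0)
```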

Thanks for the quick response!

jamengual commented 11 months ago

No, running in parallel will not work well. Atlantis was not built to have multiple instances. It has been extended to support external locking (you will need to use Redis), and it is possible to run multiple instances that way, but there are some caveats.
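Redis locking is enabled through `atlantis server` flags such as `--locking-db-type`, `--redis-host`, `--redis-port` and `--redis-db`, and Atlantis documents that every server flag can also be passed as an environment variable (drop the leading `--`, turn `-` into `_`, upper-case, prefix with `ATLANTIS_`). A minimal Python sketch of that mapping; the host value is a hypothetical placeholder, and it is an assumption (not verified here) that the chart's `lockingDbType` and `redis.*` values end up as these settings:

```python
def atlantis_env_var(flag: str) -> str:
    """Map an `atlantis server` flag to its environment-variable form.

    Rule from the Atlantis docs: strip the leading dashes, replace '-' with
    '_', upper-case the result and prefix it with ATLANTIS_.
    """
    return "ATLANTIS_" + flag.lstrip("-").replace("-", "_").upper()


# Flags for Redis-backed locking; the host is a hypothetical in-cluster service name.
redis_locking_flags = {
    "--locking-db-type": "redis",
    "--redis-host": "redis.atlantis.svc.cluster.local",
    "--redis-port": "6379",
    "--redis-db": "0",
}

for flag, value in redis_locking_flags.items():
    print(f"{atlantis_env_var(flag)}={value}")
```

Every replica must point at the same Redis instance; otherwise each one keeps its own, independent set of locks.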

You can read some issues from people who have tried this to get an idea.

GMartinez-Sisti commented 11 months ago

Does it work concurrently then? The webhooks will only reach one instance at a time, so I assume different instances will work on different PRs, and the Redis lock ensures they don't try to work on the same one?

jamengual commented 11 months ago

Terraform itself does not support concurrent plans on one system (try it on your computer), so you will have to work those things out, like the plugin cache for example.

GMartinez-Sisti commented 11 months ago

Regarding Terraform, I'm aware of that; we also use state locking with DynamoDB to make sure no one works on the same workflows, just in case. I meant regarding Atlantis: since there is a Redis locking feature, it implies there might be multiple Atlantis servers running, so the servers need to ensure no one is trying to work on the same workflows. Right?

jamengual commented 11 months ago

Yes, that is for the Atlantis lock, which is still per repo + workspace.
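The Atlantis lock behaves like a set-if-absent operation on a shared store: the first PR to claim a key holds it until it is released, and other PRs are refused. A minimal Python model, not Atlantis's actual implementation: the in-process dict is a stand-in for Redis, and the hypothetical `lock_key` layout adds the project directory alongside repo and workspace because Atlantis's locking docs scope locks to a directory and workspace within a repo. With several replicas, the store would have to be the shared Redis instance:

```python
import threading


class InMemoryLockStore:
    """In-process stand-in for a shared lock backend such as Redis (hypothetical).

    try_lock is set-if-absent, similar in spirit to Redis "SET key value NX",
    except that the PR already holding a lock may take it again (re-planning).
    """

    def __init__(self):
        self._locks: dict[str, int] = {}
        self._mutex = threading.Lock()

    def try_lock(self, key: str, pull_number: int) -> bool:
        with self._mutex:
            holder = self._locks.get(key)
            if holder is None:
                self._locks[key] = pull_number
                return True
            return holder == pull_number

    def unlock(self, key: str, pull_number: int) -> None:
        with self._mutex:
            # Only the holder can release the lock.
            if self._locks.get(key) == pull_number:
                del self._locks[key]


def lock_key(repo: str, path: str, workspace: str) -> str:
    # Hypothetical key layout: one lock per repo + directory + workspace.
    return f"{repo}/{path}/{workspace}"


store = InMemoryLockStore()
key = lock_key("acme/infra", "envs/prod", "default")
print(store.try_lock(key, pull_number=101))  # True: first PR takes the lock
print(store.try_lock(key, pull_number=102))  # False: another PR is blocked
store.unlock(key, pull_number=101)
print(store.try_lock(key, pull_number=102))  # True: lock was released
```

This only prevents two PRs from touching the same project; it does nothing about concurrent Terraform runs sharing a machine.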

There is a problem with the Terraform provider cache for parallel runs (not remote state); that is what I'm referring to.
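Terraform reads its plugin cache location from `TF_PLUGIN_CACHE_DIR` and documents that the cache directory is not guaranteed to be safe for concurrent use; it also will not create the directory itself. One way around that is to avoid sharing the cache between runs. A minimal Python sketch, where `terraform_env` and the run IDs are hypothetical: it gives each run its own cache directory, or drops the variable entirely when no cache root is given:

```python
import os
import tempfile
from typing import Dict, Optional


def terraform_env(base_env: Dict[str, str], run_id: str, cache_root: Optional[str]) -> Dict[str, str]:
    """Build the environment for one terraform run (hypothetical helper).

    With a cache_root, each run gets its own TF_PLUGIN_CACHE_DIR so concurrent
    `terraform init` calls never share a cache. With cache_root=None the
    variable is removed and no plugin cache is used.
    """
    env = dict(base_env)
    if cache_root is None:
        env.pop("TF_PLUGIN_CACHE_DIR", None)
        return env
    run_cache = os.path.join(cache_root, run_id)
    os.makedirs(run_cache, exist_ok=True)  # Terraform will not create the directory itself
    env["TF_PLUGIN_CACHE_DIR"] = run_cache
    return env


with tempfile.TemporaryDirectory() as root:
    a = terraform_env(dict(os.environ), "pr-101-prod", root)
    b = terraform_env(dict(os.environ), "pr-102-staging", root)
    print(a["TF_PLUGIN_CACHE_DIR"] != b["TF_PLUGIN_CACHE_DIR"])  # True
```

Per-run directories give up cache reuse in exchange for isolation. Atlantis's own switch for this is the `use-tf-plugin-cache` server option linked in the thread.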

jamengual commented 11 months ago

https://github.com/runatlantis/atlantis/issues/1571

jamengual commented 11 months ago

https://www.runatlantis.io/docs/server-configuration.html#use-tf-plugin-cache

GMartinez-Sisti commented 11 months ago

> #1571

Great read, thanks for sharing. Now I don't quite understand the initial question you asked me, @jamengual. Was it only about sharing the underlying storage while running a single instance?

jamengual commented 11 months ago

The original question for you was about the shared volume usage in the Helm chart, which I don't think is implemented, right?

GMartinez-Sisti commented 11 months ago

> the original question for you was related to the shared volume usage in the helm-chart that I do not think is implemented, right?

Correct. Regarding that, my first reply is still valid.