runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

Highly available cluster with multiple nodes #1571

Open tapaszto opened 3 years ago

tapaszto commented 3 years ago

We are trying to set up a highly available Atlantis cluster with multiple nodes for our prod environment and are currently testing with two nodes behind a load balancer. To keep the nodes' data and state in sync, we put the Atlantis data folder on a common file share (Azure Files) and mounted that share on both nodes. Unfortunately, both nodes then start to fail and throw the application exceptions I attached.

Questions: Can the same set of data files be shared among multiple Atlantis server instances, as we envisioned? Is this issue caused by Atlantis's specific file locking mechanism? Can it be fixed with a small code change, or would it take a larger effort? We intend to put development effort into it if it is reasonably achievable. More generally, what is the advice or best practice for running a highly available Atlantis environment with multiple nodes?

AtlantisException

johnjelinek commented 9 months ago

I meant: is there a PR upstream here instead of in your fork? I like where your idea is heading. I wonder whether @jamengual had a reason to keep this lock separate from the other lock, the one that lets you configure a backend.
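The lock with a configurable backend is the lock database, which can be switched from the default BoltDB file to Redis so that all replicas share one lock store. The sketch below only builds an `atlantis server` argument list; the flag names (`--locking-db-type`, `--redis-host`, `--redis-port`, `--redis-db`, `--redis-tls-enabled`) are taken from the Atlantis server configuration docs as I remember them and should be checked against your Atlantis version. The helper `redisLockingArgs` and the host name are hypothetical.

```go
// Sketch: build `atlantis server` flags that point the lock store at Redis.
// Flag names are assumptions to verify against your Atlantis version's docs;
// redisLockingArgs is a hypothetical helper, not part of Atlantis.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// redisLockingArgs returns server flags selecting Redis as the locking DB.
// The Redis password is deliberately left out of argv; Atlantis maps flags
// to ATLANTIS_-prefixed env vars, e.g. ATLANTIS_REDIS_PASSWORD.
func redisLockingArgs(host string, port, db int, tls bool) []string {
	args := []string{
		"server",
		"--locking-db-type=redis",
		"--redis-host=" + host,
		"--redis-port=" + strconv.Itoa(port),
		"--redis-db=" + strconv.Itoa(db),
	}
	if tls {
		args = append(args, "--redis-tls-enabled")
	}
	return args
}

func main() {
	args := redisLockingArgs("redis.example.internal", 6379, 0, true)
	fmt.Println("atlantis " + strings.Join(args, " "))
}
```

This moves only the lock store. Cloned repos and plan files still live in each replica's own data directory, so this alone does not make any replica able to apply a plan that another replica produced.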

Pardeep009 commented 9 months ago

No, there is no PR in the upstream here.

jamengual commented 9 months ago

I did not code the original lock implementation.
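Assuming "this lock" in the question above refers to a lock held only in a server's process memory (the thread does not say so explicitly), a simplified sketch shows why such a lock cannot coordinate replicas: each process has its own map, so a key held on one node is invisible to another. `dirLocker` and its key format are hypothetical and are not Atlantis's actual types.

```go
// Sketch: a hypothetical in-process lock keyed by repo/PR/workspace/path.
// Each process owns its own map, so replicas cannot see each other's locks.
package main

import (
	"fmt"
	"sync"
)

type dirLocker struct {
	mu    sync.Mutex
	locks map[string]bool
}

func newDirLocker() *dirLocker {
	return &dirLocker{locks: make(map[string]bool)}
}

func lockKey(repo string, pull int, workspace, path string) string {
	return fmt.Sprintf("%s/%d/%s/%s", repo, pull, workspace, path)
}

// TryLock returns an unlock func, or an error if this process already holds the key.
func (d *dirLocker) TryLock(repo string, pull int, workspace, path string) (func(), error) {
	key := lockKey(repo, pull, workspace, path)
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.locks[key] {
		return nil, fmt.Errorf("%s is locked by another command in this process", key)
	}
	d.locks[key] = true
	return func() {
		d.mu.Lock()
		defer d.mu.Unlock()
		delete(d.locks, key)
	}, nil
}

func main() {
	nodeA, nodeB := newDirLocker(), newDirLocker() // two "replicas"

	unlock, err := nodeA.TryLock("org/infra", 42, "default", "envs/prod")
	fmt.Println("node A acquired:", err == nil)

	// A second attempt on the same node is rejected...
	_, err = nodeA.TryLock("org/infra", 42, "default", "envs/prod")
	fmt.Println("node A second attempt rejected:", err != nil)

	// ...but node B, with its own map, grants the same key.
	_, err = nodeB.TryLock("org/infra", 42, "default", "envs/prod")
	fmt.Println("node B acquired too:", err == nil)

	unlock()
}
```

Making such a lock cluster-wide would mean moving it into a shared store, which is roughly the direction the fork discussed above is taking.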

johnjelinek commented 9 months ago

@Pardeep009: maybe it would be good for you to explain in a new issue what your thought process is (you can re-use the points from your blog post). It'd be good to get some feedback from the atlantis engineering team.

jmbravo commented 8 months ago

Hi folks! I have implemented a multi-node Atlantis setup in my organisation and have written a Medium blog post about it; I hope it helps.

@Pardeep009, but this solution does not work with an EKS StatefulSet: even with EFS, a PVC is created for each replica, so each replica ends up with a different volume.
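The per-replica PVC behaviour follows from the Kubernetes StatefulSet convention: each `volumeClaimTemplates` entry produces one claim per pod, named `<template>-<statefulset>-<ordinal>`. A small sketch of that naming, using hypothetical names `atlantis-data` and `atlantis`; with dynamically provisioned EFS, each of these claims is typically bound to its own volume, so the replicas do not see each other's files.

```go
// Sketch: PVC names a StatefulSet derives from one volumeClaimTemplate.
// "atlantis-data" and "atlantis" are hypothetical names.
package main

import "fmt"

// pvcNames returns <template>-<statefulset>-<ordinal> for ordinals 0..replicas-1.
func pvcNames(template, statefulSet string, replicas int) []string {
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names, fmt.Sprintf("%s-%s-%d", template, statefulSet, i))
	}
	return names
}

func main() {
	for _, n := range pvcNames("atlantis-data", "atlantis", 2) {
		fmt.Println(n) // one distinct claim per replica
	}
}
```

Running it prints `atlantis-data-atlantis-0` and `atlantis-data-atlantis-1`: two separate claims, hence two separate data directories.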