Create a new flag / environment variable for the shared provider plugin-cache directory

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
If you are interested in working on this issue or have submitted a pull request, please leave a comment.

~I'd be willing to implement this feature (contributing guide)~
[x] I would think about it in the future if I found some time

Describe the user story

When Atlantis runs on an EBS volume in EKS, it can be a very long-running service. Loose version constraints such as ~> 4 or ~> 3 for providers (AWS in particular) result in a large number of provider versions being downloaded over time. Eventually several GB of providers sit on disk even though they are no longer in use. With a small volume, e.g. 5-10 GB for small Terraform repos, the disk can fill up and cause plans to fail, even though the Terraform repos themselves, with multiple workspaces and folders (10+ projects), take up less than 100 MB.

This mainly concerns the larger providers, in the 200 MB+ range.
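To see which provider versions account for the space, something like the following sketch can help. It assumes the cache uses the registry.terraform.io/<namespace>/<name>/<version>/ layout and GNU or BSD find and du; the function name plugin_cache_usage is made up for illustration.

```shell
#!/bin/sh
# Sketch only: plugin_cache_usage is a hypothetical helper name.
# Prints "<size in KB> <path>" for each provider version directory,
# largest first.
plugin_cache_usage() {
  registry_dir="$1"
  # Bail out if the directory does not exist.
  [ -d "$registry_dir" ] || return 1
  # Assumption: depth 3 below the registry host directory is
  # <namespace>/<name>/<version>.
  find "$registry_dir" -mindepth 3 -maxdepth 3 -type d -exec du -sk {} + | sort -rn
}
```

For example: plugin_cache_usage "$ATLANTIS_DATA_DIR/plugin-cache/registry.terraform.io"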

Creating a cron job to clean this out would be easy with a separate EFS volume for providers, as EFS allows multi-access. With a single EBS volume, however, a sidecar container has to be created in the same pod, which requires building an image.

Describe the solution you'd like

Add a flag / environment variable that allows the plugin-cache location to be set independently of the main Atlantis data-dir. The location is currently the data directory plus a constant, so it should be possible to add an override, similar to how the data-dir can be any location that Atlantis has access to. As this is really a Terraform setting rather than an Atlantis one, it should not affect Atlantis' functionality.
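The intended fallback behaviour could look roughly like the sketch below. ATLANTIS_PLUGIN_CACHE_DIR is a hypothetical name for the proposed setting, and resolve_plugin_cache_dir is made up for illustration; only ATLANTIS_DATA_DIR and the plugin-cache constant come from the current setup.

```shell
#!/bin/sh
# Sketch only: ATLANTIS_PLUGIN_CACHE_DIR is a hypothetical name for the
# proposed override, not an existing Atlantis setting.
resolve_plugin_cache_dir() {
  # Use the override when it is set and non-empty; otherwise fall back
  # to the current behaviour of data directory + constant.
  printf '%s\n' "${ATLANTIS_PLUGIN_CACHE_DIR:-${ATLANTIS_DATA_DIR}/plugin-cache}"
}
```

Terraform itself reads its cache location from TF_PLUGIN_CACHE_DIR, so the resolved path could be passed on with: export TF_PLUGIN_CACHE_DIR="$(resolve_plugin_cache_dir)". How Atlantis would wire this in internally is left open here.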

It is a similar request to https://github.com/runatlantis/atlantis/issues/916, however we are more concerned about the plugin cache than the Terraform repos or other Atlantis data, as the provider-cache is the only thing that continuously grows over time.

Describe the drawbacks of your solution

Unsure how a provider cache having a different location would be an issue, as it only affects Terraform rather than Atlantis' functionality.

If using EFS you could effectively share a provider-cache amongst several different Atlantis installations, but then you would be more likely to run into some theoretical issues if several of them start planning at the same time, e.g. https://github.com/runatlantis/atlantis/issues/2242

Describe alternatives you've considered

Current issue: disk space cleanup using cron. As only the providers cause the problem, only the shared cache needs to be cleaned out once providers are no longer in use. Repos are deleted when atlantis unlock is executed.

Current workaround: build a sidecar image with cron installed (currently using Debian) and run it in the same pod as Atlantis. A separate pod or K8s cron job cannot be used, as neither can access the EBS volume, even on the same node as Atlantis, due to the limitations of the EBS CSI driver.

EFS cannot be used as the main Atlantis data dir because it is much slower at writing small files, which is basically what a terraform init && terraform plan does. If we just bumped our volume size higher, EFS would become significantly faster, but it would cost more. Using EBS for a large number of Atlantis instances seems the most cost-effective approach, with good RW speeds, but clearing / managing the storage then becomes slightly more manual.

Current cron for context / others

Find files that have not been accessed in the last 2 weeks and remove them from the data directory

find \
  "$ATLANTIS_DATA_DIR/plugin-cache/registry.terraform.io" \
  -mindepth 1 \
  -type f \
  -not \
  -newerat '-2 weeks' \
  -delete

Find all empty directories and delete them from the data directory

find \
  "$ATLANTIS_DATA_DIR/plugin-cache/registry.terraform.io" \
  -mindepth 1 \
  -type d \
  -empty \
  -delete
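For anyone adapting this, the two steps above can be wrapped in one function with a dry-run switch, as in the sketch below. It relies on GNU find for -newerat; the function name prune_plugin_cache and the PRUNE_DRY_RUN variable are made up for illustration.

```shell
#!/bin/sh
# Sketch only: prune_plugin_cache and PRUNE_DRY_RUN are hypothetical names.
prune_plugin_cache() {
  cache_dir="$1"
  # Refuse to run against an empty or missing path.
  [ -n "$cache_dir" ] && [ -d "$cache_dir" ] || return 1

  # With PRUNE_DRY_RUN=1, print the matches instead of deleting them.
  if [ "${PRUNE_DRY_RUN:-0}" = "1" ]; then
    action="-print"
  else
    action="-delete"
  fi

  # Step 1: files not accessed within the last 2 weeks (GNU find syntax).
  find "$cache_dir" -mindepth 1 -type f -not -newerat '-2 weeks' "$action"
  # Step 2: directories that are empty. In a dry run this lists only
  # directories that are already empty, not those step 1 would empty.
  find "$cache_dir" -mindepth 1 -type d -empty "$action"
}
```

A first run with PRUNE_DRY_RUN=1 against "$ATLANTIS_DATA_DIR/plugin-cache/registry.terraform.io" shows what would be removed. Note that this depends on access times being updated on the volume, which is not the case with every mount option.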

Related Issues

https://github.com/runatlantis/atlantis/issues/916

runatlantis / atlantis

Create a new flag / environment variable for the shared provider plugin-cache directory #3238
