Open snorlaX-sleeps opened 1 year ago
Thanks for sharing these commands. I formatted them and added a description for each one. It might be an easy command to wedge into a custom workflow in a pre workflow hook.
https://www.runatlantis.io/docs/pre-workflow-hooks.html#atlantis-command-targetting
For the native `find` command, you can use the following. I've been testing it and it has worked well for me.
repos:
  - id: /.*/
    pre_workflow_hooks:
      - description: Clean up old files
        commands: plan
        run: |
          last_accessed_weeks="2"
          dir_to_clean="$ATLANTIS_DATA_DIR/plugin-cache/registry.terraform.io"
          echo "Clean up old files in $dir_to_clean not accessed in the last $last_accessed_weeks weeks"
          # clean up old files
          find \
            "$dir_to_clean" \
            -type f \
            -atime +$(($last_accessed_weeks*7)) \
            -delete \
            -print
          # clean up empty dirs
          find \
            "$dir_to_clean" \
            -mindepth 1 \
            -type d \
            -empty \
            -delete
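Before enabling `-delete`, it may help to preview what the hook would remove. The following is a minimal dry-run sketch using the same `-atime` match as the hook above; `preview_stale_files` is a hypothetical helper name, not something Atlantis provides.

```shell
#!/bin/sh
# Dry-run sketch: list the files the cleanup hook would delete, without deleting them.
# preview_stale_files is a hypothetical helper name, not part of Atlantis.
preview_stale_files() {
  dir_to_clean="$1"
  last_accessed_weeks="${2:-2}"  # defaults to 2 weeks, as in the hook
  # Same match as the hook: regular files last accessed more than N*7 days ago.
  find "$dir_to_clean" -type f -atime +"$((last_accessed_weeks * 7))" -print
}
```

For example, `preview_stale_files "$ATLANTIS_DATA_DIR/plugin-cache/registry.terraform.io" 2` prints the candidates only. Note that `-atime` relies on the filesystem recording access times: on a volume mounted with `noatime`, the access time is never refreshed, so providers that are still in use would look stale as well.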
For the `find` command mentioned by the OP, you need to `apk add findutils`, because the `-newerat` flag is not in the default busybox `find`. I'd recommend the native solution above instead.
find \
  "$dir_to_clean" \
  -mindepth 1 \
  -type f \
  -not \
  -newerat "-$last_accessed_weeks weeks" \
  -delete
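If it is unclear which `find` an image ships, a hook could probe for `-newerat` support before relying on it. This is only a sketch: `find_supports_newerat` is a hypothetical helper name, and it assumes that a `find` lacking the flag exits non-zero when it meets it.

```shell
#!/bin/sh
# Sketch: detect whether the installed find accepts -newerat.
# find_supports_newerat is a hypothetical helper name.
# Assumption: a find without the flag rejects it with a non-zero exit status.
find_supports_newerat() {
  # -maxdepth 0 restricts the probe to the starting point itself, so it is cheap.
  find . -maxdepth 0 -newerat "-1 day" >/dev/null 2>&1
}

if find_supports_newerat; then
  echo "supported"
else
  echo "unsupported"
fi
```

When the probe reports `unsupported`, the `-atime` variant is the simpler fallback, since it needs no extra package.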
Describe the user story
Using an EBS volume in EKS, Atlantis can be a very long-running service. Version pinning such as `~> 4` or `~> 3` for providers (AWS specifically) results in a large number of provider versions being downloaded over time. Eventually several GB of providers are present on disk, even though they are no longer in use. With a small volume, e.g. 5-10 GB for small Terraform repos, this can fill the disk and eventually cause plans to fail, even though the Terraform repos themselves, with multiple workspaces and folders (10+ projects), take up less than 100 MB. This applies specifically to larger providers, in the 200 MB+ range.
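To see how much space each cached provider version takes, something like the sketch below can be run against the plugin cache. It assumes the cache follows Terraform's `<hostname>/<namespace>/<type>/<version>/` directory layout; `provider_cache_usage` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report disk usage (in KB) per cached provider version, largest first.
# provider_cache_usage is a hypothetical helper name.
# Assumed layout: <cache_dir>/<hostname>/<namespace>/<type>/<version>/...
provider_cache_usage() {
  cache_dir="$1"
  # Depth 4 below the cache root selects the per-version directories.
  find "$cache_dir" -mindepth 4 -maxdepth 4 -type d -exec du -sk {} + | sort -rn
}
```

For example, `provider_cache_usage "$ATLANTIS_DATA_DIR/plugin-cache"` lists one line per provider version, which makes it easy to spot old versions that are still taking up space.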
Creating a cron job to clean this out would be easy with a separate EFS volume for providers, as that allows multi-access. With a single EBS volume, however, a sidecar container has to be created in the same pod, which requires building an image.
Describe the solution you'd like
Adding a flag / environment variable that allows setting the location of the plugin-cache, different to the main Atlantis data-dir. As the configuration of the location is the data-directory+constant it should be possible to add an override, similar to how the data-dir can have any location as long as Atlantis has access to it. As this is actually a Terraform variable, rather than an Atlantis variable, it should not affect Atlantis' functionality.
It is a similar request to https://github.com/runatlantis/atlantis/issues/916, however we are more concerned about the plugin cache than the Terraform repos or other Atlantis data, as the provider-cache is the only thing that continuously grows over time.
Describe the drawbacks of your solution
Unsure how a provider cache having a different location would be an issue, as it only affects Terraform rather than Atlantis' functionality.
If using EFS, you could effectively share a provider cache among several different Atlantis installations, but you would then be more likely to run into some theoretical issues if several of them start to plan at the same time, e.g. https://github.com/runatlantis/atlantis/issues/2242
Describe alternatives you've considered
Current issue: disk space cleanup using a `cron` job. As only the providers cause the issue, only the shared cache needs to be cleaned out when providers are no longer in use. Repos are deleted when `atlantis unlock` is executed.
Current workaround: build a sidecar image with `cron` installed (currently using Debian) and run it in the same pod as Atlantis. A separate pod or K8s cron job cannot be used, as it cannot access the EBS volume even when on the same node as Atlantis, due to the limitations of the EBS CSI driver.
EFS cannot be used as the main Atlantis data dir because it is much slower at writing small files, which is basically what a `terraform init && terraform plan` is. "If we just bumped our volume size higher" then EFS would become significantly faster, but it would cost more. Using EBS for a large number of Atlantis instances seems the most cost-effective way to do it, with good RW speeds, but then clearing / managing the storage becomes slightly more manual.
Current cron for context / others:
Find files that have not been accessed in the last 2 weeks and remove them from the data directory
Find all empty directories and delete them from the data directory
Related Issues