bschaatsbergen opened this issue 1 year ago
Does it fail with other directories too? Have you tried changing the directory?
How are you deploying Atlantis?
Hi @nitrocode, I haven't tried anything outside of /mnt/gcs yet. I'm deploying it on Cloud Run and using gcsfuse to mount a Cloud Storage Bucket.
Atlantis seems to work fine on Cloud Run, but I'm trying to set the data dir to the gcsfuse mount directory for persistent storage.
Interesting deployment! Perhaps the atlantis user in the container does not have access to the directory that it's trying to create a subdirectory in?
Solid point! I'll see if I can mount to another directory (trying $HOME/.atlantis now) and dig into the user permissions. The GCS Fuse documentation doesn't mention anything about user permissions, though.
@nitrocode, I tried mounting to a directory that I have access to, a path similar to the Atlantis default but under /app, and set ATLANTIS_DATA_DIR to /app/home/atlantis/.atlantis.
Likewise for the default Atlantis path, without setting ATLANTIS_DATA_DIR.
Here I tried to pre-create the directory /home/atlantis/.atlantis and mount gcsfuse on it.
It seems to trip over the fact that the directory already exists.
Note: the directory already exists because I run this:
# Fail fast if a command fails or MNT_DIR/BUCKET is unset
set -eu
# Create mount directory for service
mkdir -p "$MNT_DIR"
echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse "$BUCKET" "$MNT_DIR"
echo "Mounting completed."
This runs before docker-entrypoint.sh, which starts atlantis server.
Interestingly enough:
Relevant Go doc for MkdirAll:
MkdirAll creates a directory named path, along with any necessary parents, and returns nil, or else returns an error.
...
If path is already a directory, MkdirAll does nothing and returns nil.
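The doc sentence only covers the case where MkdirAll can actually stat the path. Below is a simplified Go model of the order of checks in os.MkdirAll (paraphrased, not the standard library source). It suggests one way to end up with "mkdir /mnt/gcs: file exists": if the first stat of /mnt/gcs fails, MkdirAll falls through to mkdir, which then reports EEXIST. The `fakeFS` type and the assumption that stat fails with "permission denied" on the mount point are hypothetical; nothing in this thread confirms that is what gcsfuse does.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path/filepath"
	"syscall"
)

// fakeFS is a hypothetical stand-in for the OS. It models a mount point
// that exists but cannot be stat'ed by the current user.
type fakeFS struct {
	dirs       map[string]bool // directories that exist and can be stat'ed
	unstatable map[string]bool // existing paths whose stat fails with EACCES
}

func (f *fakeFS) stat(p string) (isDir bool, err error) {
	if f.unstatable[p] {
		return false, &fs.PathError{Op: "stat", Path: p, Err: syscall.EACCES}
	}
	if f.dirs[p] {
		return true, nil
	}
	return false, &fs.PathError{Op: "stat", Path: p, Err: syscall.ENOENT}
}

func (f *fakeFS) mkdir(p string) error {
	if f.dirs[p] || f.unstatable[p] {
		return &fs.PathError{Op: "mkdir", Path: p, Err: syscall.EEXIST}
	}
	f.dirs[p] = true
	return nil
}

// mkdirAll follows the same order of checks as os.MkdirAll, simplified.
func mkdirAll(f *fakeFS, p string) error {
	// 1. Fast path: stat succeeds and p is a directory, so do nothing.
	if isDir, err := f.stat(p); err == nil {
		if isDir {
			return nil
		}
		return &fs.PathError{Op: "mkdir", Path: p, Err: syscall.ENOTDIR}
	}
	// 2. Make sure the parent exists.
	if parent := filepath.Dir(p); parent != p {
		if err := mkdirAll(f, parent); err != nil {
			return err
		}
	}
	// 3. Create p. On failure, succeed only if a re-check sees a directory
	// (os.MkdirAll uses Lstat for this re-check; this model reuses stat).
	if err := f.mkdir(p); err != nil {
		if isDir, err1 := f.stat(p); err1 == nil && isDir {
			return nil
		}
		return err
	}
	return nil
}

func main() {
	f := &fakeFS{
		dirs:       map[string]bool{"/": true, "/mnt": true},
		unstatable: map[string]bool{"/mnt/gcs": true},
	}
	err := mkdirAll(f, "/mnt/gcs/bin")
	fmt.Println(err)                       // mkdir /mnt/gcs: file exists
	fmt.Println(errors.Is(err, fs.ErrExist)) // true
}
```

If /mnt/gcs is added to `dirs` instead of `unstatable`, the same call succeeds, which matches the "does nothing and returns nil" behaviour in the doc.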
Seems like you're very close with the network storage.
Here are some related links
cc @ademariag @gaahrdner
Closing this, as I managed to get around this exact reported issue. I'll be continuing my journey :)
@bschaatsbergen please post your journey in case others hit the same issue. How did you resolve it?
@nitrocode I think I've reached the same point as @bschaatsbergen, and now Atlantis fails during git cloning. It seems to me that gcsfuse might be really painful to use (I'm not saying we can't make it work). I wonder whether it would be possible to keep pending plans in an external store as well (perhaps in Redis?). An additional question: is there anything else that needs to be done to make it truly stateless?
Same here, @kamilkrampa.
I'm having a hard time understanding why the operation isn't permitted, though:
running git clone --branch f/gcsfuse-cloudrun --depth=1 --single-branch https://xxxxxxxx/:<redacted>@github.com/xxxxxxxx/xxxxxxxx.git /app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default: Cloning into '/app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default'...
error: chmod on /app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default/.git/config.lock failed: Operation not permitted
fatal: could not set 'core.filemode' to 'false'
: exit status 128
@kamilkrampa @bschaatsbergen making Atlantis stateless is probably the correct way to go. I do not think there is a way to store the plan in external storage (a database, an S3 bucket, or similar), but that would be a great feature request.
If the clone isn't working, I'm unsure how that can be done in a stateless way unless we used a server + agent model.
Related Stack Overflow question: https://stackoverflow.com/questions/74913423/error-chmod-on-config-lock-failed-operation-not-permitted
@nitrocode Just to clarify, there is no way to store the plan in an external storage because it's not currently implemented in Atlantis or you think it's not possible to do at all?
Anything is possible eventually, but it may require Go changes to the Atlantis server.
Maybe you could mount an external file system (e.g. a GCS bucket via gcsfuse, or an S3 bucket), create a custom workflow that overrides the plan step to save the planfile to that external system, and then apply the planfile.
Or you could customize the container to include a CLI command that saves the plan somewhere, then download the plan and apply it.
If we can make Atlantis completely stateless by storing plans in a remote storage solution, e.g. a GCS bucket or an S3 bucket, I would be happy to open a PR for this (though it might take a while).
What are your thoughts, @nitrocode?
That may require a lot of work, especially because there may be other places where a persistent volume is needed. Making Atlantis stateless would need to be done in pieces and would probably be gated behind a flag so it doesn't disrupt existing functionality.
Right, I think @kamilkrampa and I just need to figure out why gcsfuse is such a pain, and perhaps look into NFS (Google Filestore).
Anyhow, thanks for the thoughts, nitro!
And you will have to replace BoltDB with something else that can be locked and shared. It can be done, but it will not be easy.
(Sent by email in reply to Bruno Schaatsbergen's comment of Wed, Dec 28, 2022 at 5:53 PM on runatlantis/atlantis#2869.)
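As a rough illustration of what "something else that can be locked and shared" has to provide, here is a Go sketch of a lock interface. `Locker` and `memLocker` are hypothetical names, not Atlantis types. The in-memory version only works inside a single process; its point is to show the semantics (one owner per key, refusal reports the current holder, only the owner can unlock) that a shared backend reachable from every replica would need to implement atomically.

```go
package main

import (
	"fmt"
	"sync"
)

// Locker is a hypothetical interface for project/workspace locks.
type Locker interface {
	// TryLock takes key for owner, or reports who already holds it.
	TryLock(key, owner string) (ok bool, holder string)
	Unlock(key, owner string) error
}

// memLocker is a single-process implementation for illustration only.
type memLocker struct {
	mu    sync.Mutex
	locks map[string]string // key -> owner
}

func newMemLocker() *memLocker { return &memLocker{locks: map[string]string{}} }

func (l *memLocker) TryLock(key, owner string) (bool, string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if h, held := l.locks[key]; held && h != owner {
		return false, h
	}
	l.locks[key] = owner
	return true, owner
}

func (l *memLocker) Unlock(key, owner string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if h := l.locks[key]; h != owner {
		return fmt.Errorf("lock %q is held by %q, not %q", key, h, owner)
	}
	delete(l.locks, key)
	return nil
}

func main() {
	var lk Locker = newMemLocker()
	key := "org/repo/default"
	ok, h := lk.TryLock(key, "PR #29")
	fmt.Println(ok, h) // true PR #29
	ok, h = lk.TryLock(key, "PR #30")
	fmt.Println(ok, h) // false PR #29
	fmt.Println(lk.Unlock(key, "PR #29")) // <nil>
}
```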
Could anyone share some code, examples, or attempts at this? I want to try the Filestore approach. @nitrocode @bschaatsbergen
I'm currently working on Cloud Run-based deployments using rclone (I had stopped trying for a while). I'll post an update once I get a bit closer.
It should be fairly simple to add a new /health route, or even to make the /healthz check endpoint overridable through an argument.
Ideally, I think this would introduce an env var that allows overriding the health endpoint, falling back to the default /healthz.
Overview of the Issue
I'm trying to set the environment variable ATLANTIS_DATA_DIR to /mnt/gcs.
When atlantis server is invoked, it is supposed to create a cache dir and a bin dir, but this fails:
Error: initializing server: unable to create dir "/mnt/gcs/bin": mkdir /mnt/gcs: file exists
Reproduction Steps
1. Mount a Cloud Storage bucket at /mnt/gcs with gcsfuse.
2. Set ATLANTIS_DATA_DIR to /mnt/gcs.
3. Run atlantis server.
Logs
Error: initializing server: unable to create dir "/mnt/gcs/bin": mkdir /mnt/gcs: file exists
Environment details
Deploying to Cloud Run, using gcsfuse to mount a Cloud Storage Bucket.
Additional Context
Before docker-entrypoint.sh runs, I run the following:
It seems that Atlantis trips over the fact that the data dir already exists.