runatlantis / atlantis

Terraform Pull Request Automation
https://www.runatlantis.io

gcp: cloudrun: data dir flag fails on existing directory #2869

Open bschaatsbergen opened 1 year ago

bschaatsbergen commented 1 year ago


Overview of the Issue

I'm trying to set the environment variable ATLANTIS_DATA_DIR to /mnt/gcs. When atlantis server is invoked, a cache dir and a bin dir are supposed to be created, but this fails:

Error: initializing server: unable to create dir "/mnt/gcs/bin": mkdir /mnt/gcs: file exists

Reproduction Steps

Logs

Error: initializing server: unable to create dir "/mnt/gcs/bin": mkdir /mnt/gcs: file exists

Environment details

Deploying to Cloud Run, using gcsfuse to mount a Cloud Storage Bucket.

Additional Context

Before docker-entrypoint.sh is run, I run the following:

# Create mount directory for service
mkdir -p $MNT_DIR

echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR 
echo "Mounting completed."

It seems that Atlantis trips over the data dir already existing.

nitrocode commented 1 year ago

Does it fail with any directory? Have you tried changing the directory?

How are you deploying Atlantis?

bschaatsbergen commented 1 year ago

Hi @nitrocode, I haven't tried anything outside of /mnt/gcs yet. I'm deploying it on Cloud Run and using gcsfuse to mount a Cloud Storage Bucket.

Atlantis seems to work fine on Cloud Run but I'm trying to set the data dir to the gcsfuse mount directory for persistent storage.

nitrocode commented 1 year ago

Interesting deployment! Perhaps the atlantis user in the container does not have access to the directory that it's trying to create a subdirectory in?

bschaatsbergen commented 1 year ago

Solid point! I'll see if I can mount another directory (trying $HOME/.atlantis now) and I'll dive into the user permissions. Nothing was mentioned in the GCS Fuse documentation regarding the user permissions though.

bschaatsbergen commented 1 year ago

@nitrocode, I tried mounting in a directory that I have access to, using a path similar to the one Atlantis sets by default but under /app, and set ATLANTIS_DATA_DIR to: /app/home/atlantis/.atlantis

bschaatsbergen commented 1 year ago

The same happens with the Atlantis default path, without setting ATLANTIS_DATA_DIR.

Here I tried to pre-create the directory /home/atlantis/.atlantis and mount gcsfuse on it.

It seems to trip over the directory already existing.

Note: the directory already exists because I run this:

# Create mount directory for service
mkdir -p $MNT_DIR

echo "Mounting GCS Fuse."
gcsfuse --debug_gcs --debug_fuse $BUCKET $MNT_DIR 
echo "Mounting completed."

This happens before docker-entrypoint.sh is run (which starts atlantis server).

bschaatsbergen commented 1 year ago

Interestingly enough:

Relevant Go doc for MkdirAll:

MkdirAll creates a directory named path, along with any necessary parents, and returns nil, or else returns an error.

...

If path is already a directory, MkdirAll does nothing and returns nil.
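
The documented behavior quoted above can be checked on an ordinary local filesystem with a small sketch like the one below. It is only an illustration: the directory names are made up, it does not use a gcsfuse mount, and it does not reproduce the "file exists" error from this issue. It simply shows that MkdirAll returns nil when the parent is already a real directory and returns an error when the parent path exists but is not a directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// result holds the errors returned by os.MkdirAll in the two cases below.
type result struct {
	existingDir error // parent already exists as a directory
	fileParent  error // parent exists but is a regular file
}

// demo runs both cases inside a throwaway temp directory.
// All paths here are hypothetical and unrelated to Atlantis itself.
func demo() (result, error) {
	var r result
	root, err := os.MkdirTemp("", "datadir-demo-")
	if err != nil {
		return r, err
	}
	defer os.RemoveAll(root)

	// Case 1: the data dir was pre-created (similar to `mkdir -p $MNT_DIR`).
	dataDir := filepath.Join(root, "data")
	if err := os.Mkdir(dataDir, 0o700); err != nil {
		return r, err
	}
	r.existingDir = os.MkdirAll(filepath.Join(dataDir, "bin"), 0o700)

	// Case 2: the parent path exists but is not a directory.
	notDir := filepath.Join(root, "notadir")
	if err := os.WriteFile(notDir, nil, 0o600); err != nil {
		return r, err
	}
	r.fileParent = os.MkdirAll(filepath.Join(notDir, "bin"), 0o700)
	return r, nil
}

func main() {
	r, err := demo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("existing directory parent:", r.existingDir)
	fmt.Println("regular file parent:", r.fileParent)
}
```

Since the first case succeeds locally, a pre-existing directory alone does not explain the failure; what the mount reports for that path is not established in this thread.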

nitrocode commented 1 year ago

Seems like you're very close with the network storage.

Here are some related links

cc @ademariag @gaahrdner

bschaatsbergen commented 1 year ago

Closing this as I managed to get around the exact issue reported here. I'll be continuing my journey :)

nitrocode commented 1 year ago

@bschaatsbergen please post your journey in case others hit the same issue. How did you resolve it?

kamilkrampa commented 1 year ago

@nitrocode I think I've reached the same point as @bschaatsbergen, and now Atlantis fails during git cloning. It seems to me that gcsfuse might be really painful to use (I'm not saying we can't make it work). I wonder whether it would be possible to keep pending plans in an external store as well (perhaps Redis?). An additional question: is there anything else that needs to be done to make it truly stateless?

bschaatsbergen commented 1 year ago

Same here @kamilkrampa,

I'm having a hard time understanding why the operation isn't permitted, though.

running git clone --branch f/gcsfuse-cloudrun --depth=1 --single-branch https://xxxxxxxx/:<redacted>@github.com/xxxxxxxx/xxxxxxxx.git /app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default: Cloning into '/app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default'...
error: chmod on /app/atlantis/repos/xxxxxxxx/xxxxxxxx/29/default/.git/config.lock failed: Operation not permitted
fatal: could not set 'core.filemode' to 'false'
: exit status 128
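
The log above shows git failing on a chmod call inside the mounted directory. A minimal sketch of a probe for that condition is below: it creates a temporary file in a given directory and tries to chmod it. The function name chmodSupported is hypothetical, and the probe only reports what the filesystem does; it makes no claim about why a particular mount rejects the call.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// chmodSupported reports whether chmod works on a file created in dir.
// An empty dir means the default temp directory (os.CreateTemp semantics).
// This is a hypothetical helper, not part of Atlantis or gcsfuse.
func chmodSupported(dir string) (bool, error) {
	f, err := os.CreateTemp(dir, "chmod-probe-")
	if err != nil {
		return false, err
	}
	name := f.Name()
	defer os.Remove(name)
	if err := f.Close(); err != nil {
		return false, err
	}

	if err := os.Chmod(name, 0o644); err != nil {
		// "Operation not permitted" and "permission denied" both match
		// os.ErrPermission; treat them as "chmod not supported here".
		if errors.Is(err, os.ErrPermission) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}

func main() {
	dir := "" // pass a mount path as the first argument to probe it instead
	if len(os.Args) > 1 {
		dir = os.Args[1]
	}
	ok, err := chmodSupported(dir)
	if err != nil {
		fmt.Println("probe failed:", err)
		return
	}
	fmt.Println("chmod supported:", ok)
}
```

Running it against the data dir before starting the server would show early whether a git clone into that directory is likely to hit the same error.
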
nitrocode commented 1 year ago

@kamilkrampa @bschaatsbergen making atlantis stateless is probably the correct way to go. I do not think there is a way to store the plan in an external storage (database or s3 bucket or similar) but that would be a great feature request.

If the clone isn't working, I'm unsure how that can be done in a stateless way unless we used a server + agent model.

related issue https://stackoverflow.com/questions/74913423/error-chmod-on-config-lock-failed-operation-not-permitted

kamilkrampa commented 1 year ago

@nitrocode Just to clarify: is there no way to store the plan in external storage because it isn't currently implemented in Atlantis, or do you think it isn't possible at all?

nitrocode commented 1 year ago

Anything is eventually possible but it may require golang changes to the atlantis server.

Maybe you could mount an external file system (like gcsfuse, s3 bucket), create a custom workflow to override the plan to save the planfile to the external system, then apply the planfile.

Or you could customize the container to incorporate some cli command to save the plan somewhere, then download the plan and apply it.

bschaatsbergen commented 1 year ago

If we can make atlantis completely stateless by storing plans in a remote storage solution, e.g. GCS Bucket or S3 Bucket, I would be happy to open a PR for this (which might take a while though).

What are your thoughts @nitrocode ?

nitrocode commented 1 year ago

That may require a lot of work. Especially because

There may be other instances where a persistent volume is needed. Making atlantis stateless would need to be done in pieces and probably be gated behind a flag so it doesn't disrupt existing functionality.

bschaatsbergen commented 1 year ago

Right, I think @kamilkrampa and I need to just figure out why gcsfuse is such a pain. And perhaps look into NFS (Google Filestore).

Anyhow, thanks for the thoughts nitro

jamengual commented 1 year ago

And you will have to replace BoltDB with something else that can be locked/shared etc. It can be done, but it will not be easy.

ademariag commented 1 year ago

Any chance someone could share some code/examples/attempts on this? I want to try the Filestore way @nitrocode @bschaatsbergen

bschaatsbergen commented 1 year ago

Currently working on Cloud Run based deployments using rclone (I had stopped trying for some time). I will update once I get a bit closer.

m0ps commented 1 week ago

https://github.com/runatlantis/atlantis/issues/879#issuecomment-2451306173

nitrocode commented 5 days ago

It should be fairly simple to add a new /health route or even override the /healthz check endpoint using an argument

https://github.com/runatlantis/atlantis/blob/053f494107450ef0f10cf5c6551183b79999a694/server/server.go#L994

m0ps commented 5 days ago

I think that ideally an env var would be introduced that allows overriding the health endpoint (with a fallback to the default /healthz).
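
The env var idea could look roughly like the sketch below. It uses only the Go standard library rather than Atlantis's own router, and the variable name ATLANTIS_HEALTH_PATH, the helper names, and the response body are all assumptions made for illustration. It shows only the override-with-fallback logic.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// defaultHealthPath is the fallback mentioned in the thread.
const defaultHealthPath = "/healthz"

// healthPath returns the health route, reading a hypothetical
// ATLANTIS_HEALTH_PATH variable through the supplied getenv function.
func healthPath(getenv func(string) string) string {
	p := getenv("ATLANTIS_HEALTH_PATH") // assumed name, not a real Atlantis flag
	if p == "" {
		return defaultHealthPath
	}
	if !strings.HasPrefix(p, "/") {
		p = "/" + p
	}
	return p
}

// newMux registers a trivial health handler on the given path.
func newMux(path string) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc(path, func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok"}`)
	})
	return mux
}

// statusFor sends an in-memory GET request and returns the status code.
func statusFor(mux *http.ServeMux, target string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, target, nil)
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	path := healthPath(os.Getenv)
	mux := newMux(path)
	fmt.Println("health path:", path, "status:", statusFor(mux, path))
}
```

Passing getenv as a parameter keeps the fallback logic testable without touching the real process environment.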