Permission issues when using with Docker + Yarn

bradbarrow commented 3 years ago

Hi folks, thanks for the plugin :)

I'm a bit stuck trying to use this with yarn + the Buildkite docker plugin

First build

cache plugin runs
- finds no cache on S3
docker plugin runs
- mounts buildkite agent work dir into container (node:14)
- runs yarn install inside the container
cache plugin runs
- pushes node_modules to S3

Second build

cache plugin runs
- finds a cache on S3
- un-tars the cache
- ERROR - the tar can't be un-tarred because the node_modules exist on the host already
- they weren't cleaned up in buildkite checkout step
- because they are owned by root, not the buildkite-agent user
- because they were created in the docker container
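
To confirm this ownership problem on an agent, one option is to list everything in the checkout that the current user does not own. This is a minimal sketch: the function name `foreign_owned` is made up for illustration, and it should be run as the buildkite-agent user on the host, from the build's checkout directory.

```shell
#!/usr/bin/env bash
# List every entry under a directory that is NOT owned by the current user.
# Root-owned node_modules left behind by a container would show up here.
foreign_owned() {
  local dir="${1:-.}"
  find "$dir" ! -user "$(id -u)" -print
}

# Usage (hypothetical): run from the checkout directory
# foreign_owned . | head
```

Empty output means the agent user owns everything and should be able to clean the checkout itself.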

I have tried:

using docker plugin's propagate-uid-gid setting so that we act as the buildkite-agent user in the docker container
- this has other issues, where yarn can't create its ~/.cache dir
using the node:14 image's built in node user
- this has issues on first-run builds because it can't create the node_modules directory
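
One untested idea for the ~/.cache problem, in case it helps someone else: Yarn 1 reads its cache location from the `YARN_CACHE_FOLDER` environment variable, so the command could point it at a writable directory before calling `yarn install`. The sketch below is an assumption, not something tried in this thread, and the helper name `pick_yarn_cache` and the fallback path are made up for illustration.

```shell
#!/usr/bin/env bash
# Pick a cache directory the current user can write to.
# Falls back to a per-UID directory under TMPDIR when HOME is missing or read-only,
# which can happen when the container runs as a UID with no passwd entry.
pick_yarn_cache() {
  if [ -n "${HOME:-}" ] && [ -d "$HOME" ] && [ -w "$HOME" ]; then
    echo "$HOME/.cache/yarn"
  else
    echo "${TMPDIR:-/tmp}/yarn-cache-$(id -u)"
  fi
}

# Yarn 1 honours this variable when locating its cache.
export YARN_CACHE_FOLDER="$(pick_yarn_cache)"
```

This would have to run inside the container before `yarn install`, for example as the first entry in the step's `command` list.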

My pipeline is pretty straightforward. The second build immediately has trouble untarring the cache because node_modules on the host weren't cleaned up due to permission issues.

cache_plugin_config: &cache_plugin_config
  id: gencer-cache-node-modules
  backend: s3
  key: "v1-cache-{{ runner.os }}-{{ checksum 'yarn.lock' }}"
  restore-keys:
    - 'v1-cache-{{ id }}-{{ runner.os }}-'
    - 'v1-cache-{{ id }}-'
  s3:
    bucket: "buildkite-node-modules-cache"
  paths:
    - node_modules

steps:
  - name: ':jest: Test'
    command:
      - "yarn install"
      - "yarn run test"
    plugins:
      - gencer/cache#v2.4.8: *cache_plugin_config
      - docker#v3.8.0:
          image: "node:14"

gencer commented 3 years ago

Hey @bradbarrow!

Sorry for the late reply! It seems you hit the Docker user namespace issue. Did you also try to align your user between buildkite-agent and Docker?

You probably need to add this:

/etc/docker/daemon.json:

{
        "userns-remap": "buildkite-agent"
}

Run this to get ids:

$ id buildkite-agent

Note the user ID and group ID and update the files below:

/etc/subgid:

buildkite-agent:GID:1

/etc/subuid:

buildkite-agent:UID:1

Don't forget to replace GID and UID with your actual numbers.
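
To avoid copying the numbers by hand, the two entries can be derived from `id`. This is a minimal sketch that only prints the lines so they can be reviewed before being added to the files. The function name `subid_entries` is made up for illustration.

```shell
#!/usr/bin/env bash
# Print the /etc/subuid and /etc/subgid entries for a user, using the
# user's own UID and primary GID as a one-ID range (start:count = ID:1).
subid_entries() {
  local user="$1"
  printf '/etc/subuid: %s:%s:1\n' "$user" "$(id -u "$user")"
  printf '/etc/subgid: %s:%s:1\n' "$user" "$(id -g "$user")"
}

# Usage (on the agent host):
# subid_entries buildkite-agent
```

The output is labelled by file, so each line goes into the matching file without the `/etc/...: ` prefix.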

Run the following:

groupadd -g 100999 docker-buildkite-agent
usermod -aG docker-buildkite-agent buildkite-agent

Why is this required? Even with the user namespace remapped, some builds, such as Go builds or the C extensions of Ruby gems, still end up with root user permissions. The extra group prevents that.

Now delete everything in /var/lib/buildkite-plugins/builds/*, restart buildkite-agent and Docker, then fire a fresh build. The UIDs and GIDs should then be in sync between the Docker container and the host filesystem.

Let me know if it works or not!

toothbrush commented 3 years ago

Hi @gencer – thanks for your reply. I'm @bradbarrow's colleague, and I've been looking at this too. When Brad first posted the issue, we were running on our quite bespoke Buildkite agents. To test things, we've spun up a Buildkite stack using their Elastic CI for AWS. It looks like this particular issue is addressed!

I had a look, and just for the record, the things you mention do in fact appear to be part of that AMI.

$ sudo cat /etc/docker/daemon.json 
{
  "storage-driver": "overlay2",
  "userns-remap": "buildkite-agent",
  "registry-mirrors": [
    "https://docker-cache.us-east-1.staging.our.internal.domain"
  ]
}

$ id buildkite-agent
uid=2000(buildkite-agent) gid=2000(buildkite-agent) groups=2000(buildkite-agent),1001(docker)

$ sudo cat /etc/subgid 
buildkite-agent:1001:1
buildkite-agent:100000:65536
$ sudo cat /etc/subuid
buildkite-agent:2000:1
buildkite-agent:100000:65536

I'm less certain about the final commands you mention, creating an extra group with GID 100999 and adding it to the buildkite-agent user. Perhaps the AMI provided by Buildkite indeed does something like that too.
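
For what it's worth, the number 100999 seems to fall out of the ranges above. The sketch below assumes Docker concatenates the subordinate ranges in the order listed (here also ascending by start), so container ID 0 takes the single-ID first range and everything else continues into the second. That ordering is my assumption about the remapping, and `map_container_id` is a made-up helper, not part of Docker or the plugin. It also assumes every line on stdin belongs to the remap user.

```shell
#!/usr/bin/env bash
# Map a container UID/GID to the host ID it would get under userns-remap.
# Reads "name:start:count" lines (the /etc/subuid format) on stdin and
# treats them as one contiguous container ID space, in file order.
map_container_id() {
  local want="$1" offset=0 name start count
  while IFS=: read -r name start count; do
    if [ "$want" -lt $((offset + count)) ]; then
      echo $((start + want - offset))
      return 0
    fi
    offset=$((offset + count))
  done
  return 1  # ID is outside every listed range
}

# Usage with the subuid file shown above:
# grep '^buildkite-agent:' /etc/subuid | map_container_id 1000
```

With the two subuid lines above, container ID 0 comes out as 2000 (the buildkite-agent user) and container ID 1000, which is the `node` user in the node:14 image, comes out as 100999. Under that assumption the extra group would let the agent reach files created by the container's non-root user.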

But the short version is, it looks like this has solved our issue, thanks for that! I'll let @bradbarrow close this issue if he's not seeing the Permission denied errors from Buildkite any more. 👍

gencer commented 3 years ago

Hi @toothbrush, I'm glad that's all sorted out for you. The last part does indeed fix the remaining permission issues on custom VMs other than the AWS AMI. Feel free to close this issue when needed.

🎉

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
