nienbo / cache-buildkite-plugin

Tarball, Rsync & S3 Cache Kit for Buildkite. Supports Linux, macOS and Windows
https://buildkite.com/plugins
MIT License

Permission issues when using with Docker + Yarn #32

Closed bradbarrow closed 3 years ago

bradbarrow commented 3 years ago

Hi folks, thanks for the plugin :)

I'm a bit stuck trying to use this with Yarn and the Buildkite Docker plugin.

First build

Second build

I have tried:

My pipeline is pretty straightforward. The second build immediately has trouble untarring the cache because node_modules on the host wasn't cleaned up due to permission issues.

cache_plugin_config: &cache_plugin_config
  id: gencer-cache-node-modules
  backend: s3
  key: "v1-cache-{{ runner.os }}-{{ checksum 'yarn.lock' }}"
  restore-keys:
    - 'v1-cache-{{ id }}-{{ runner.os }}-'
    - 'v1-cache-{{ id }}-'
  s3:
    bucket: "buildkite-node-modules-cache"
  paths:
    - node_modules

steps:
  - name: ':jest: Test'
    command:
      - "yarn install"
      - "yarn run test"
    plugins:
      - gencer/cache#v2.4.8: *cache_plugin_config
      - docker#v3.8.0:
          image: "node:14"
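
To confirm that leftover files are what blocks the untar, one quick check is to count the entries under node_modules on the host that the current user does not own. This is only a minimal sketch: the helper name `count_foreign_files` is made up for illustration and is not part of the plugin.

```sh
# Hypothetical helper (not part of the cache plugin): count entries under a
# directory that are NOT owned by the user running this shell.
count_foreign_files() {
  dir=$1
  # `find ! -user ID` matches entries whose owner differs from the given UID.
  # Errors (for example a missing directory) are silenced, which yields 0.
  find "$dir" ! -user "$(id -u)" 2>/dev/null | wc -l | tr -d ' '
}
```

Running `count_foreign_files node_modules` as the agent user in the checkout directory after the first build would print a non-zero number if the container left behind files owned by someone else.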

gencer commented 3 years ago

Hey @bradbarrow!

Sorry for the late reply! It seems you've hit the Docker user namespace issue. Have you also tried aligning your user between buildkite-agent and Docker?

You probably need to add this:

/etc/docker/daemon.json:

{
        "userns-remap": "buildkite-agent"
}

Run this to get ids:

$ id buildkite-agent

Note the user id and group id and update below:

/etc/subgid:

buildkite-agent:GID:1

/etc/subuid:

buildkite-agent:UID:1

Don't forget to replace GID and UID with your actual numbers...
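
As a sketch of that substitution, the entries can be generated from the output of `id` rather than typed by hand. The helper name `subid_line` is hypothetical, and it only prints the line; it does not edit /etc/subuid or /etc/subgid.

```sh
# Hypothetical helper: print a single-ID subordinate entry in the
# "name:start:count" format used by /etc/subuid and /etc/subgid.
subid_line() {
  user=$1
  id_value=$2
  printf '%s:%s:1\n' "$user" "$id_value"
}

# Example usage on the agent host (assumes the buildkite-agent user exists):
#   subid_line buildkite-agent "$(id -u buildkite-agent)"   # line for /etc/subuid
#   subid_line buildkite-agent "$(id -g buildkite-agent)"   # line for /etc/subgid
```

Review the printed lines before adding them to the respective files.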

Run the following:

groupadd -g 100999 docker-buildkite-agent
usermod -aG docker-buildkite-agent buildkite-agent

Why is this required? Even with the user namespace remapped, some builds (such as Go builds or C extensions of Ruby gems) still end up with root user permissions. This step prevents that.

Now delete everything in /var/lib/buildkite-plugins/builds/*, restart buildkite-agent and Docker, then fire a fresh build. That should keep the UIDs and GIDs in sync between the Docker container and the host filesystem.
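
A cautious way to script the cleanup step might look like the sketch below. The function name `clear_builds` is hypothetical; it empties the given directory but keeps the directory itself, and refuses an empty path or `/`.

```sh
# Hypothetical helper: remove everything inside a builds directory,
# including dotfiles, while leaving the directory itself in place.
clear_builds() {
  builds_dir=$1
  # Guard against accidentally wiping the filesystem root.
  case "$builds_dir" in
    ""|"/") echo "refusing to clear '$builds_dir'" >&2; return 1 ;;
  esac
  [ -d "$builds_dir" ] || return 1
  # Delete only the direct children; rm -rf handles their contents.
  find "$builds_dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
}
```

On the agent host this would be run with sufficient privileges against the builds path mentioned above, followed by restarting both services, for example with `sudo systemctl restart docker buildkite-agent`. That command assumes systemd and those unit names, which may differ on your machines.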

Let me know if it works or not!

toothbrush commented 3 years ago

Hi @gencer, thanks for your reply. I'm @bradbarrow's colleague, and I've been looking at this too. When Brad first posted the issue, we were running on our quite bespoke Buildkite agents. To test things, we've spun up a Buildkite stack using their Elastic CI for AWS. It looks like this particular issue is addressed!

I had a look, and just for the record, the things you mention do in fact appear to be part of that AMI.

$ sudo cat /etc/docker/daemon.json 
{
  "storage-driver": "overlay2",
  "userns-remap": "buildkite-agent",
  "registry-mirrors": [
    "https://docker-cache.us-east-1.staging.our.internal.domain"
  ]
}
$ id buildkite-agent
uid=2000(buildkite-agent) gid=2000(buildkite-agent) groups=2000(buildkite-agent),1001(docker)
$ sudo cat /etc/subgid 
buildkite-agent:1001:1
buildkite-agent:100000:65536
$ sudo cat /etc/subuid
buildkite-agent:2000:1
buildkite-agent:100000:65536

I'm less certain about the final commands you mention, creating an extra group with GID 100999 and adding it to the buildkite-agent user. Perhaps the AMI provided by Buildkite indeed does something like that too.
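
One possible reading of GID 100999, offered as an assumption rather than something confirmed in this thread: if Docker sorts the subordinate ranges by start and assigns container IDs contiguously across them, then with the ranges shown above container ID 0 maps to the agent's own ID, and container ID 1000 (the `node` user in the official Node images) maps to 100000 + 999 = 100999. The hypothetical helper `map_container_id` below just does that arithmetic.

```sh
# Hypothetical helper: map a container ID to a host ID.
# Usage: map_container_id CONTAINER_ID RANGE...
# Each RANGE is "start:length", given in ascending order of start.
# ASSUMPTION: container IDs are assigned contiguously across the ranges.
map_container_id() {
  target=$1
  shift
  offset=0   # first container ID covered by the current range
  for range in "$@"; do
    start=${range%%:*}
    length=${range##*:}
    if [ "$target" -lt $((offset + length)) ]; then
      echo $((start + target - offset))
      return 0
    fi
    offset=$((offset + length))
  done
  return 1   # the container ID is not covered by any range
}
```

With the /etc/subgid entries above, `map_container_id 1000 1001:1 100000:65536` prints 100999, and with the /etc/subuid entries, `map_container_id 0 2000:1 100000:65536` prints 2000.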

But the short version is, it looks like this has solved our issue, thanks for that! I'll let @bradbarrow close this issue if he's not seeing the Permission denied errors from Buildkite any more. 👍

gencer commented 3 years ago

Hi @toothbrush, I'm glad that's all sorted out for you. The last part does indeed fix the remaining permission issues on custom VMs that don't use the AWS AMI. Feel free to close this issue when needed.

🎉
