moby / buildkit

concurrent, cache-efficient, and Dockerfile-agnostic builder toolkit
https://github.com/moby/moby/issues/34227
Apache License 2.0
8.08k stars 1.13k forks source link

Unable to download a file with ADD because of `Go-http-client/1.1` user agent #5334

Open edlundin opened 3 years ago

edlundin commented 3 years ago

Description The ADD instruction uses the user agent go-http-client/1.1 when the source is an URL. If for some reason this user agent is blacklisted, downloading a file using ADD becomes impossible.

Context I was trying to bust a cached git repository, cloned from my company's own repositories, using ADD. Unfortunately, my company has a list of banned user agents, including go-http-client/1.1, that prevents me from downloading a file with this instruction. I am aware that several workarounds exist, hence this issue is not a priority, but for this use case, nothing is as simple as using ADD.

Describe the results you received: The build fails with a message similar to failed to load cache key: Get $URL: EOF. Where $URL is the one fed to the src argument of the ADD instruction.

Describe the results you expected: The file to be downloaded by the ADD instruction.

Possible solution: I believe that if there was an optional flag --user-agent, to set the user agent used by ADD, it would fix the issue. Since the flag would be optional, go-http-client/1.1 would still be the default user agent.

Output of docker version:

Client: Docker Engine - Community
 Cloud integration: 1.0.12
 Version:           20.10.5
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:14:53 2021
 OS/Arch:           windows/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.5
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       363e9a8
  Built:            Tue Mar  2 20:15:47 2021
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc., v0.6.0)

Server:
 Containers: 3
  Running: 0
  Paused: 0
  Stopped: 3
 Images: 4
 Server Version: 20.10.5
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.72-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 7.771GiB
 Name: docker-desktop
 ID: F35V:M5AT:P6EN:5TI6:GUTN:V7Y5:RPOW:OFYY:KWIF:LYVY:TMUE:ZITH
 Docker Root Dir: /var/lib/docker
 Debug Mode: true
  File Descriptors: 42
  Goroutines: 46
  System Time: 2021-04-24T09:00:52.2080478Z
  EventsListeners: 4
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

Additional environment details (AWS, VirtualBox, physical, etc.): Docker images are mainly built inside WSL2.

thaJeztah commented 3 years ago

Looks to be the same for both buildkit and the classic builder;

On Docker Desktop for mac; In one terminal

docker run -it --rm -p 8080:80 nginx:alpine

In another shell;

DOCKER_BUILDKIT=0 docker build -<<EOF
FROM busybox
ADD http://host.docker.internal:8080 /foo.html
EOF

DOCKER_BUILDKIT=1 docker build -<<EOF
FROM busybox
ADD http://host.docker.internal:8080 /foo.html
EOF

Which prints these in the logs of the nginx container:

172.17.0.1 - - [26/Apr/2021:15:56:11 +0000] "GET / HTTP/1.1" 200 612 "-" "Go-http-client/1.1" "-"
172.17.0.1 - - [26/Apr/2021:15:56:30 +0000] "GET / HTTP/1.1" 200 612 "-" "Go-http-client/1.1" "-"
edlundin commented 3 years ago

I'm trying to write a PR for this feature.

It is my understanding that the ADD instruction is first parsed from a Dockerfile here:

https://github.com/moby/buildkit/blob/70deac12b5857a1aa4da65e90b262368e2f71500/frontend/dockerfile/instructions/parse.go#L239-L254

The parsed data are stored in this struct:

https://github.com/moby/buildkit/blob/70deac12b5857a1aa4da65e90b262368e2f71500/frontend/dockerfile/instructions/commands.go#L183-L193

Then, it's handled by moby via:

https://github.com/moby/moby/blob/797b974cb90e32c7efe9e324d7459122c3f6b7b0/builder/dockerfile/dispatchers.go#L89-L110

Coincidentally, I forked both moby/buildkit and moby/moby, since one parses the instruction and the other acts upon it. I made changes on each fork, but while I can test each project individually, it seems that I can't use my forked buildkit inside my forked moby.

Does anybody know how I could test my implementation? I thought of modifying the imports during the tests, but it's not a realistic solution.

thaJeztah commented 2 weeks ago

I just tried on a build of Docker 27.3 (BuildKit 0.16), but looks like this is still the case, and it's still using the default Go-http-client/1.1 user-agent.

There are some related tickets to make these headers configurable;

But I think it would make sense to at least set some default that's not Go-http-client/1.1 , because I know there's some websites that block that user-agent.

For example Docker's own website doesn't allow;

curl -A 'Go-http-client/1.1' -sI https://www.docker.com/ | head -1
HTTP/2 403

curl -A 'buildkit/0.16' -sI https://www.docker.com/ | head -1
HTTP/2 200

So trying to download a file from the website using ADD will fail;

echo -e 'FROM scratch\nADD https://www.docker.com/ /foo.html\n' | docker build -
[+] Building 0.4s (3/4)                                                                                                                                                                           docker:default
 => [internal] load build definition from Dockerfile                                                                                                                                                        0.0s
 => => transferring dockerfile: 89B                                                                                                                                                                         0.0s
 => [internal] load .dockerignore                                                                                                                                                                           0.0s
 => => transferring context: 2B                                                                                                                                                                             0.0s
 => ERROR [1/1] ADD https://www.docker.com/ /foo.html                                                                                                                                                       0.3s
------
 > [1/1] ADD https://www.docker.com/ /foo.html:
------
ERROR: failed to solve: failed to load cache key: invalid response status 403

Let me move this ticket to the

thaJeztah commented 2 weeks ago

Let me move this to the BuildKit repository

tonistiigi commented 1 week ago

I don't really see how buildkit specific user-agent is any better when doing generic HTTP requests that are not buildkit-specific. In registry requests, it makes sense as we are a known client for registries. In here we are just trying to work around someone's server configuration. If the server wants to block us, it's their choice. And if they want to block something else that makes requests with Go HTTP client, that other tool can trivially work around that blockage with a random/fake user-agent. For any server that needs to have some special behavior for Go HTTP client, sending the correct user-agent reflecting that this library is making the request seems the most correct option.

thaJeztah commented 1 week ago

The default Go user agent is becoming more common to block; similar to some Java user agents being blocked by default https://community.cloudflare.com/t/cloudflare-blocks-java-10-user-agents-by-default/374648

Is there a reason we want to advertise buildkit (or the front end) to be a generic Go application?

tonistiigi commented 1 week ago

Is there a reason we want to advertise buildkit (or the front end) to be a generic Go application?

There might be some privacy concerns but mainly it is just that this is a generic Go library request without anything buildkit specific in there. Go user-agent may be theoretically useful for a server if that client has some behavior specific to the implementation, but buildkit user-agent is meaningless to a random server answering to plain GET request.

The default Go user agent is becoming more common to block; similar to some Java user agents being blocked by default

That is fine and we shouldn't try to outsmart them. Some websites want to target only human users via browsers and that is their choice.

That being said, I don't think this is the most important thing in BuildKit behavior. If you think it is important that a different user-agent should be used for this request, feel free to send a PR.

jedevc commented 6 days ago

IMO, I don't think we should switch the default, but it should be configurable by LLB (I think we've discussed this elsewhere, with the ability to set arbitrary headers).

If there's a reason to switch the default, we should do so in a backwards compat way (using a capability, to avoid breakage, since even though unlikely, some applications/metrics gathering systems may be relying on the current behavior).

crazy-max commented 6 days ago

but it should be configurable by LLB (I think we've discussed this elsewhere, with the ability to set arbitrary headers).

Wonder if we could also have a buildkit conf for http client opts:

[source.http]
  [source.http.headers]
    "User-Agent" = "foo"

Also we have ProxyEnv that works with ExecOp but don't think this is extended to http source.

thompson-shaun commented 2 days ago

The configurable angle is nice but would take a bit more work. Let's start by changing the default with this issue and use a follow-up for the more complex/configurable route.