Open edlundin opened 3 years ago
Looks to be the same for both BuildKit and the classic builder.
On Docker Desktop for Mac, in one terminal:
docker run -it --rm -p 8080:80 nginx:alpine
In another shell:
DOCKER_BUILDKIT=0 docker build -<<EOF
FROM busybox
ADD http://host.docker.internal:8080 /foo.html
EOF
DOCKER_BUILDKIT=1 docker build -<<EOF
FROM busybox
ADD http://host.docker.internal:8080 /foo.html
EOF
Which prints these in the logs of the nginx container:
172.17.0.1 - - [26/Apr/2021:15:56:11 +0000] "GET / HTTP/1.1" 200 612 "-" "Go-http-client/1.1" "-"
172.17.0.1 - - [26/Apr/2021:15:56:30 +0000] "GET / HTTP/1.1" 200 612 "-" "Go-http-client/1.1" "-"
I'm trying to write a PR for this feature.
It is my understanding that the ADD instruction is first parsed from a Dockerfile here:
The parsed data are stored in this struct:
Then, it's handled by moby via:
Consequently, I forked both moby/buildkit and moby/moby, since one parses the instruction and the other acts upon it. I made changes on each fork, but while I can test each project individually, it seems that I can't use my forked buildkit inside my forked moby.
Does anybody know how I could test my implementation? I thought of modifying the imports during the tests, but it's not a realistic solution.
I just tried on a build of Docker 27.3 (BuildKit 0.16), and it looks like this is still the case: it's still using the default Go-http-client/1.1 user-agent.
There are some related tickets to make these headers configurable;
But I think it would make sense to at least set some default that's not Go-http-client/1.1, because I know there are some websites that block that user-agent.
For example, Docker's own website doesn't allow it:
curl -A 'Go-http-client/1.1' -sI https://www.docker.com/ | head -1
HTTP/2 403
curl -A 'buildkit/0.16' -sI https://www.docker.com/ | head -1
HTTP/2 200
So trying to download a file from the website using ADD will fail:
echo -e 'FROM scratch\nADD https://www.docker.com/ /foo.html\n' | docker build -
[+] Building 0.4s (3/4) docker:default
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 89B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> ERROR [1/1] ADD https://www.docker.com/ /foo.html 0.3s
------
> [1/1] ADD https://www.docker.com/ /foo.html:
------
ERROR: failed to solve: failed to load cache key: invalid response status 403
Let me move this to the BuildKit repository
I don't really see how a BuildKit-specific user-agent is any better when doing generic HTTP requests that are not BuildKit-specific. In registry requests it makes sense, as we are a known client for registries. Here we are just trying to work around someone's server configuration. If the server wants to block us, it's their choice. And if they want to block something else that makes requests with the Go HTTP client, that other tool can trivially work around the block with a random/fake user-agent. For any server that needs some special behavior for the Go HTTP client, sending the correct user-agent, reflecting that this library is making the request, seems the most correct option.
The default Go user agent is becoming more common to block; similar to some Java user agents being blocked by default https://community.cloudflare.com/t/cloudflare-blocks-java-10-user-agents-by-default/374648
Is there a reason we want to advertise buildkit (or the front end) to be a generic Go application?
Is there a reason we want to advertise buildkit (or the front end) to be a generic Go application?
There might be some privacy concerns, but mainly it is just that this is a generic Go library request without anything BuildKit-specific in it. A Go user-agent may be theoretically useful for a server if that client has some behavior specific to the implementation, but a BuildKit user-agent is meaningless to a random server answering a plain GET request.
The default Go user agent is becoming more common to block; similar to some Java user agents being blocked by default
That is fine and we shouldn't try to outsmart them. Some websites want to target only human users via browsers and that is their choice.
That being said, I don't think this is the most important thing in BuildKit behavior. If you think it is important that a different user-agent should be used for this request, feel free to send a PR.
IMO, I don't think we should switch the default, but it should be configurable by LLB (I think we've discussed this elsewhere, with the ability to set arbitrary headers).
If there's a reason to switch the default, we should do so in a backwards compat way (using a capability, to avoid breakage, since even though unlikely, some applications/metrics gathering systems may be relying on the current behavior).
but it should be configurable by LLB (I think we've discussed this elsewhere, with the ability to set arbitrary headers).
Wonder if we could also have a buildkit conf for http client opts:
[source.http]
[source.http.headers]
"User-Agent" = "foo"
Also we have ProxyEnv that works with ExecOp, but I don't think this is extended to the HTTP source.
The configurable angle is nice but would take a bit more work. Let's start by changing the default with this issue and use a follow-up for the more complex/configurable route.
Description: The ADD instruction uses the user agent go-http-client/1.1 when the source is a URL. If for some reason this user agent is blacklisted, downloading a file using ADD becomes impossible.

Context: I was trying to bust a cached git repository, cloned from my company's own repositories, using ADD. Unfortunately, my company has a list of banned user agents, including go-http-client/1.1, that prevents me from downloading a file with this instruction. I am aware that several workarounds exist, hence this issue is not a priority, but for this use case, nothing is as simple as using ADD.

Describe the results you received: The build fails with a message similar to failed to load cache key: Get $URL: EOF, where $URL is the one fed to the src argument of the ADD instruction.

Describe the results you expected: The file to be downloaded by the ADD instruction.

Possible solution: I believe that if there was an optional flag --user-agent, to set the user agent used by ADD, it would fix the issue. Since the flag would be optional, go-http-client/1.1 would still be the default user agent.

Output of docker version:

Output of docker info:

Additional environment details (AWS, VirtualBox, physical, etc.): Docker images are mainly built inside WSL2.