Open liggitt opened 4 months ago
/kind bug
/triage accepted /sig architecture testing /area code-organization
go: download go1.22.3 for linux/amd64: toolchain not available
This is an error I've seen when we use go 1.x and not go 1.x.y as the go directive. Some instances: https://github.com/golang/go/issues/62278, https://github.com/dependabot/dependabot-core/issues/7895
Which is why what you said here makes sense: https://github.com/kubernetes/kubernetes/pull/124922#issuecomment-2117868143. Once the actual bump happens, the GOTOOLCHAIN env variable that propagates into the container will match the toolchain that already exists there, so go will not try to download it again, and it is that download attempt which leads to the error (similar to the issues linked).
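To make the go 1.x vs go 1.x.y distinction concrete, here is a minimal sketch of a check that classifies a version string. The function name `go_version_kind` is made up for illustration and is not part of any existing script. It assumes Go 1.21+ naming, where (as far as I understand) the first release of a line is 1.N.0, so a bare 1.N names a language version rather than a downloadable toolchain.

```shell
# Hypothetical helper: classify a Go version string.
#   "release"  -> full release version such as 1.22.3 (a toolchain go1.22.3 exists to download)
#   "language" -> language version such as 1.22 (no toolchain of that exact name)
#   "invalid"  -> anything else; pre-releases like 1.22rc1 are deliberately not handled
go_version_kind() {
  case "$1" in
    '' | *[!0-9.]* | .* | *. | *..*) echo "invalid" ;;
    *.*.*.*) echo "invalid" ;;
    *.*.*) echo "release" ;;
    *.*) echo "language" ;;
    *) echo "invalid" ;;
  esac
}
```

Running something like this over the value of the go directive would flag the 1.x form before go ever attempts a download.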
This seems to also be the reason the verify job failed here: https://github.com/kubernetes/kubernetes/pull/122410
When I started the testing, master was still not on go1.22, leading to the kube-build go toolchain mismatch.
I wonder if it would be a good idea to substitute the value of .go-version directly in the kube-build script.
But I guess the long term solution would be to incorporate the toolchain directive into our go mods and the tooling.
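A rough sketch of the .go-version substitution idea, assuming the file holds a bare version such as 1.22.3 on its first line. `toolchain_from_version_file` is a hypothetical name, and this is not the actual kube-build code.

```shell
# Hypothetical helper: turn the contents of a .go-version style file
# (e.g. "1.22.3") into a toolchain name (e.g. "go1.22.3").
# Returns non-zero if the file is unreadable or empty.
toolchain_from_version_file() {
  version_file="$1"
  [ -r "${version_file}" ] || return 1
  # Use the first line only and strip all whitespace, including a stray CR.
  version="$(head -n 1 "${version_file}" | tr -d '[:space:]')"
  [ -n "${version}" ] || return 1
  printf 'go%s\n' "${version}"
}
```

The result could then be passed the same way the commands in this thread pass it, for example as a GOTOOLCHAIN=... prefix in front of build/run.sh.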
This also has the benefit of enabling IDEs to respect it without our makefiles (though your builds still won't match ours if you just go build). However, we gain additional complexity keeping these aligned, and we need to continue to respect .go-version / envs properly for now because downstream users depend on this "API".
Even if we switch to toolchain in go.mod we should probably make the env / .go-version work properly for at least one minor while putting a release note that downstreams must switch. (I'm sure an argument could be made for longer).
It would probably be preferable to keep them working permanently because nobody wants to carry go.mod patches but it is common to need to force a different go version for security patching etc.
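One possible precedence that keeps both knobs working, sketched with a hypothetical `resolve_toolchain` helper: an explicit GOTOOLCHAIN from the caller wins, otherwise the pinned version is used. This ordering is an assumption for illustration, not a description of what the scripts do today.

```shell
# Hypothetical precedence resolver.
#   $1: GOTOOLCHAIN as set by the caller (may be empty)
#   $2: pinned version, e.g. the contents of .go-version ("1.22.3")
# Prints the toolchain name to use, or returns non-zero if neither is set.
resolve_toolchain() {
  if [ -n "$1" ]; then
    # Caller override, e.g. a downstream forcing a patched go version.
    printf '%s\n' "$1"
  elif [ -n "$2" ]; then
    printf 'go%s\n' "$2"
  else
    return 1
  fi
}
```

With this shape, a downstream that needs a security-patched go only sets one env var and never has to carry a go.mod patch.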
Do we know yet if this applies to all branches?
yes, it applies to all branches (we saw this when trying to just bump .go-version to 1.22.x on release branches)
/assign
So, interestingly, if I do GOTOOLCHAIN=go1.21.9 make verify WHAT=codegen, we get the same failure.
go: downloading go1.21.9 (linux/amd64)
go: download go1.21.9 for linux/amd64: toolchain not available
Seems like something is broken with GOTOOLCHAIN support in this go install or go, rather than our scripts?
Maybe something with the kube-build image.
Normal go image:
$ docker run --rm -e GOTOOLCHAIN=go1.22.3 golang:1.21.10-bullseye go version
Unable to find image 'golang:1.21.10-bullseye' locally
1.21.10-bullseye: Pulling from library/golang
3d53ef4019fc: Pull complete
08f0bf643eb6: Pull complete
6b037c2b46ab: Pull complete
c9c00c8cd734: Pull complete
a4de642a3616: Pull complete
785d32889ff9: Pull complete
4f4fb700ef54: Pull complete
Digest: sha256:6815d296f20c3cb3afa67bf5dc862a831b6080f5dd8b5b423bb5f862539299de
Status: Downloaded newer image for golang:1.21.10-bullseye
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64
kube-cross:
$ docker run --rm -e GOTOOLCHAIN=go1.22.3 registry.k8s.io/build-image/kube-cross:v1.28.0-go1.21.10-bullseye.0 go version
Unable to find image 'registry.k8s.io/build-image/kube-cross:v1.28.0-go1.21.10-bullseye.0' locally
v1.28.0-go1.21.10-bullseye.0: Pulling from build-image/kube-cross
646e886fa3cf: Pull complete
c5a360c5f105: Pull complete
5cbeb8ef1d90: Pull complete
f3054eaa1068: Pull complete
fd19ff5befab: Pull complete
029d7cfe87db: Pull complete
4f4fb700ef54: Pull complete
ab038281f51a: Pull complete
9bc87ef1bf76: Pull complete
eb4d85f62c54: Pull complete
00a4d6140bd2: Pull complete
30d8f173cf20: Pull complete
1903caad7a82: Pull complete
e7f153876917: Pull complete
Digest: sha256:1996e852695eb1d3b48bda278a4584317d5d42eaef226899a331ea37394b6952
Status: Downloaded newer image for registry.k8s.io/build-image/kube-cross:v1.28.0-go1.21.10-bullseye.0
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64
kube-build:
$ docker run --rm -e GOTOOLCHAIN=go1.22.3 kube-build:build-9a4d0b6c27-5-v1.28.0-go1.21.10-bullseye.0 go version
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64
but under verify-codegen with GOTOOLCHAIN=go1.22.3 make verify WHAT=codegen:
...
+++ [0520 21:58:12] Building Docker image kube-build:build-17572b7f79-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 21:58:22] Creating data container kube-build-data-17572b7f79-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 21:58:23] Syncing sources to container
+++ [0520 21:58:31] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 21:58:31] Running build command...
go: downloading go1.22.3 (linux/amd64)
go: download go1.22.3 for linux/amd64: toolchain not available
...
.... hmm
narrowed it down to hack/lib/init.sh:
$ GOTOOLCHAIN=go1.22.3 build/run.sh bash -c 'source hack/lib/init.sh && go version && echo $GOTOOLCHAIN'
+++ [0520 22:09:21] Verifying Prerequisites....
+++ [0520 22:09:22] Building Docker image kube-build:build-b9c550bd56-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 22:09:31] Syncing sources to container
+++ [0520 22:09:33] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 22:09:33] Running build command...
go version go1.21.10 linux/amd64
go1.22.3
+++ [0520 22:09:34] Syncing out of container
$ GOTOOLCHAIN=go1.22.3 build/run.sh bash -c 'go version && echo $GOTOOLCHAIN'
+++ [0520 22:09:54] Verifying Prerequisites....
+++ [0520 22:09:54] Building Docker image kube-build:build-b9c550bd56-5-v1.28.0-go1.21.10-bullseye.0
+++ [0520 22:10:04] Syncing sources to container
+++ [0520 22:10:06] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 22:10:06] Running build command...
go: downloading go1.22.3 (linux/amd64)
go version go1.22.3 linux/amd64
go1.22.3
+++ [0520 22:10:13] Syncing out of container
Talked to @liggitt about this, who found GO111MODULE=off suspect; setting it seemingly causes go version to ignore GOTOOLCHAIN.
even @ 765e7ef0d21 (just merged to master an hour ago) we can see:
$ GOTOOLCHAIN=go1.22.1 build/run.sh bash -c 'GO111MODULE=off go version && echo "GOTOOLCHAIN=$GOTOOLCHAIN"'
+++ [0520 23:39:30] Verifying Prerequisites....
+++ [0520 23:39:31] Building Docker image kube-build:build-3ad27617b0-5-v1.31.0-go1.22.3-bullseye.0
+++ [0520 23:41:10] Deleting image kube-build:build-3ad27617b0-5-v1.30.0-go1.22.2-bullseye.0
+++ [0520 23:41:12] Creating data container kube-build-data-3ad27617b0-5-v1.31.0-go1.22.3-bullseye.0
+++ [0520 23:41:23] Syncing sources to container
+++ [0520 23:41:36] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 23:41:36] Running build command...
go version go1.22.3 linux/amd64
GOTOOLCHAIN=go1.22.1
+++ [0520 23:41:37] Syncing out of container
$ GOTOOLCHAIN=go1.22.1 build/run.sh bash -c 'GO111MODULE=on go version && echo "GOTOOLCHAIN=$GOTOOLCHAIN"'
+++ [0520 23:41:54] Verifying Prerequisites....
+++ [0520 23:41:55] Building Docker image kube-build:build-3ad27617b0-5-v1.31.0-go1.22.3-bullseye.0
+++ [0520 23:42:04] Syncing sources to container
+++ [0520 23:42:06] Output from this container will be rsynced out upon completion. Set KUBE_RUN_COPY_OUTPUT=n to disable.
+++ [0520 23:42:06] Running build command...
go: downloading go1.22.1 (linux/amd64)
go version go1.22.1 linux/amd64
GOTOOLCHAIN=go1.22.1
+++ [0520 23:42:14] Syncing out of container
So this is probably related.
On master, GOTOOLCHAIN=go1.22.1 make verify WHAT=codegen is not broken, presumably because we don't set GO111MODULE=off anymore after @thockin's WORKSPACES work.
I think we can say this is already fixed @ HEAD and is just poorly behaved on the older branches.
We should probably hack in a gimme fallback for this case on older branches while doing our best to respect GOTOOLCHAIN after introducing it (IIRC we put a release note about using GOTOOLCHAIN to control go versions). We already use gimme if go is not available or too old.
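A minimal sketch of the decision such a fallback would need, assuming we compare the requested toolchain against what go version actually reports. Both function names are hypothetical, and the actual hand-off to gimme is left out on purpose.

```shell
# Hypothetical helper: extract the toolchain name from `go version` output,
# e.g. "go version go1.21.10 linux/amd64" -> "go1.21.10".
parse_go_version() {
  # Rely on word splitting to break the line into fields.
  # shellcheck disable=SC2086
  set -- $1
  [ "$1" = "go" ] && [ "$2" = "version" ] && [ -n "$3" ] || return 1
  printf '%s\n' "$3"
}

# Hypothetical helper: decide whether the fallback installer is needed.
#   $1: requested toolchain, e.g. "go1.22.3"
#   $2: output of `go version` inside the build environment
# Prints "ok" if the active toolchain matches, "fallback" otherwise
# (including when the output cannot be parsed).
toolchain_action() {
  active="$(parse_go_version "$2")" || { echo "fallback"; return 0; }
  if [ "${active}" = "$1" ]; then
    echo "ok"
  else
    echo "fallback"
  fi
}
```

In a real script this would be called roughly as toolchain_action "${GOTOOLCHAIN}" "$(go version)", with the "fallback" branch installing the requested version via gimme.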
Might not be worth prioritizing a patch given this is only relevant if you're:
This should be fine in 1.30+ *
*well, there's a smaller bug in https://github.com/kubernetes/kubernetes/issues/125010
Thanks for digging here @BenTheElder!
Hmm, this is slightly confusing, on darwin:
❯ go version
go version go1.22.2 darwin/amd64
and with GO111MODULE=off
❯ GOTOOLCHAIN=go1.21.9 GO111MODULE=off go version
go version go1.21.0 darwin/amd64
I'm not sure why it's go1.21.0, I don't have that toolchain present locally either...
with GO111MODULE=on:
❯ GOTOOLCHAIN=go1.21.9 GO111MODULE=on go version
go version go1.21.9 darwin/amd64
Ah, I see where the effect of GO111MODULE on GOTOOLCHAIN comes from: https://github.com/golang/go/blob/dddf0ae40fa0c1223aba191d73a44425a08e1035/src/cmd/go/internal/toolchain/select.go#L93
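Based on the behavior observed in this thread, a pre-flight check could warn when a requested toolchain is going to be ignored. This is only a sketch: `gotoolchain_honored` is a hypothetical name, and it models just the GO111MODULE=off case seen above, not the full selection logic in the go command.

```shell
# Hypothetical pre-flight check.
#   $1: value of GO111MODULE (may be empty)
#   $2: value of GOTOOLCHAIN (may be empty)
# Prints:
#   "unset"   -> no GOTOOLCHAIN override was requested
#   "ignored" -> module mode is off, so the override is not expected to apply
#   "honored" -> the override is expected to apply
gotoolchain_honored() {
  if [ -z "$2" ]; then
    echo "unset"
  elif [ "$1" = "off" ]; then
    echo "ignored"
  else
    echo "honored"
  fi
}
```

For example, gotoolchain_honored "${GO111MODULE:-}" "${GOTOOLCHAIN:-}" prints "ignored" for the GO111MODULE=off runs shown earlier and "honored" for the GO111MODULE=on runs.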
Right, so for the newer branches where we have modules on and workspaces adopted, that just leaves https://github.com/kubernetes/kubernetes/issues/125010
We could try to mitigate this in the older branches ... but it's a fairly niche issue and the workspaces change is too large to backport, so we'd have to write something custom for the old branches ... and you can already work around it by just overriding the build image to have the right go version.
https://github.com/kubernetes/kubernetes/issues/125010 was fixed, so this should be OK for future branches.
I want to iterate on the fix for https://github.com/kubernetes/kubernetes/issues/125010 and then backport to 1.30.
We still have 1.28 / 1.29 for a while and should keep tracking this otherwise for now, but it's not clear that we have sufficient demand to invest in this further versus the involved workspace changes in 1.30+
Seen in https://github.com/kubernetes/kubernetes/pull/124922
Something is not working properly between .go-version → GOTOOLCHAIN when the go version inside this specific container doesn't match
/assign /cc @BenTheElder @MadhavJivrajani