grzleadams opened this issue 2 months ago
I should mention that we're seeing this when using docker/build-push-action, and I'm not entirely convinced this is a Pulp issue, but I wanted to open this in case it is.
In a previous job I remember seeing something like this in Artifactory, and it was related to the client sending a PUT vs. a PATCH. Not sure if this could be something similar.
For reference, here's our client information after setting buildx up with docker/setup-buildx-action:
Client:
 Version:    25.0.4
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/local/lib/docker/cli-plugins/docker-buildx
I'm not that familiar with buildx. Can you share all the calls it makes to Pulp, if available?
Usually, when a layer is uploaded, an upload ID is first created with a POST; that's the https://pulp.<domain>/v2/<image>-cache/blobs/uploads/01912e58-af0e-7f79-b5ca-56eb659b5f6e URL.
Chunks are then uploaded to it via PATCH.
A PUT signals that the blob upload is complete, carrying the last chunk. After that, a DELETE is issued to the upload ID.
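As a minimal sketch, assuming the chunked-upload flow from the OCI Distribution Specification, the Python below plans (but does not send) the requests for one blob: a POST to open the session, a PATCH per intermediate chunk with an inclusive Content-Range, and a final PUT that carries the last chunk and the blob's sha256 digest. The `plan_chunked_upload` helper, the `Request` record, the host `https://registry.example`, and the upload ID are hypothetical; a real client takes the session URL from the POST response's Location header, and the DELETE described above is not modeled.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Request:
    """One planned HTTP request (never actually sent)."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


def plan_chunked_upload(base: str, name: str, upload_id: str,
                        blob: bytes, chunk_size: int) -> list[Request]:
    """Plan the POST / PATCH... / PUT sequence for a chunked blob upload.

    Hypothetical helper: `upload_id` stands in for the session ID the
    registry would return in the Location header of the initial POST.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    session = f"{base}/v2/{name}/blobs/uploads/{upload_id}"

    # 1. Open an upload session.
    plan = [Request("POST", f"{base}/v2/{name}/blobs/uploads/")]

    # Split the blob; an empty blob still gets one (empty) final chunk.
    chunks = [blob[i:i + chunk_size] for i in range(0, len(blob), chunk_size)] or [b""]

    # 2. Every chunk except the last goes up via PATCH with an inclusive range.
    offset = 0
    for chunk in chunks[:-1]:
        plan.append(Request("PATCH", session, {
            "Content-Type": "application/octet-stream",
            "Content-Length": str(len(chunk)),
            "Content-Range": f"{offset}-{offset + len(chunk) - 1}",
        }, chunk))
        offset += len(chunk)

    # 3. The final chunk rides on the PUT that names the digest and closes the session.
    last = chunks[-1]
    plan.append(Request("PUT", f"{session}?digest={digest}", {
        "Content-Type": "application/octet-stream",
        "Content-Length": str(len(last)),
    }, last))
    return plan


if __name__ == "__main__":
    for req in plan_chunked_upload("https://registry.example", "myimage-cache",
                                   "hypothetical-upload-id", b"0123456789", 4):
        print(req.method, req.headers.get("Content-Range", "-"), len(req.body))
```

With a 10-byte blob and 4-byte chunks this plans a POST, two PATCHes (ranges 0-3 and 4-7), and a PUT carrying the last 2 bytes. Under the spec, a 404 on one of the session-URL requests typically means the registry no longer recognizes that upload ID.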
This log info suggests that:
I assume that's the PUT for the image manifest (since this is caching to the registry), but I'll see if there's a way to enable buildx debugging in the workflow to capture all those calls. Later runs of the same workflow succeeded with the push of this cache image, for what it's worth.
No, the error clearly shows that the call to the blob upload endpoint failed, not the manifest endpoint. Please get us the API call logs if possible.
> Later runs of the same workflow succeeded with the push of this cache image, for what it's worth.
That's a sign of some race condition. But for what it's worth, this is the first such bug report I've seen, and I haven't observed this with podman or docker, so there's a chance something fishy is going on with buildx?
I agree, it feels like some kind of race condition. buildx does implement some parallelization of builds but as far as I know/can tell the push of the layers is single-threaded, so I'm not exactly sure where such a race condition would come in. Either way, I'll see about exporting the buildx logs from the workflow so we can try to get more information.
Version
Describe the bug
We occasionally (with no real discernible pattern) see 404 Not Found during image pushes. Anecdotally, it seems to happen most when we're pushing to the registry cache, but we've seen it during the image push steps too. For example:
To Reproduce
Unclear, since it doesn't seem to happen all the time or in any identifiable situation.
Expected behavior
The push should succeed.
Additional context
At first I thought it could be related to https://github.com/pulp/pulp_container/issues/1587, but I verified that image-manifest=true was on the cache-to: line. It almost feels like the token expires and the push fails as a result, but I haven't found anything in the logs to indicate that, and we have token expiration set to 3600 (which is far longer than the workflow takes to run).
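One way to test the token-expiry hypothesis, assuming the registry issues JWT bearer tokens with standard `iat`/`exp` claims (RFC 7519), is to capture the token used for a failing request and compare its `exp` with the time of the failure. The minimal Python sketch below decodes the payload without verifying the signature, so it is for diagnostics only; `jwt_claims` and `seconds_left` are hypothetical helpers, not part of Pulp or buildx.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (diagnostics only)."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding stripped by JWT
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_left(token: str, at: datetime) -> float:
    """Seconds of validity remaining at time `at`; negative once expired."""
    return jwt_claims(token)["exp"] - at.timestamp()


def lifetime(token: str) -> int:
    """Configured lifetime (exp - iat) in seconds, if both claims are present."""
    claims = jwt_claims(token)
    return claims["exp"] - claims["iat"]
```

For example, if `seconds_left(token, failure_time)` is negative for the request that got the 404, expiry is a plausible cause; a comfortably positive value (and a `lifetime` of 3600) would argue against it.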