microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

Docker Inspect throws exit code 125 due to not finding some layer IDs in final built image. #20238

Closed takis-kapas closed 1 month ago

takis-kapas commented 1 month ago

New issue checklist

Task name

Docker@2

Task version

2.243.0

Issue Description

Using Podman v4.9.3 with container image Ubuntu:24.04. ADO agent is running inside the container.

The image is built successfully with the Docker@2 task using Podman. The issue is that, at the end of the build, the Docker@2 Task runs Docker Inspect on the built image.

This is where Docker Inspect throws an error, because not all the layers of the built image have an ID or Name. This is known Docker behavior: it does not record a layer ID for every image layer in the history.

This issue is also raised in the official Podman repo ISSUE#21198.

Is there any change that has been applied to the Docker@2 Task, that gets affected by the image layer IDs in history?

Is there a workaround on this, like forcing the Docker@2 Task to NOT run Docker Inspect at the end of the build?

Or finally, can someone please review this and maybe apply a fix? It makes the Docker@2 Task simply unusable with Podman, since it breaks the Pipeline.

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operating system

Ubuntu:24.04

Relevant log output

Jobs in run #ephemeral_agents_test_manual_podman_1252
Build and Push to OpenShift

##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=6f38dde4-97d1-495d-a1e1-226e2fcd0faa;]Error: no names or ids specified
##[debug]task result: Failed
##[error]Unhandled: The process '/usr/bin/docker' failed with exit code 125
##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=6f38dde4-97d1-495d-a1e1-226e2fcd0faa;]Unhandled: The process '/usr/bin/docker' failed with exit code 125
##[debug]Processed: ##vso[task.complete result=Failed;]Unhandled: The process '/usr/bin/docker' failed with exit code 125
##[error]Error: The process '/usr/bin/docker' failed with exit code 125
    at ExecState._setResult (/azp/agent/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/azp/agent/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/azp/agent/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)
    at Process.callbackTrampoline (node:internal/async_hooks:130:17)
##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=6f38dde4-97d1-495d-a1e1-226e2fcd0faa;]Error: The process '/usr/bin/docker' failed with exit code 125
    at ExecState._setResult (/azp/agent/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/azp/agent/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/azp/agent/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)
    at Process.callbackTrampoline (node:internal/async_hooks:130:17)

Full task logs with system.debug enabled

Attached Full System Logs Podman_Pipeline_Docker_Task_Issue_logs_1252.zip

Repro steps

Attached Podman Pipeline Job YAML Podman_ADO_Pipeline_YAML.txt

johnwc commented 1 month ago

I believe this other issue is related: #19432

takis-kapas commented 1 month ago

@johnwc thanks for bringing this to my attention.

I read through this issue and I think that this problem might have been fixed in Podman.

The error I am facing is that some Podman/docker mid-layers that are created with the image build do not carry an ID or Name, as the Task requires. From the documentation of layering in Docker, it seems that this is intentional.

So the problem is that when Docker Inspect runs in the Docker@2 Task after the image is built, the Task is throwing errors because it does not find an ID in some of these mid-layers.

johnwc commented 1 month ago

@kal1mera I tried updating to podman v5.1.2 today, and it gives the same error.

takis-kapas commented 1 month ago

@johnwc I am not sure if the issue is with the Task itself or with Podman.

I used to run the Docker Task on an agent in an ubuntu:22.04 container with Podman v3.4.4. That version of Podman threw a History error involving the CreatedAt field, which from what I have read was a known issue that was claimed to be fixed in versions after 4.x.x.

That is why I tried the Ubuntu:24.04 image, which installs Podman v4.9.3. In this version the History CreatedAt call no longer seems to throw a 125 error in the Docker@2 Task, but instead there is the error with the image layers.

This is frustrating, and on the Podman GitHub repo there are other users that have the same issue.

I am not sure where the issue lies, but these users claim that their ADO Pipelines started failing about a day ago.

Also, this issue coincides with the latest repo announcement from MS that Ubuntu:24.04 is ready to be used as a hosted runner pool for both ADO Agents and GitHub Runners, so I wonder if they changed anything in the Task as part of the release.

This is pretty frustrating though, as for me Podman is the only option to build containers inside my container ADO Agent.

johnwc commented 1 month ago

It's with the task and podman: from what I read in the podman GitHub issue, the task is calling inspect with an ID that podman does not recognize, or something of the sort.

I do see that a few of the files were updated just recently within the DockerV2 task folder, so I think it was a recent update of the task that is causing this.

I updated my build pipeline to just use the docker task to login, and then a bash script to run the docker commands. Try setting your pipeline from Docker@2 to Docker@2.240.3 in the yaml, that was the previous version of the task before the recent updates.

takis-kapas commented 1 month ago

@johnwc good idea downgrading the Task. Will try to do that.

I will also try to run podman inspect -f {{.RootFS.Layers}} with a Bash Task to check what the output will be. This is the same command that the Docker@2 Task runs on the image.
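A minimal Bash sketch of that check, for anyone who wants to reproduce it outside the Task. The function name `inspect_layers` and the `CONTAINER_ENGINE` variable are hypothetical names used only for illustration; the only real command is `inspect -f '{{.RootFS.Layers}}'` as quoted above. The guard on an empty image reference is an assumption about the failure mode: the "no names or ids specified" message in the log is consistent with inspect receiving no argument, but nothing in this thread confirms that.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the RootFS layers of an image.
# CONTAINER_ENGINE is an illustrative variable, defaulting to podman.
inspect_layers() {
  local image="${1:-}"
  local engine="${CONTAINER_ENGINE:-podman}"

  # Assumption: calling inspect with an empty reference is what yields
  # "no names or ids specified", so refuse to call it at all in that case.
  if [ -z "$image" ]; then
    echo "inspect_layers: empty image reference, skipping inspect" >&2
    return 2
  fi

  # Same format template as quoted in the comment above.
  "$engine" inspect -f '{{.RootFS.Layers}}' "$image"
}
```

In a Bash Task this could be called as `inspect_layers "myregistry.example.com/app:latest"` (a placeholder image name), with the exit code showing whether inspect itself fails on the built image.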

Last, I will try to run docker inspect on my local Docker Desktop to identify any changes and report back to the Podman repo issue and this issue.

If the Task's code is the problem, I am not that optimistic that Microsoft will try to add a fix, considering that Podman is not officially supported by this Task. Nevertheless, since MS always claims that they embrace open source, maybe they will take a leap into fixing this, who knows.

Podman is an important app (at least for my company), more secure than docker, and used by many users and companies (like Red Hat OpenShift, which builds its container images with Podman).

If all else fails, then I will just use Podman build in a Bash Task and instruct my users to do the same when they use the Agents. But this will negate the use of the useful combo Build&Push option that the Docker@2 Task offers, but what can we do...
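A rough sketch of what such a Bash Task fallback might look like, assuming the agent is already logged in to the registry. The function name `build_and_push`, the `CONTAINER_ENGINE` variable, and the image name in the usage note are all hypothetical; only `build -t` and `push` are real subcommands. It does not run inspect or history afterwards, which is the step that fails in the Docker@2 Task.

```bash
#!/usr/bin/env bash

# Hypothetical helper: build an image from a context directory, then push it.
# CONTAINER_ENGINE is an illustrative variable, defaulting to podman.
build_and_push() {
  local image="${1:-}"
  local context="${2:-.}"
  local engine="${CONTAINER_ENGINE:-podman}"

  if [ -z "$image" ]; then
    echo "usage: build_and_push <image> [context-dir]" >&2
    return 2
  fi

  # Push only if the build succeeded; no post-build inspect is performed.
  "$engine" build -t "$image" "$context" && "$engine" push "$image"
}
```

Usage would be something like `build_and_push "myregistry.example.com/app:1.0" .` from the script body of a Bash Task. This replaces the combined Build and Push command of the Docker@2 Task with two plain CLI calls.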

Anyways, I will be posting updates with my findings soon.

Thanks for looking and testing this as well.

johnnau commented 1 month ago

@johnwc I had to use Docker@2.240.3, but my builds are now back to functional with this change only.

takis-kapas commented 1 month ago

@johnwc This is great!!!

I however switched to the Bash ADO Task and I am building and pushing from the command line until this is resolved for good.

I will also try to open an enterprise support ticket with Microsoft, based on our findings. Maybe they will be able to review and fix it for future sprints.

P.S. Didn't have time today to test the Podman Inspect output, but I will try later this week.

takis-kapas commented 1 month ago

This is an FYI for Microsoft. I just edited the Issue's Initial Comment and added the ADO Pipeline Full Logs and the Podman Pipeline YAML Job Definition which was failing.

johnnau commented 1 month ago

I have gone back through and looked over my build logs since I switched to podman last year (building on rhel with podman and the podman-docker package), there have been errors the entire time showing "##[error]Error: no names or ids specified", but this was not stopping the build. Unsure if this is unique to my environment or related, but perhaps it is useful.

johnwc commented 1 month ago

> I have gone back through and looked over my build logs since I switched to podman last year (building on rhel with podman and the podman-docker package), there have been errors the entire time showing "##[error]Error: no names or ids specified", but this was not stopping the build. Unsure if this is unique to my environment or related, but perhaps it is useful.

@johnnau I noticed the same behavior before as well. I believe the library referenced by the task was updated, and the new version throws this as an error instead of ignoring it. The thing is, I don't understand why it is calling inspect after a successful push. (All we use the docker task for is to push images.)

takis-kapas commented 1 month ago

> @johnnau I noticed the same behavior prior as well, I believe the updated referenced library in the update changed to start throwing it as an error instead of ignoring it. The thing is, I don't understand why it is calling inspect after a successful push. (All we use the docker task for is to push images.)

Yes, this behavior (an error in the Task that did not terminate the Pipeline) was happening to me initially, when I was building on podman v3.4.4 with the ubuntu:22.04 image. But after the Docker@2 Task update, it was failing the Pipeline with the 125 error.

But as I said earlier, the error with podman v3.4.4 was on the Task's Docker History call, which was not finding the "CreatedAt" field that it was looking for.

v-schhabra commented 1 month ago

Hi, we have started our investigation on this issue. Previously, the task-lib didn't catch promise rejections, but this was fixed in version 4.0.2.

The issue is in the Docker task itself, which doesn't work correctly when layerId is missing. The task expects a layerId in order to proceed, but layerId is missing in the logs. Docker's developers say that this is expected behavior, so we probably need to change the task's logic for when it's missing.
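To illustrate the kind of logic change described here, a small Bash sketch follows. It is not the Task's actual code (the Task is not written in Bash); `filter_layer_ids` is a hypothetical helper, and it assumes that layers without an ID show up in history output as the `<missing>` placeholder or as an empty value. The idea is simply to drop those entries so that nothing downstream is called with an empty ID.

```bash
#!/usr/bin/env bash

# Hypothetical helper: read one layer ID per line on stdin and print only
# the usable ones. Blank lines and the "<missing>" placeholder are dropped.
filter_layer_ids() {
  local id
  while IFS= read -r id || [ -n "$id" ]; do
    case "$id" in
      '' | '<missing>') continue ;;   # no ID recorded for this layer
      *) printf '%s\n' "$id" ;;
    esac
  done
}
```

It could be fed from something like `podman history --format '{{.ID}}' --no-trunc "$image" | filter_layer_ids`, with any later inspect step skipped when the filtered list comes back empty.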

v-schhabra commented 1 month ago

Duplicate of #20189

johnwc commented 1 month ago

@v-schhabra what is the purpose of calling inspect after a successful push, when only pushing to a repo?