microsoft / azure-pipelines-tasks

Tasks for Azure Pipelines
https://aka.ms/tfbuild
MIT License

[BUG]: Docker Inspect throws exit code 125 due to not finding some layer IDs in final built image. #20189

Open webjoaoneto opened 1 month ago

webjoaoneto commented 1 month ago

Task name

Docker@2

Task version

2.243.0

Issue Description

After the task updated to version 2.243.0, the Docker push step in our pipeline raises this error:

/usr/bin/*** inspect -f {{.RootFS.Layers}}
Error: no names or ids specified
##[error]Error: no names or ids specified
##[error]Unhandled: The process '/usr/bin/***' failed with exit code 125
##[error]Error: The process '/usr/bin/***' failed with exit code 125
    at ExecState._setResult (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)

The task pushes the Docker image to the right place, but the pipeline crashes because the command docker inspect -f {{.RootFS.Layers}} is not passed the image name as its next argument.

Fix: we went back to version 2.240.2

Environment type (Please select at least one environment where you face this issue)

Azure DevOps Server type

Azure DevOps Server (Please specify exact version in the textbox below)

Azure DevOps Server Version (if applicable)

No response

Operating system

ubuntu

Relevant log output

createdAt:2024-07-07T23:32:36Z; layerSize:9.54MB; createdBy:RUN /bin/sh -c set -eux;    apt-get update;     apt-get install -y --no-install-recommends      ca-certificates         netbase         tzdata  ;   rm -rf /var/lib/apt/lists/* # buildkit; layerId:<missing>
createdAt:2024-07-07T23:32:36Z; layerSize:0B; createdBy:ENV LANG=C.UTF-8; layerId:<missing>
createdAt:2024-07-07T23:32:36Z; layerSize:0B; createdBy:ENV PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin; layerId:<missing>
createdAt:2024-07-02T01:25:02Z; layerSize:0B; createdBy:/bin/sh -c #(nop)  CMD ["bash"]; layerId:<missing>
createdAt:2024-07-02T01:25:02Z; layerSize:77.8MB; createdBy:/bin/sh -c #(nop) ADD file:b24689567a7c604de93e4ef1dc87c372514f692556744da43925c575b4f80df6 in / ; layerId:<missing>
/usr/bin/*** inspect -f {{.RootFS.Layers}}
Error: no names or ids specified
##[error]Error: no names or ids specified
##[error]Unhandled: The process '/usr/bin/***' failed with exit code 125
##[error]Error: The process '/usr/bin/***' failed with exit code 125
    at ExecState._setResult (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/opt/app-root/app/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)
Finishing: Docker Push

Repro steps

No response

flinox-testes commented 1 month ago

Exactly the same problem here... version 2.240.2 works.

dcs-adam commented 1 month ago

Same issue here. The error was already there on 2.240.2, but it allowed the task to complete successfully. On 2.243.0, the image is pushed to the registry, but the pipeline fails.
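
The difference described here (the same inspect error, but only the newer version fails the step) can be pictured with a small TypeScript sketch. This is not the Docker task's code: `inspectWithoutImage`, `pushThenInspect`, and the `Outcome` type are hypothetical names used only for illustration, and the sketch assumes the inspect call is an awaited promise that rejects when no image is given. It shows one way a step could record a failing, optional inspect as a warning instead of letting the rejection fail a push that has already succeeded.

```typescript
// Hypothetical stand-in for a `docker inspect` call made without an image name.
async function inspectWithoutImage(): Promise<string> {
  throw new Error("\"docker inspect\" requires at least 1 argument.");
}

// Hypothetical result type: the step outcome plus an optional warning.
type Outcome = { succeeded: boolean; warning?: string };

// Runs the (assumed optional) inspect after a push that is assumed to have
// succeeded. A rejection from inspect is turned into a warning rather than
// being left to propagate and fail the whole step.
async function pushThenInspect(inspect: () => Promise<string>): Promise<Outcome> {
  try {
    await inspect();
    return { succeeded: true };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { succeeded: true, warning: message };
  }
}
```

Calling `pushThenInspect(inspectWithoutImage)` resolves with `succeeded: true` and the inspect error text in `warning`. Whether the real task should treat the inspect step as optional is a design decision for its maintainers; the sketch only illustrates the pattern.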

MarkKharitonov commented 1 month ago

Same issue here as well. Does anyone know if it is possible to invoke the previous version of the task?

MarkKharitonov commented 1 month ago

Fix: we went back to version 2.240.2

@webjoaoneto - how did you go back to 2.240.2?

chrislanzara commented 1 month ago

+1 on this. It's now halted our build pipelines for our projects.

We were on version 2.240.2 until around midday (12pm UK time) on the 24th July, then we seem to have moved to 2.243.0. The stage now fails, although the image is still pushed to the ACR. We made no changes to our Docker config, Azure pipeline config, etc.; this seems to be handled by the Docker@2 stage alone.

          - task: Docker@2
            displayName: Push Container to ACR
            continueOnError: false
            inputs:
              command: push
              repository: $(imageName)
              tags: $(tag)
              containerRegistry: dockerRegistryServiceConnection

I did see something like this a while ago (few months back), a very similar issue on the push command, but by the time I went back to it the issue had resolved itself and the stage was successful.

==============================================================================
Task         : Docker
Description  : Build or push Docker images, login or logout, start or stop containers, or run a Docker command
Version      : 2.243.0
Author       : Microsoft Corporation
Help         : https://aka.ms/azpipes-docker-tsg

...

/usr/bin/docker inspect -f {{.RootFS.Layers}}
"docker inspect" requires at least 1 argument.
See 'docker inspect --help'.

Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]

Return low-level information on Docker objects
##[error]"docker inspect" requires at least 1 argument.
##[error]See 'docker inspect --help'.
##[error]Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]
##[error]Return low-level information on Docker objects
##[error]Unhandled: The process '/usr/bin/docker' failed with exit code 1
##[error]Error: The process '/usr/bin/docker' failed with exit code 1
    at ExecState._setResult (/home/vsts/work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/home/vsts/work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/home/vsts/work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)
Finishing: Push Container to ACR

I think the docker inspect errors are a bit misleading here. Looking at our runs from earlier today, when 2.240.2 was used, we got the same docker inspect errors but the stage was allowed to complete. Under 2.243.0 the same errors appear, but they are followed by the Unhandled message, and that causes the stage to fail.

This is from a run where the stage completed successfully:

==============================================================================
Task         : Docker
Description  : Build or push Docker images, login or logout, start or stop containers, or run a Docker command
Version      : 2.240.2
Author       : Microsoft Corporation
Help         : https://aka.ms/azpipes-docker-tsg

...

/usr/bin/docker inspect -f {{.RootFS.Layers}}
"docker inspect" requires at least 1 argument.
See 'docker inspect --help'.

Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]

Return low-level information on Docker objects
##[error]"docker inspect" requires at least 1 argument.
##[error]See 'docker inspect --help'.
##[error]Usage:  docker inspect [OPTIONS] NAME|ID [NAME|ID...]
##[error]Return low-level information on Docker objects
Finishing: Push Container to ACR

Oddly this is only failing for a pipeline building an Angular image. We run 2.243.0 when building a C# API and that runs fine (the underscores are my own to shorten the lines):

Starting: ACR Push
==============================================================================
Task         : Docker
Description  : Build or push Docker images, login or logout, start or stop containers, or run a Docker command
Version      : 2.243.0
Author       : Microsoft Corporation
Help         : https://aka.ms/azpipes-docker-tsg

...

createdAt:2024-07-23T05:24:***5Z; layerSize:74.8MB; createdBy:/bin/sh -c #(nop) ADD file:6c4730e7b**___ebfb56a602 in / ; layerId:<missing>
/usr/bin/docker inspect 02de***9c***4fb___7d6bef3acb5b7*** -f {{.RootFS.Layers}}
[sha256:e078***___4***e***f sha256:2ea3b5___***2f sha256:ad8af89334___07fc206***9fc sha256:855e5***907d3ec93___78a***324c sha256:58fa834ef___9657363ae sha256:698640980e___35b248c5e sha256:5e***ee***___5f***50***20359f sha256:a2980f6c44a___98722208da sha256:e45***75fc8___d44d0a8a7a6206 sha256:f04d0d2___dfe867df***]
Finishing: ACR Push

Is anyone able to help, or is there any way to force the stage to use the previous 2.240.2 version?

Thanks.

MarkKharitonov commented 1 month ago

OK, found it. It is actually straightforward to use the older version, just use Docker@2.240.2

lucasrcorreia commented 1 month ago

I had the same problem here and the solution was to force the previous minor version in the yaml, simply by changing the code from:

- task: Docker@2

to:

- task: Docker@2.240.2

chrislanzara commented 1 month ago

OK, found it.

It is actually straightforward to use the older version, just use Docker@2.240.2

This worked! Thank you so much. One to remember for the future too...

v-schhabra commented 1 month ago

Hi @lucasrcorreia @webjoaoneto @MarkKharitonov @chrislanzara, could you please share the complete debug logs of the failed pipeline, with system.debug set to true?

YodaDaCoda commented 1 month ago

@v-schhabra I'm encountering this same issue. I've attached a log from a build today that shows the error with System.Debug set to true, per the docs.

I've reverted to Docker@2.240.2 for now and I can confirm this allows the build to pass (though the error messages RE docker inspect are still present).

docker.log

Bodewes commented 1 month ago

I've the same problem. Pipelines that ran fine a week ago are now broken.

Changing the Docker@2 task from

- task: Docker@2

to

- task: Docker@2.240.2

fixed it for now.

With command 'buildAndPush' the images are still pushed but the task fails with an error:

##[error]Error: The process '/usr/bin/docker' failed with exit code 125
    at ExecState._setResult (/azp/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1249:25)
    at ExecState.CheckComplete (/azp/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1232:18)
    at ChildProcess.<anonymous> (/azp/_work/_tasks/Docker_e28912f1-0114-4464-802a-a3a35437fd16/2.243.0/node_modules/azure-pipelines-task-lib/toolrunner.js:1160:19)
    at ChildProcess.emit (node:events:513:28)
    at maybeClose (node:internal/child_process:1100:16)
    at Process.ChildProcess._handle.onexit (node:internal/child_process:304:5)

chrislanzara commented 1 month ago

Hi @v-schhabra

A debug log for the Push to ACR stage is attached. I've had to remove the names from the log so you'll see REDACTED in place of our project/application name.

push to acr failing log.txt

Run using the "Enable system diagnostics" checkbox on a pipeline run.

HTH

v-schhabra commented 1 month ago

Hi @chrislanzara, thanks for sharing the logs. We are investigating this issue and will try to fix it soon.

philipp-durrer-jarowa commented 1 month ago

Same setup here (Azure DevOps agents on k8s using KEDA scaled jobs with podman), and the same issue has been appearing for the past few hours.

Dom-Heal commented 1 month ago

We have the same issue with agents on VMSS, with container jobs using docker.

v-schhabra commented 1 month ago

Hi @chrislanzara @Dom-Heal @philipp-durrer-jarowa @Bodewes, could someone please let us know why podman is being used and what you are trying to do with it? Also, could you use docker instead of podman and check whether the error still occurs?

chrislanzara commented 1 month ago

Hi @v-schhabra,

Frankly, I'm not sure we are, or aware that we are.

Our pipeline references the Docker@2 stage only. I've included the build and dev deployment stage from our yaml file so you can see what we actually reference:


pool:
  vmImage: ubuntu-latest

stages:
  - stage: Build
    displayName: Build Dev
    variables:
      - group: ui-REDACTED-app
    jobs:
      - job: BuildContainer
        displayName: Build Container
        steps:
          - task: npmAuthenticate@0
            displayName: NPM Authentication
            inputs:
              workingFile: .npmrc
          - task: Docker@2
            displayName: Build Container
            continueOnError: false
            inputs:
              command: build
              Dockerfile: Dockerfile
              buildContext: .
              tags: $(tag)
              repository: $(imageName)
              containerRegistry: dockerRegistryServiceConnection
              arguments: '--build-arg BASEHREF=/ui/REDACTED/ --build-arg ENVIRONMENT=dev --build-arg KENDO_UI_LICENSE="$(KENDO_UI_LICENSE)" --build-arg NODE_OPTIONS=--max_old_space_size=16384'
          # Replace the line below with Docker@2.240.2, Docker@2 fails
          - task: Docker@2 
            displayName: Push Container to ACR
            continueOnError: false
            inputs:
              command: push
              repository: $(imageName)
              tags: $(tag)
              containerRegistry: dockerRegistryServiceConnection
      - job: StoreManifests
        displayName: Store K8s Manifests
        steps:
          - publish: k8s
            artifact: k8s
  - stage: DeployDev
    displayName: Deploy Dev
    condition: and(succeeded(), eq(variables.isDev, true))
    dependsOn: Build
    jobs:
      - deployment: DeployApp
        displayName: Deploy
        environment: dev.ui-dev
        strategy:
          runOnce:
            deploy:
              steps:
                - task: KubernetesManifest@0
                  displayName: AKS Create Registry Secret
                  inputs:
                    action: createSecret
                    secretType: dockerRegistry
                    secretName: REDACTED
                    dockerRegistryEndpoint: dockerRegistryServiceConnection

                - task: KubernetesManifest@0
                  displayName: Deploy
                  inputs:
                    action: deploy
                    manifests: $(Pipeline.Workspace)/k8s/dev-deployment.yml
                    imagePullSecrets: |
                      REDACTED
                    containers: |
                      REDACTED.azurecr.io/$(imageName):$(tag)

We've had Docker@2 in our pipeline for quite a while now, and use it on C# APIs as well as Angular UX projects. I see that using the Docker@2 stages is still on the Microsoft docs website, for example

Our deployment target is an AKS instance running Kubernetes version 1.29.4.

We would have referenced the Microsoft Docs or the classic pipeline builder UI in DevOps when we originally set the pipelines up several years ago.

So unless Azure is doing something "under the covers", I'm not consciously aware that we are using podman, if we actually are.

If there is another way of doing it you want us to explore, I'm happy to help test, but can you offer any more specific instructions on what alternative you wish us to test please?

Thanks!

Dom-Heal commented 1 month ago

Hi @chrislanzara @Dom-Heal @philipp-durrer-jarowa @Bodewes, could someone please let us know why podman is being used and what you are trying to do with it? Also, could you use docker instead of podman and check whether the error still occurs?

Hi @v-schhabra - We are not using podman, we are only using docker and this problem exists. Our agents run on Azure VMSS and the jobs run within docker containers. The docker push step is running inside the "docker" container job.

https://learn.microsoft.com/en-us/azure/devops/pipelines/process/container-phases?view=azure-devops
https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema/jobs-job-container?view=azure-pipelines

Hope this helps

v-schhabra commented 1 month ago

Hi, we have started investigating this issue. Previously, the task-lib didn't catch promise rejections, but this was fixed in version 4.0.2. The underlying issue is in the Docker task itself, which doesn't work correctly when layerId is missing: the task expects a layerId in order to proceed, but the logs show it as missing. Docker's developers say this is expected behavior, so we probably need to change the task's logic to handle the case where it is missing.
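
As a rough illustration of what handling a missing layerId could look like, here is a minimal TypeScript sketch. It is not the Docker task's actual implementation: `parseLayerId` and `buildInspectArgs` are hypothetical helpers, the line format is taken from the history lines in the logs above (`createdAt:...; layerSize:...; createdBy:...; layerId:...`), and picking the first usable ID is an assumption. The idea is to skip `docker inspect` altogether when every layerId is `<missing>`, rather than running it with no name or ID.

```typescript
// Hypothetical helper: pulls the trailing layerId field out of one history line.
// The createdBy field can itself contain "; ", so the line is not split on
// separators; only the final "layerId:<value>" at the end of the line is matched.
function parseLayerId(line: string): string | null {
  const match = /(?:^|;\s*)layerId:(\S+)\s*$/.exec(line);
  if (match === null) {
    return null;
  }
  const id = match[1];
  // Docker reports "<missing>" for layers that have no local image ID.
  return id === "<missing>" ? null : id;
}

// Hypothetical helper: builds the inspect arguments from the first usable ID,
// or returns null to signal that the inspect step should be skipped.
function buildInspectArgs(historyLines: string[]): string[] | null {
  for (const line of historyLines) {
    const id = parseLayerId(line);
    if (id !== null) {
      return ["inspect", id, "-f", "{{.RootFS.Layers}}"];
    }
  }
  return null;
}
```

With the five history lines from the original report, every layerId is `<missing>`, so `buildInspectArgs` returns `null` and a caller could log a warning and continue instead of invoking inspect with no arguments.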

sergeykrulikovskiy commented 1 week ago

Hello,

Are there any updates on this issue?