microsoft / azure-pipelines-agent

Azure Pipelines Agent 🚀
MIT License
1.72k stars 864 forks source link

[BUG]: Try to append batches web console lines for record #4922

Open ovaar opened 1 month ago

ovaar commented 1 month ago

What happened?

While executing long-running python tasks tasks using PythonScript@0 or just bash, which invokes python, the agent seems to lose connection to the host, maybe because it didn't receive enough messages in the stdout/stderr to feed the watchdog, resulting in a broken connection that makes the build freeze?

    - task: PythonScript@0
          displayName: "Build"
          inputs:
            scriptSource: 'filePath'
            scriptPath: "$(Pipeline.Workspace)build.py"
            arguments: '--build_type=Release --rebuild --deploy'
            workingDirectory: "$(Pipeline.Workspace)"
            pythonInterpreter: 'python3'

    - bash: |
          set -euo pipefail
          python3 build.py --build_type=Release --rebuild --deploy
        displayName: "Build"

Running the (long running) build scripts as shown above, results in Python using buffered streams for stdout and stderr. Setting the PYTHONUNBUFFERED resolved the issue.

echo "##vso[task.setvariable variable=PYTHONUNBUFFERED]1"

Versions

Azure Devops version 3.236.1 / Windows 10

Environment type (Please select at least one enviroment where you face this issue)

Azure DevOps Server type

dev.azure.com (formerly visualstudio.com)

Azure DevOps Server Version (if applicable)

No response

Operation system

Windows 10

Version controll system

No response

Relevant log output

[2024-07-02 12:34:00Z INFO JobServerQueue] Try to append 4 batches web console lines for record 'f77843cc-c3dc-5c4f-e956-bc0deeafe94e', success rate: 4/4.
[2024-07-02 12:34:01Z INFO HostContext] Setting up value for origin: Variables_Set_repoUrlWithCred
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Exited process 9932 with exit code 0
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Finished process 9932 with exit code 0, and elapsed time 00:00:02.2383436.
[2024-07-02 12:34:01Z INFO WorkerCommandManager] Disable plugin internal command extension.
[2024-07-02 12:34:01Z INFO StepsRunner] Step result: 
[2024-07-02 12:34:01Z INFO StepsRunner] No need for updating job result with current step result 'Succeeded'.
[2024-07-02 12:34:01Z INFO StepsRunner] Current state: job state = ''
[2024-07-02 12:34:01Z INFO StepsRunner] Processing step: DisplayName='Run Unit Tests', ContinueOnError=False, Enabled=True
[2024-07-02 12:34:01Z INFO ExpressionManager] Evaluating: SucceededNode()
[2024-07-02 12:34:01Z INFO ExpressionManager] Result: True
[2024-07-02 12:34:01Z INFO StepsRunner] Starting the step.
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Work': 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO StepsRunner] Which: 'chcp'
[2024-07-02 12:34:01Z INFO StepsRunner] Location: 'C:\Windows\system32\chcp.COM'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Starting process:
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   File name: 'C:\Windows\system32\chcp.COM'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Arguments: '65001'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Working directory: 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Require exit code zero: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Encoding web name:  ; code page: ''
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Persist current code page: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   ContinueAfterCancelProcessTreeKillAttempt: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Process started with process id 6304, waiting for process exit.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] STDOUT/STDERR stream read finished.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Exited process 6304 with exit code 0
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Finished process 6304 with exit code 0, and elapsed time 00:00:00.0075292.
[2024-07-02 12:34:01Z INFO StepsRunner] Successfully returned to code page 65001 (UTF8)
[2024-07-02 12:34:01Z INFO TaskRunner] Allow publishing telemetry for PythonScript@0.237.1 task: True
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Work': 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Tasks': 'E:\agent_pool\_work\_tasks'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Work': 'E:\agent_pool\_work'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'TaskZips': 'E:\agent_pool\_work\_taskzips'
[2024-07-02 12:34:01Z INFO TaskManager] Loading task definition 'E:\agent_pool\_work\_tasks\PythonScript_6392f95f-7e76-4a18-b3c7-7f078d2f7700\0.237.1\task.json'.
[2024-07-02 12:34:01Z INFO TaskRunner] Get handler data for target platform Windows
[2024-07-02 12:34:01Z INFO TaskRunner] Handler data is of type Microsoft.VisualStudio.Services.Agent.Worker.Node20_1HandlerData
[2024-07-02 12:34:01Z INFO TaskRunner] The original input is a rooted path, return absolute path: E:\agent_pool\_work\7\bimcollab-zoom\Build\test.py
[2024-07-02 12:34:01Z INFO TaskRunner] The original input is a rooted path, return absolute path: E:\agent_pool\_work\7\bimcollab-zoom\Build\
[2024-07-02 12:34:01Z INFO NodeHandler] Handler.AfterExecutionContextInitialized _continueAfterCancelProcessTreeKillAttempt = True
[2024-07-02 12:34:01Z INFO NodeHandler] AZP_AGENT_IGNORE_VSTSTASKLIB enabled, ignoring fix
[2024-07-02 12:34:01Z INFO NodeHandler] Task.json has node20_1 handler data: True node20ResultsInGlibCError = False
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Externals': 'E:\agent_pool\externals'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Bin': 'E:\agent_pool\bin'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Root': 'E:\agent_pool'
[2024-07-02 12:34:01Z INFO HostContext] Well known directory 'Externals': 'E:\agent_pool\externals'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Starting process:
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   File name: 'E:\agent_pool\externals\node20_1\bin\node.exe'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Arguments: '"E:\agent_pool\_work\_tasks\PythonScript_6392f95f-7e76-4a18-b3c7-7f078d2f7700\0.237.1\main.js"'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Working directory: 'E:\agent_pool\_work\7\s'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Require exit code zero: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Encoding web name: utf-8 ; code page: '65001'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Force kill process on cancellation: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Redirected STDIN: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Persist current code page: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   Keep redirected STDIN open: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   High priority process: 'False'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper]   ContinueAfterCancelProcessTreeKillAttempt: 'True'
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] OOM score adjustment is Linux-only.
[2024-07-02 12:34:01Z INFO ProcessInvokerWrapper] Process started with process id 8552, waiting for process exit.
kirill-ivlev commented 1 month ago

Hi @ovaar, thanks for reporting! We are working on more prioritized issues at the moment, but will get back to this one soon.

aleksandrlevochkin commented 1 week ago

Hi @ovaar! I tried to reproduce this issue using a Python script that executes a long-running task, but did not have success both with the agent version v3.236.1 and the latest one. Could you please provide more details on what your build.py script looks like? Also, it would be great if you could provide the Worker file's log for a pipeline run with system.debug variable enabled