mkorpela / pabot

Parallel executor for Robot Framework test cases.
https://pabot.org
Apache License 2.0

Pabot and Run Process: Random failure and robot_stderr.out says killed #516

Closed chidambaranathan-r closed 1 year ago

chidambaranathan-r commented 1 year ago

Team,

We are using "pabot" to parallelize at the test case level. Below is the command for reference:

pabot --pabotlib --testlevelsplit --processes ${PARALLEL_PROCESSES} \
    --verbose --loglevel DEBUG:INFO \
    --outputdir "${ROBOT_REPORTS_DIR}/${ROBOT_TEST_SUITE_REPORT_DIR}" \
    $([ -n "${include_tag}" ] && echo "${include_tag}") \
    $([ -n "${exclude_tag}" ] && echo "${exclude_tag}") \
    "${ROBOT_TESTS_DIR}/${ROBOT_TEST_SUITE_TO_RUN}"

In one of our higher-level keywords (used by one of the test cases), we use the "Run Process" keyword to download a file from a path and then use it internally.

Run Process    curl -k --header "X-JFrog-Art-Api:%{REPO_API_KEY}" --fail <path_to_file>/${archivefile} -o ${archivefile} && ls -lrth ${archivefile}    shell=True

The entire higher-level keyword looks like this:

    [Arguments]    ${username}    ${password}    ${archivefile}    ${app_id}=""    ${expected_status_code}=200
    Run Process    curl -k --header "X-JFrog-Art-Api:%{REPO_API_KEY}" --fail <path_to_file>/${archivefile} -o ${archivefile} && ls -lrth ${archivefile}   shell=True
    ${fileData}=    Get Binary File  ${archivefile}
    &{fileParts}=  Create Dictionary
    Set To Dictionary  ${fileParts}  image=${fileData}
    create app session    username=${username}    password=${password} 
    POST On Session    appsession    ${app_endpoint}    files=${fileParts}    expected_status=${expected_status_code}

When this keyword is called, it sometimes fails at the point where "Run Process" is invoked. The failure is random. The Robot report is otherwise entirely green; the failed test case is not shown as failed but is missing from the report altogether.

In the "pabot_results" folder, "robot_stderr.out" for this particular test case says: "Killed"

Since we run the test automation from Jenkins, the Jenkins log shows the following:

[2023-02-13T13:14:55.989Z] ++ testExitCode=252
[2023-02-13T13:14:55.989Z] ++ [[ -n 252 ]]
[2023-02-13T13:14:55.989Z] ++ [[ 252 -ne 0 ]]
[2023-02-13T13:14:55.989Z] ++ echo -e 'Robot command failed with exit code 252'
[2023-02-13T13:14:55.989Z] ++ exit 252
[2023-02-13T13:14:55.989Z] Robot command failed with exit code 252

We are running the Robot test cases inside a container (we built our own container image that includes Robot Framework and the other required libraries). This container is spun up in one of the stages of the Jenkins pipeline, and all the test cases are run inside it.

After seeing "Killed" in "robot_stderr.out", we inspected the Jenkins machine on which the container runs and found the following kernel logs:

Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492762] robot invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492765] CPU: 2 PID: 2906382 Comm: robot Not tainted 5.4.0-135-generic #152-Ubuntu
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492766] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492767] Call Trace:
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492776]  dump_stack+0x6d/0x8b
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492782]  dump_header+0x4f/0x1eb
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492784]  oom_kill_process.cold+0xb/0x10
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.492788]  out_of_memory+0x1cf/0x500
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.493057] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=28d71069fb462403900e1e5d7d0d36d7159c6653a67a31d525d2993c37fd0e39,mems_allowed=0,global_oom,task_memcg=/system.slice/netdata.service,task=netdata,pid=2499611,uid=112
Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.493152] Out of memory: Killed process 2499611 (netdata) total-vm:593480kB, anon-rss:111916kB, file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:564kB oom_score_adj:1000
Feb 14 10:39:55 node-10-210-174-200 kernel: [4666229.989867] robot invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
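For anyone triaging similar logs, the "Out of memory: Killed process" summary line is the one that names the victim. In the excerpt above, `robot` invoked the oom-killer, but the process killed at 10:39:51 was `netdata`, and `global_oom` with `CONSTRAINT_NONE` indicates that the whole host ran out of memory rather than a single cgroup hitting its limit. Below is a minimal Python sketch for pulling the victims out of such a log. The function name `parse_oom_kills` and the regular expression are illustrative assumptions based only on the line format shown in this issue; other kernel versions may format the line differently.

```python
import re

# Matches the kernel summary line in the format seen in this issue's log.
# (Assumption: other kernel versions may use a different layout.)
OOM_KILL_RE = re.compile(
    r"Out of memory: Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kills(log_text):
    """Return one dict per OOM kill summary line found in log_text."""
    kills = []
    for match in OOM_KILL_RE.finditer(log_text):
        kills.append({
            "pid": int(match["pid"]),
            "name": match["name"],
            "total_vm_kb": int(match["total_vm_kb"]),
            "anon_rss_kb": int(match["anon_rss_kb"]),
        })
    return kills


# The summary line from the log excerpt in this issue.
sample = (
    "Feb 14 10:39:51 node-10-210-174-200 kernel: [4666225.493152] Out of memory: "
    "Killed process 2499611 (netdata) total-vm:593480kB, anon-rss:111916kB, "
    "file-rss:0kB, shmem-rss:0kB, UID:112 pgtables:564kB oom_score_adj:1000"
)

for kill in parse_oom_kills(sample):
    print(f"{kill['name']} (pid {kill['pid']}) killed, anon-rss {kill['anon_rss_kb']} kB")
```

Running this over the full kernel log for the time window of a failed build would list every killed process, which helps confirm whether a `robot` process was among the victims.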

We are wondering what the issue could be (our only clue is that it has something to do with "Run Process" and pabot). It would be really helpful if someone could guide us in debugging this.

Versions:

robotframework                  4.0.3
robotframework-pabot            2.0.0

chidambaranathan-r commented 1 year ago

We identified that this is due to the high memory usage of the "Get Binary File" keyword. It is tracked here: https://github.com/robotframework/robotframework/issues/4658. Hence closing this thread.
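As a rough illustration of why holding a whole archive in memory can add up when several pabot processes run in parallel, the Python sketch below compares the peak Python heap usage of reading a file in one call with reading it in chunks. It does not reproduce the internals of "Get Binary File" or the cause analysed in the linked issue; the helper names, the 8 MiB file size, and the 64 KiB chunk size are all assumptions chosen for the example.

```python
import os
import tempfile
import tracemalloc


def peak_bytes(func, *args):
    """Run func(*args) and return the peak traced Python heap usage in bytes."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_whole(path):
    """Read the entire file into a single bytes object."""
    with open(path, "rb") as handle:
        return len(handle.read())


def read_chunked(path, chunk_size=64 * 1024):
    """Read the file piece by piece, keeping only one chunk alive at a time."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def make_temp_file(size):
    """Create a temporary file of the given size and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"\0" * size)
    return path


size = 8 * 1024 * 1024  # hypothetical 8 MiB archive
path = make_temp_file(size)
try:
    peak_whole = peak_bytes(read_whole, path)
    peak_chunked = peak_bytes(read_chunked, path)
finally:
    os.remove(path)

print(f"whole-file read peak: {peak_whole} bytes")
print(f"chunked read peak:    {peak_chunked} bytes")
```

The whole-file read needs at least as much memory as the file itself, and that cost is paid once per parallel process that handles the archive, so lowering `--processes` or avoiding full in-memory copies of large files are both plausible ways to reduce pressure on the host.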