Inconsistent baseline test results

r-c-n commented 9 months ago

Current baseline tests show inconsistent results and logs.

Context

at least seen in baseline-arm64 tests on stable-rc tree, run by Lava. Could be a generalized problem.

Examples

Baseline jobs with result='pass' that don't show a healthy Linux boot in the log: https://staging.kernelci.org:9000/viewer?node_id=65beae243ff676efede404e3
Jobs with result='pass' that don't have a log at all (log is empty): https://staging.kernelci.org:9000/viewer?node_id=65beae253ff676efede404e4
Test suite with result='pass', but its only test case (login) failed: https://staging.kernelci.org:9000/viewer?node_id=65beae253ff676efede404e4

r-c-n commented 9 months ago

Additional case:

baseline test suite with no children nodes. Result is inconclusive (all test suite nodes seem to have result=null). No test log because test logs are apparently stored in the test case nodes (leaf nodes) but not in the test suite nodes:

https://staging.kernelci.org:9000/viewer?node_id=65c083e76c0c48387d4582bc

{
  "id": "65c083e76c0c48387d4582bc",
  "kind": "test",
  "name": "baseline-x86-board",
  "path": [
    "checkout",
    "kbuild-gcc-10-x86-board",
    "baseline-x86-board"
  ],
  "group": "baseline-x86-board",
  "parent": "65c082286c0c48387d45814c",
  "state": "done",
  "result": null,
  "artifacts": {
    "build_log": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "build_kimage_stdout": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_kimage.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "build_kernel_errors": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_kimage_stderr.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "build_modules_stdout": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_modules.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "build_modules_errors": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_modules_stderr.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "metadata": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/metadata.json?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "fragments/0_config": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/fragments/0.config?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "fragments/1_config": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/fragments/1.config?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "kernel": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/bzImage?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
    "modules": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/modules.tar.xz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D"
  },
  "data": {
    "kernel_revision": {
      "branch": "linux-4.19.y",
      "commit": "6e0ebbf58f17f70ad5cfabc0a8e7e52b956ee241",
      "describe": "v4.19.306-146-g6e0ebbf58f17f",
      "tree": "stable-rc",
      "url": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git",
      "version": {
        "extra": "-146-g6e0ebbf58f17f",
        "patchlevel": "19",
        "sublevel": "306",
        "version": "4"
      }
    },
    "runtime": "lava-collabora",
    "platform": "acer-cb317-1h-c3z6-dedede",
    "job_id": "12701747"
  },
  "created": "2024-02-05T06:44:55.956000",
  "updated": "2024-02-05T12:46:15.151000",
  "timeout": "2024-02-05T12:44:55.956000",
  "holdoff": null,
  "owner": "staging.kernelci.org",
  "user_groups": []
}

Lava job output shows that Linux failed to boot and the job timed out: https://lava.collabora.dev/scheduler/job/12701747

...
Exiting depthcharge with code 4 at timestamp: 39807273
Starting kernel ...
end: 2.2.4 bootloader-commands (duration 00:00:33) [common]
start: 2.2.5 auto-login-action (timeout 00:04:14) [common]
Setting prompt string to ['Linux version [0-9]']
Setting prompt string to ['Linux version [0-9]', 'Tftp failed.', 'Dhcp release failed.', 'No space left for ramdisk', 'TFTP failed for ramdisk.', 'Dhcp release failed.', 'Out of space adding TFTP server IP to the command line.', 'No network device.', 'Error code \\d+( \\([\\w\\s]+\\))?\\r\\n', '(Bulk read error(.*)){3}']
auto-login-action: Wait for prompt ['Linux version [0-9]', 'Tftp failed.', 'Dhcp release failed.', 'No space left for ramdisk', 'TFTP failed for ramdisk.', 'Dhcp release failed.', 'Out of space adding TFTP server IP to the command line.', 'No network device.', 'Error code \\d+( \\([\\w\\s]+\\))?\\r\\n', '(Bulk read error(.*)){3}'] (timeout 00:05:00)
auto-login-action timed out after 254 seconds
end: 2.2.5 auto-login-action (duration 00:04:14) [common]
case: auto-login-action
case_id: 452508863
definition: lava
duration: 254.00
extra: ...
level: 2.2.5
namespace: common
result: fail

JenySadadia commented 8 months ago

The issue covers https://github.com/kernelci/kernelci-api/issues/470 and https://github.com/kernelci/kernelci-pipeline/issues/451

r-c-n commented 8 months ago

I found another inconsistent behavior wrt test results. In particular about the result of test suites:

tast test group has result = pass: https://staging.kernelci.org:9000/viewer?node_id=65e872bb82fa13fcfb086922 and some of its children test nodes failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e872bb82fa13fcfb086922

However, this other test group (baseline-armel) has result = fail https://staging.kernelci.org:9000/viewer?node_id=65e84dbb90557b9f467c205e when one of its children test nodes has failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e84dbb90557b9f467c205e

I don't know if this is caused by a lack of consensus or because there's a bug somewhere. @nuclearcat @JenySadadia @a-wai is there a clear rule about what should be the result of a test suite depending on the result of its children test cases?

PRETTY URGENT, I'd say.

I don't have a preference, I can make the regression-related code adapt to the behavior whatever it is, as long as it's consistent. Until now I was ignoring the results of test suite nodes because they were always pass and I was focusing only on test case results so that regressions were generated individually for the greater granularity case (test cases instead of test suites). Since we don't have a specific model for test suites yet, I'd go back to that just to simplify things. But whatever we choose to do, we have to settle on the semantics of that result means for a test suite node.

JenySadadia commented 8 months ago

Hello, We are not setting parent node result based on child results. The result has been set based on command/process return code.

Please check below links: for kver job: https://github.com/kernelci/kernelci-pipeline/blob/main/config/runtime/kver.jinja2#L69, for kunit job: https://github.com/kernelci/kernelci-pipeline/blob/main/config/runtime/kunit.jinja2#L61, for kbuild job: https://github.com/kernelci/kernelci-core/blob/main/kernelci/kbuild.py#L783

tast test group has result = pass: https://staging.kernelci.org:9000/viewer?node_id=65e872bb82fa13fcfb086922 and some of its children test nodes failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e872bb82fa13fcfb086922

The result has been set from associated lava job result https://lava.collabora.dev/scheduler/job/12960001:

case: 0_tast
case_id: 463745830
definition: lava
duration: 168.95
namespace: chromeos
path: inline/cros-tast.yaml
metadata: {'format': 'Lava-Test Test Definition 1.0', 'name': 'cros-tast'}
run: {'steps': ['cd /home/cros', "lava-test-case tast-tarball --shell curl -s 'https://storage.chromeos.kernelci.org/images/rootfs/chromeos/chromiumos-puff/20240129.0/amd64/tast.tgz' \\| tar xzvf - && cp remote_test_runner /usr/bin/remote_test_runner && mkdir -p /usr/libexec/tast/bundles/remote/ && cp cros /usr/libexec/tast/bundles/remote/", 'while ! ping -c 1 -w 1 $(lava-target-ip); do sleep 1; done', 'lava-test-case os-release --shell ./ssh_retry.sh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/cros/.ssh/id_rsa root@$(lava-target-ip) cat /etc/os-release', './tast_parser.py video.ChromeStackDecoder.* video.ChromeStackDecoderVerification.*', './ssh_retry.sh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/cros/.ssh/id_rsa root@$(lava-target-ip) sync && poweroff && exit 0']}
result: pass
revision: unspecified
uuid: 12960001_6.2.3.1

However, this other test group (baseline-armel) has result = fail https://staging.kernelci.org:9000/viewer?node_id=65e84dbb90557b9f467c205e when one of its children test nodes has failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e84dbb90557b9f467c205e

Again, node result for baseline-armel is set fail as the job status is incomplete and not based on the child node result. See the job: https://lava.collabora.dev/scheduler/job/12959192

r-c-n commented 8 months ago

Aah, thanks for clarifying! Phew, that makes it all easier to handle. Great!

kernelci / kernelci-api

Inconsistent baseline test results #464

Context

Examples