Closed r-c-n closed 8 months ago
Additional case:
baseline test suite with no children nodes. Result is inconclusive (all test suite nodes seem to have result=null
). No test log because test logs are apparently stored in the test case nodes (leaf nodes) but not in the test suite nodes:
https://staging.kernelci.org:9000/viewer?node_id=65c083e76c0c48387d4582bc
{
"id": "65c083e76c0c48387d4582bc",
"kind": "test",
"name": "baseline-x86-board",
"path": [
"checkout",
"kbuild-gcc-10-x86-board",
"baseline-x86-board"
],
"group": "baseline-x86-board",
"parent": "65c082286c0c48387d45814c",
"state": "done",
"result": null,
"artifacts": {
"build_log": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"build_kimage_stdout": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_kimage.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"build_kernel_errors": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_kimage_stderr.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"build_modules_stdout": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_modules.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"build_modules_errors": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/build_modules_stderr.log?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"metadata": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/metadata.json?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"fragments/0_config": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/fragments/0.config?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"fragments/1_config": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/fragments/1.config?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"kernel": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/bzImage?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D",
"modules": "https://kciapistagingstorage1.file.core.windows.net/staging/kbuild-gcc-10-x86-board-65c082286c0c48387d45814c/modules.tar.xz?sv=2022-11-02&ss=f&srt=sco&sp=r&se=2024-10-17T19:19:12Z&st=2023-10-17T11:19:12Z&spr=https&sig=sLmFlvZHXRrZsSGubsDUIvTiv%2BtzgDq6vALfkrtWnv8%3D"
},
"data": {
"kernel_revision": {
"branch": "linux-4.19.y",
"commit": "6e0ebbf58f17f70ad5cfabc0a8e7e52b956ee241",
"describe": "v4.19.306-146-g6e0ebbf58f17f",
"tree": "stable-rc",
"url": "https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git",
"version": {
"extra": "-146-g6e0ebbf58f17f",
"patchlevel": "19",
"sublevel": "306",
"version": "4"
}
},
"runtime": "lava-collabora",
"platform": "acer-cb317-1h-c3z6-dedede",
"job_id": "12701747"
},
"created": "2024-02-05T06:44:55.956000",
"updated": "2024-02-05T12:46:15.151000",
"timeout": "2024-02-05T12:44:55.956000",
"holdoff": null,
"owner": "staging.kernelci.org",
"user_groups": []
}
Lava job output shows that Linux failed to boot and the job timed out: https://lava.collabora.dev/scheduler/job/12701747
...
Exiting depthcharge with code 4 at timestamp: 39807273
Starting kernel ...
end: 2.2.4 bootloader-commands (duration 00:00:33) [common]
start: 2.2.5 auto-login-action (timeout 00:04:14) [common]
Setting prompt string to ['Linux version [0-9]']
Setting prompt string to ['Linux version [0-9]', 'Tftp failed.', 'Dhcp release failed.', 'No space left for ramdisk', 'TFTP failed for ramdisk.', 'Dhcp release failed.', 'Out of space adding TFTP server IP to the command line.', 'No network device.', 'Error code \\d+( \\([\\w\\s]+\\))?\\r\\n', '(Bulk read error(.*)){3}']
auto-login-action: Wait for prompt ['Linux version [0-9]', 'Tftp failed.', 'Dhcp release failed.', 'No space left for ramdisk', 'TFTP failed for ramdisk.', 'Dhcp release failed.', 'Out of space adding TFTP server IP to the command line.', 'No network device.', 'Error code \\d+( \\([\\w\\s]+\\))?\\r\\n', '(Bulk read error(.*)){3}'] (timeout 00:05:00)
auto-login-action timed out after 254 seconds
end: 2.2.5 auto-login-action (duration 00:04:14) [common]
case: auto-login-action
case_id: 452508863
definition: lava
duration: 254.00
extra: ...
level: 2.2.5
namespace: common
result: fail
I found another inconsistent behavior wrt test results. In particular about the result of test suites:
tast
test group has result = pass
: https://staging.kernelci.org:9000/viewer?node_id=65e872bb82fa13fcfb086922
and some of its children test nodes failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e872bb82fa13fcfb086922
However, this other test group (baseline-armel
) has result = fail
https://staging.kernelci.org:9000/viewer?node_id=65e84dbb90557b9f467c205e when one of its children test nodes has failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e84dbb90557b9f467c205e
I don't know if this is caused by a lack of consensus or because there's a bug somewhere. @nuclearcat @JenySadadia @a-wai is there a clear rule about what should be the result of a test suite depending on the result of its children test cases?
PRETTY URGENT, I'd say.
I don't have a preference, I can make the regression-related code adapt to the behavior whatever it is, as long as it's consistent. Until now I was ignoring the results of test suite nodes because they were always pass
and I was focusing only on test case results so that regressions were generated individually for the greater granularity case (test cases instead of test suites). Since we don't have a specific model for test suites yet, I'd go back to that just to simplify things. But whatever we choose to do, we have to settle on the semantics of that result
means for a test suite node.
Hello, We are not setting parent node result based on child results. The result has been set based on command/process return code.
Please check below links:
for kver
job: https://github.com/kernelci/kernelci-pipeline/blob/main/config/runtime/kver.jinja2#L69,
for kunit
job: https://github.com/kernelci/kernelci-pipeline/blob/main/config/runtime/kunit.jinja2#L61,
for kbuild
job: https://github.com/kernelci/kernelci-core/blob/main/kernelci/kbuild.py#L783
tast test group has result = pass: https://staging.kernelci.org:9000/viewer?node_id=65e872bb82fa13fcfb086922 and some of its children test nodes failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e872bb82fa13fcfb086922
The result has been set from associated lava job result https://lava.collabora.dev/scheduler/job/12960001:
case: 0_tast
case_id: 463745830
definition: lava
duration: 168.95
namespace: chromeos
path: inline/cros-tast.yaml
metadata: {'format': 'Lava-Test Test Definition 1.0', 'name': 'cros-tast'}
run: {'steps': ['cd /home/cros', "lava-test-case tast-tarball --shell curl -s 'https://storage.chromeos.kernelci.org/images/rootfs/chromeos/chromiumos-puff/20240129.0/amd64/tast.tgz' \\| tar xzvf - && cp remote_test_runner /usr/bin/remote_test_runner && mkdir -p /usr/libexec/tast/bundles/remote/ && cp cros /usr/libexec/tast/bundles/remote/", 'while ! ping -c 1 -w 1 $(lava-target-ip); do sleep 1; done', 'lava-test-case os-release --shell ./ssh_retry.sh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/cros/.ssh/id_rsa root@$(lava-target-ip) cat /etc/os-release', './tast_parser.py video.ChromeStackDecoder.* video.ChromeStackDecoderVerification.*', './ssh_retry.sh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -i /home/cros/.ssh/id_rsa root@$(lava-target-ip) sync && poweroff && exit 0']}
result: pass
revision: unspecified
uuid: 12960001_6.2.3.1
However, this other test group (baseline-armel) has result = fail https://staging.kernelci.org:9000/viewer?node_id=65e84dbb90557b9f467c205e when one of its children test nodes has failed: https://staging.kernelci.org:9000/viewer?search=parent%3D65e84dbb90557b9f467c205e
Again, node result for baseline-armel
is set fail
as the job status is incomplete
and not based on the child node result. See the job: https://lava.collabora.dev/scheduler/job/12959192
Aah, thanks for clarifying! Phew, that makes it all easier to handle. Great!
Current baseline tests show inconsistent results and logs.
Context
at least seen in baseline-arm64 tests on stable-rc tree, run by Lava. Could be a generalized problem.
Examples
result='pass'
that don't show a healthy Linux boot in the log: https://staging.kernelci.org:9000/viewer?node_id=65beae243ff676efede404e3result='pass'
that don't have a log at all (log is empty): https://staging.kernelci.org:9000/viewer?node_id=65beae253ff676efede404e4result='pass'
, but its only test case (login
) failed: https://staging.kernelci.org:9000/viewer?node_id=65beae253ff676efede404e4