Open jakubgs opened 10 months ago
This has been going on for a while - at least since June - as I've seen these when debugging linux builds being killed:
There's also general protection fault
errors, but less frequent:
admin@linux-01.he-eu-hel1.ci.devel:~ % sudo dmesg | grep 'general protection fault'
[11549645.775740] traps: QQmlThread[1141503] general protection fault ip:7f7fbe492c30 sp:7f7fb3ffde90 error:0 in libQt5Qml.so.5.15.2[7f7fbe200000+45e000]
[11601779.978722] traps: QQmlThread[3166153] general protection fault ip:7fb21fa92c30 sp:7fb0337f8f20 error:0 in libQt5Qml.so.5.15.2[7fb21f800000+45e000]
I will attempt to audit the processes to find exactly what spawns them to confirm it's E2E tests.
Most probably using auditd
or bpfcc-tools
.
Looks like sing execsnoop
from bpfcc-tools
doesn't cut it. For some reason I'm getting no results:
TIME TIME(s) UID PCOMM PID PPID RET ARGS
Possibly lost 55 samples
Possibly lost 2 samples
Possibly lost 12 samples
Possibly lost 14 samples
Possibly lost 5 samples
Possibly lost 165 samples
...
I started these commands on linux-01
and linux-05
in a tmux
session yesterday:
execsnoop-bpfcc -x -t -T -U -u jenkins -n nim_status_client
But I got nothing. Probably using auditd
will work out better.
I added an audit filter using:
sudo auditctl -a task,always -F uid=jenkins
Lets see what we can find.
Looks like I used a wrong filter because I got nothing:
admin@linux-01.he-eu-hel1.ci.devel:~ % sudo ausearch -i -sc execve
<no matches>
These segfaults are not that frequent, one or two per day:
admin@linux-02.he-eu-hel1.ci.devel:~ % sudo dmesg -T | grep segfault
[Fri Nov 17 13:15:39 2023] QQmlThread[2932696]: segfault at 15 ip 00007f16ab692c30 sp 00007f14dd3f9f20 error 4 in libQt5Qml.so.5.15.2[7f16ab400000+45e000]
[Fri Nov 17 17:13:14 2023] nim_status_clie[772385]: segfault at 20 ip 00007f056d86dc1e sp 00007ffcd71cab80 error 4 in libgcc_s.so.1[7f056d85b000+17000]
[Mon Nov 20 15:38:25 2023] QQmlThread[1402142]: segfault at 640065038f ip 00007f37a16c9d93 sp 00007f35ba3f9ef0 error 4 in libQt5Qml.so.5.15.2[7f37a1600000+45e000]
[Tue Nov 21 14:15:34 2023] QQmlThread[49389]: segfault at 7f4a3044b214 ip 00007f4dd6092c30 sp 00007f4dcbbfde90 error 4 in libQt5Qml.so.5.15.2[7f4dd5e00000+45e000]
[Tue Nov 21 18:18:22 2023] QQmlThread[1415890]: segfault at 0 ip 0000000000000000 sp 00007f826a3f9f38 error 14
[Wed Nov 22 11:19:09 2023] QQmlThread[2686601]: segfault at 44 ip 00007f4697492c30 sp 00007f44c13f9f20 error 4 in libQt5Qml.so.5.15.2[7f4697200000+45e000]
All Linux CI hosts now have an audit rule for new processes:
> ansible ci-slave-linux -o -a 'auditctl -l'
linux-01.he-eu-hel1.ci.devel | CHANGED | rc=0 | (stdout) -a always,task
linux-02.he-eu-hel1.ci.devel | CHANGED | rc=0 | (stdout) -a always,task
linux-03.he-eu-hel1.ci.devel | CHANGED | rc=0 | (stdout) -a always,task
linux-04.he-eu-hel1.ci.devel | CHANGED | rc=0 | (stdout) -a always,task
linux-05.he-eu-hel1.ci.devel | CHANGED | rc=0 | (stdout) -a always,task
It looks like I can't use auditd
to catch what processes are segfaulting because it absolutely murdered I/O on CI hosts:
So I uninstalled it.
Managed to catch some examples:
[Mon Nov 27 10:33:24 2023] nim_status_clie[1445035]: segfault at 8 ip 00007f923fc62250 sp 00007ffddf46f9b8 error 4 in libQt5Qml.so.5[7f923fa00000+45e000]
admin@linux-04.he-eu-hel1.ci.devel:~ % grep 1445035 snoop.log
10:33:20 76378.2541001 startaut 1445035 1445034 0 /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103320_056118 --LOG_LEVEL=DEBUG
10:33:20 76378.2551001 StatusIm-Deskto 1445035 1445034 0 tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103320_056118 --LOG_LEVEL=DEBUG
10:33:20 76378.2621001 AppRun 1445035 1445034 0 /tmp/.mount_StatushwHnNZ/AppRun -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103320_056118 --LOG_LEVEL=DEBUG
10:33:20 76378.2671001 dirname 1445041 1445035 0 /usr/bin/dirname /tmp/.mount_StatushwHnNZ/AppRun
10:33:20 76378.2711001 nim_status_clie 1445035 1445034 0 /tmp/.mount_StatushwHnNZ/usr/bin/nim_status_client -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103320_056118 --LOG_LEVEL=DEBUG
It appears this one came from /home/jenkins/workspace/status-desktop/e2e/prs@3/
.
It appears to be this job: https://ci.status.im/job/status-desktop/job/e2e/job/prs/634/ Which failed with:
hair pipe affair cage vote estate gloom lamp robust like-user_account0] _
11:36:37
11:36:37 aut_id = 'AUT_095742', timeout_sec = 10
11:36:37
11:36:37 @allure.step('Attaching to "{0}"')
11:36:37 def attach(aut_id: str, timeout_sec: int = configs.timeouts.PROCESS_TIMEOUT_SEC):
11:36:37 started_at = time.monotonic()
11:36:37 _logger.debug(f'Attaching to {aut_id}')
11:36:37 while True:
11:36:37 try:
11:36:37 > context = squish.attachToApplication(aut_id, SquishServer().host, SquishServer().port)
11:36:37 E RuntimeError: Application 'AUT_095742' could not be started. Received invalid application id.
11:36:37
11:36:37 driver/context.py:19: RuntimeError
Also:
11:36:37 > context = squish.attachToApplication(aut_id, SquishServer().host, SquishServer().port)
11:36:37 E RuntimeError: Timeout expired during handshake
The app crashed at 10:33:24
and the tests started at 10:32:20
:
10:32:20 + python3 -m pytest --disable-warnings --alluredir=./allure-results
Though the actual execution time of the app is 10:32:59
:
10:32:59 09:32:58.874 INFO scripts.utils.local_system Execute: "/opt/squish-runner-7.1-20230222-1555/bin/squishserver" --configfile /home/jenkins/workspace/status-desktop/e2e/prs/squish_server.ini --config addAttachableAUT AUT_093221 localhost:62134
10:32:59 09:32:58.882 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62134 "tmp/StatusIm-Desktop-231127-091406-64a0bc-pr12866-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs/tmp/results/run_27112023_093220/status/app_093221_426178 --LOG_LEVEL=DEBUG
10:33:05 09:33:04.450 INFO driver.context AUT: AUT_093221 attache
Which does fit very closely with the segmentation fault above.
Although I'm not 100% convinced as the filter doesn't match in this job:
Running on linux-04 in /home/jenkins/workspace/status-desktop/e2e/prs
This one seems to fit better: https://ci.status.im/job/status-desktop/job/e2e/job/prs/638/console
Running on linux-04 in /home/jenkins/workspace/status-desktop/e2e/prs@3
And so does the startup time:
11:28:52 + python3 -m pytest --disable-warnings --alluredir=./allure-results
...
12:58:49 Cancelling nested steps due to timeout
12:58:49 Sending interrupt signal to process
12:59:09 After 20s process did not stop
Looks like this is one of our favorite timeouts.
And this run matches very well will the multiple sefaults in a row:
[Mon Nov 27 10:28:58 2023] nim_status_clie[1439693]: segfault at 8 ip 00007ff6bfa62250 sp 00007ffd15d74558 error 4 in libQt5Qml.so.5[7ff6bf800000+45e000]
[Mon Nov 27 10:29:09 2023] nim_status_clie[1439981]: segfault at 8 ip 00007f7a2e862250 sp 00007ffcf58ff048 error 4 in libQt5Qml.so.5[7f7a2e600000+45e000]
[Mon Nov 27 10:29:20 2023] nim_status_clie[1440154]: segfault at 8 ip 00007f92acc62250 sp 00007ffd7e91e668 error 4 in libQt5Qml.so.5[7f92aca00000+45e000]
[Mon Nov 27 10:29:32 2023] nim_status_clie[1440322]: segfault at 8 ip 00007fe6de062250 sp 00007fff62f32558 error 4 in libQt5Qml.so.5[7fe6dde00000+45e000]
[Mon Nov 27 10:29:43 2023] nim_status_clie[1440616]: segfault at 8 ip 00007f6193262250 sp 00007ffce667cfe8 error 4 in libQt5Qml.so.5[7f6193000000+45e000]
[Mon Nov 27 10:29:54 2023] nim_status_clie[1440790]: segfault at 8 ip 00007f3e03462250 sp 00007ffcbd04df28 error 4 in libQt5Qml.so.5[7f3e03200000+45e000]
[Mon Nov 27 10:30:05 2023] nim_status_clie[1440951]: segfault at 8 ip 00007f5949e62250 sp 00007ffed150fee8 error 4 in libQt5Qml.so.5[7f5949c00000+45e000]
[Mon Nov 27 10:30:16 2023] nim_status_clie[1441353]: segfault at 8 ip 00007fab86862250 sp 00007ffcd6a1edb8 error 4 in libQt5Qml.so.5[7fab86600000+45e000]
[Mon Nov 27 10:30:27 2023] nim_status_clie[1441561]: segfault at 8 ip 00007ffaba062250 sp 00007ffe1d186ae8 error 4 in libQt5Qml.so.5[7ffab9e00000+45e000]
[Mon Nov 27 10:30:48 2023] nim_status_clie[1441895]: segfault at 8 ip 00007fc06a862250 sp 00007ffdcffb1ac8 error 4 in libQt5Qml.so.5[7fc06a600000+45e000]
[Mon Nov 27 10:31:00 2023] nim_status_clie[1442187]: segfault at 8 ip 00007f9355c62250 sp 00007ffe46a1e558 error 4 in libQt5Qml.so.5[7f9355a00000+45e000]
[Mon Nov 27 10:31:11 2023] nim_status_clie[1442358]: segfault at 8 ip 00007fc187662250 sp 00007fffd7537ce8 error 4 in libQt5Qml.so.5[7fc187400000+45e000]
[Mon Nov 27 10:31:22 2023] nim_status_clie[1442525]: segfault at 8 ip 00007ff77fe62250 sp 00007ffd49c48c38 error 4 in libQt5Qml.so.5[7ff77fc00000+45e000]
[Mon Nov 27 10:31:33 2023] nim_status_clie[1442820]: segfault at 8 ip 00007f1443462250 sp 00007ffe6f03d988 error 4 in libQt5Qml.so.5[7f1443200000+45e000]
[Mon Nov 27 10:31:44 2023] nim_status_clie[1442969]: segfault at 8 ip 00007efec8a62250 sp 00007ffe6a3c0758 error 4 in libQt5Qml.so.5[7efec8800000+45e000]
[Mon Nov 27 10:31:55 2023] nim_status_clie[1443158]: segfault at 8 ip 00007f06f2662250 sp 00007fff44170d08 error 4 in libQt5Qml.so.5[7f06f2400000+45e000]
[Mon Nov 27 10:32:06 2023] nim_status_clie[1443432]: segfault at 8 ip 00007f97e1062250 sp 00007ffc19cc5e48 error 4 in libQt5Qml.so.5[7f97e0e00000+45e000]
[Mon Nov 27 10:32:17 2023] nim_status_clie[1443624]: segfault at 8 ip 00007f0fd8662250 sp 00007ffde7388eb8 error 4 in libQt5Qml.so.5[7f0fd8400000+45e000]
[Mon Nov 27 10:32:29 2023] nim_status_clie[1443932]: segfault at 8 ip 00007fc905662250 sp 00007ffd84d76bb8 error 4 in libQt5Qml.so.5[7fc905400000+45e000]
[Mon Nov 27 10:32:40 2023] nim_status_clie[1444078]: segfault at 8 ip 00007f7f26862250 sp 00007fffbc489108 error 4 in libQt5Qml.so.5[7f7f26600000+45e000]
[Mon Nov 27 10:32:51 2023] nim_status_clie[1444384]: segfault at 8 ip 00007ffb71462250 sp 00007ffe2e490958 error 4 in libQt5Qml.so.5[7ffb71200000+45e000]
[Mon Nov 27 10:33:02 2023] nim_status_clie[1444569]: segfault at 8 ip 00007f45b2c62250 sp 00007ffccc7df4d8 error 4 in libQt5Qml.so.5[7f45b2a00000+45e000]
[Mon Nov 27 10:33:13 2023] nim_status_clie[1444781]: segfault at 8 ip 00007fe55b662250 sp 00007ffc9f567978 error 4 in libQt5Qml.so.5[7fe55b400000+45e000]
[Mon Nov 27 10:33:24 2023] nim_status_clie[1445035]: segfault at 8 ip 00007f923fc62250 sp 00007ffddf46f9b8 error 4 in libQt5Qml.so.5[7f923fa00000+45e000]
Here we go:
10:28:53.964 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_102853_845657 --LOG_LEVEL=DEBUG
10:29:05.065 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_102853_845657 --LOG_LEVEL=DEBUG
10:29:16.309 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_102853_845657 --LOG_LEVEL=DEBUG
10:29:27.450 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_102927_320435 --LOG_LEVEL=DEBUG
10:29:38.559 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_102927_320435 --LOG_LEVEL=DEBUG
10:29:49.744 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_102927_320435 --LOG_LEVEL=DEBUG
10:30:00.843 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103000_719513 --LOG_LEVEL=DEBUG
10:30:11.799 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103000_719513 --LOG_LEVEL=DEBUG
10:30:22.916 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103000_719513 --LOG_LEVEL=DEBUG
10:30:33.979 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103033_844503 --LOG_LEVEL=DEBUG
10:30:44.371 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103033_844503 --LOG_LEVEL=DEBUG
10:30:55.357 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103033_844503 --LOG_LEVEL=DEBUG
10:31:06.793 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103106_675530 --LOG_LEVEL=DEBUG
10:31:17.841 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103106_675530 --LOG_LEVEL=DEBUG
10:31:28.805 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103106_675530 --LOG_LEVEL=DEBUG
10:31:40.097 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103139_973454 --LOG_LEVEL=DEBUG
10:31:51.146 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103139_973454 --LOG_LEVEL=DEBUG
10:32:02.166 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103139_973454 --LOG_LEVEL=DEBUG
10:32:13.388 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103213_266041 --LOG_LEVEL=DEBUG
10:32:24.423 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103213_266041 --LOG_LEVEL=DEBUG
10:32:35.578 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103213_266041 --LOG_LEVEL=DEBUG
10:32:46.675 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103246_551419 --LOG_LEVEL=DEBUG
10:32:57.906 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103246_551419 --LOG_LEVEL=DEBUG
10:33:08.919 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103246_551419 --LOG_LEVEL=DEBUG
10:33:20.177 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103320_056118 --LOG_LEVEL=DEBUG
This is the app being tested: https://ci.status.im/job/status-desktop/job/prs/job/linux/job/x86_64/job/package/job/PR-12856/7/
I attempted to run this AppImage locally and indeed it segfaults:
Traceback (most recent call last)
/home/jenkins/workspace/rs_linux_x86_64_package_PR-12856@2/src/nim_status_client.nim(208) nim_status_client
/home/jenkins/workspace/rs_linux_x86_64_package_PR-12856@2/src/nim_status_client.nim(181) mainProc
/home/jenkins/workspace/rs_linux_x86_64_package_PR-12856@2/src/app/boot/app_controller.nim(298) delete
/home/jenkins/workspace/rs_linux_x86_64_package_PR-12856@2/src/app/modules/startup/io_interface.nim(19) delete
/home/jenkins/workspace/rs_linux_x86_64_package_PR-12856@2/src/app/modules/startup/module.nim(87) delete
/home/jenkins/workspace/rs_linux_x86_64_package_PR-12856@2/vendor/nimqml/src/nimqml/private/qqmlapplicationengine.nim(29) setRootContextProperty
SIGSEGV: Illegal storage access. (Attempt to read from nil?)
Commented on the PR: https://github.com/status-im/status-desktop/pull/12856
The fact that we do not log ANYTHING about how the app crashed when being started is a major issue:
11:30:01 -------------------------------- live log setup --------------------------------
11:30:01 10:30:00.836 INFO scripts.utils.local_system Execute: "/opt/squish-runner-7.1-20230222-1555/bin/squishserver" --configfile /home/jenkins/workspace/status-desktop/e2e/prs@3/squish_server.ini --config addAttachableAUT AUT_103000 localhost:62138
11:30:01 10:30:00.843 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103000_719513 --LOG_LEVEL=DEBUG
11:30:12 10:30:11.794 INFO scripts.utils.local_system Execute: "/opt/squish-runner-7.1-20230222-1555/bin/squishserver" --configfile /home/jenkins/workspace/status-desktop/e2e/prs@3/squish_server.ini --config addAttachableAUT AUT_103000 localhost:62138
11:30:12 10:30:11.799 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103000_719513 --LOG_LEVEL=DEBUG
11:30:24 10:30:22.906 INFO scripts.utils.local_system Execute: "/opt/squish-runner-7.1-20230222-1555/bin/squishserver" --configfile /home/jenkins/workspace/status-desktop/e2e/prs@3/squish_server.ini --config addAttachableAUT AUT_103000 localhost:62138
11:30:24 10:30:22.916 INFO scripts.utils.local_system Execute: /opt/squish-runner-7.1-20230222-1555/bin/startaut --port=62138 "tmp/StatusIm-Desktop-231127-100907-44682e-pr12856-x86_64.AppImage" -d=/home/jenkins/workspace/status-desktop/e2e/prs@3/tmp/results/run_27112023_102853/status/app_103000_719513 --LOG_LEVEL=DEBUG
11:30:35 ERROR
All we get are 3 retries and then just ERROR
, which is useless.
The fact that none of this was logged in the E2E run means that there's severe issues in this file: https://github.com/status-im/desktop-qa-automation/blob/master/driver/aut.py
I needs major refactoring to address several issues:
configs.ATTACH_MODE
.This is still an issue, we can still find segfaults in logs for Linux CI hosts:
jakubgs@linux-02.he-eu-hel1.ci.devel:~ % sudo dmesg -T | grep segfault
[Thu Feb 8 08:56:42 2024] QQmlThread[1332334]: segfault at 15 ip 00007f8b0e492c30 sp 00007f88cbff9680 error 4 in libQt5Qml.so.5[7f8b0e200000+45e000]
[Thu Feb 8 14:47:26 2024] QQmlThread[3500731]: segfault at 15 ip 00007ff409092c30 sp 00007ff1d1ff5680 error 4 in libQt5Qml.so.5[7ff408e00000+45e000]
[Thu Feb 8 15:23:20 2024] QQmlThread[3610479]: segfault at 2c ip 00007fdea1892c30 sp 00007fdc667f6ea0 error 4 in libQt5Qml.so.5[7fdea1600000+45e000]
[Fri Feb 9 07:22:35 2024] QQmlThread[1026495]: segfault at 400000015 ip 00007f8549692c30 sp 00007f83167f6680 error 4 in libQt5Qml.so.5[7f8549400000+45e000]
[Sat Feb 10 18:40:03 2024] QQmlThread[1312983]: segfault at 15 ip 00007ff744692c30 sp 00007ff5117f4680 error 4 in libQt5Qml.so.5[7ff744400000+45e000]
[Mon Feb 12 08:31:02 2024] QQmlThread[3011691]: segfault at 15 ip 00007f7f1b292c30 sp 00007f7cecff3ea0 error 4 in libQt5Qml.so.5[7f7f1b000000+45e000]
@jakubgs did you observe any of such seg faults recently on linux hosts? I think it is all gone after the refactoring of tests but want to double check if that is still happening
Plenty:
jakubgs@linux-01.he-eu-hel1.ci.devel:~ % sudo dmesg | grep segfault | tail
[8242524.181436] QQmlThread[3799208]: segfault at 14 ip 00007fa5a6692c30 sp 00007fa40effc2a0 error 4 in libQt5Qml.so.5.15.2[7fa5a6400000+45e000]
[8418999.075082] QQmlThread[2846267]: segfault at 15 ip 00007f0f79492c30 sp 00007f0de2778e90 error 4 in libQt5Qml.so.5.15.2[7f0f79200000+45e000]
[9399060.430396] QQmlThread[3374280]: segfault at 15 ip 00007f1959892c30 sp 00007f185a7fde90 error 4 in libQt5Qml.so.5.15.2[7f1959600000+45e000]
[9540351.989685] QQmlThread[1606813]: segfault at 2c ip 00007f9ef0892c30 sp 00007f9df97fe2a0 error 4 in libQt5Qml.so.5.15.2[7f9ef0600000+45e000]
[9540357.390422] QQmlThread[1606978]: segfault at 14 ip 00007fada4892c30 sp 00007fac1a7fb2a0 error 4 in libQt5Qml.so.5.15.2[7fada4600000+45e000]
[9658341.995155] QQmlThread[935229]: segfault at 15 ip 00007f5eb9692c30 sp 00007f5c9cff3640 error 4 in libQt5Qml.so.5[7f5eb9400000+45e000]
[10080334.493555] QmlTests[1019106]: segfault at 7fffd82cefe8 ip 00007f0f51c103a7 sp 00007fffd82cefd0 error 6 in libQt5QuickTest.so.5.15.2[7f0f51c00000+1e000]
[10091569.163046] QQmlThread[1728226]: segfault at 200000028 ip 00007f89824c9d93 sp 00007f887f3fde60 error 4 in libQt5Qml.so.5.15.2[7f8982400000+45e000]
[10091573.377684] QQmlThread[1731887]: segfault at 7f515ecd1368 ip 00007f52da2c9d93 sp 00007f5129ffa270 error 4 in libQt5Qml.so.5.15.2[7f52da200000+45e000]
[10167706.124206] QmlTests[1703447]: segfault at 7ffd33b9cff8 ip 00007fceeaa10392 sp 00007ffd33b9d000 error 6 in libQt5QuickTest.so.5.15.2[7fceeaa00000+1e000]
alright, i know what they are, indeed sometimes linux crashes on startup. Our hope is switching to qt 5.15.8
Bug Report
Description
Currently all CI hosts show frequently segmentation faults for two processes:
nim_status_client
andQQmlThread
Steps to reproduce
Unknown, but my best guess is that this is a result of E2E test runs.