mate-desktop / mate-panel

MATE panel
https://mate-desktop.org
GNU General Public License v2.0

wnck-applet crashes randomly since update to 1.26.2 #1385

Open davidhedlund opened 1 year ago

davidhedlund commented 1 year ago

Expected behaviour

Actual behaviour

Screenshot at 2023-07-14 08-31-20

journalctl

jul 17 18:30:06 blues-System-Product-Name wnck-applet[1942]: The program 'wnck-applet' received an X Window System error.
jul 17 18:30:06 blues-System-Product-Name kernel: traps: wnck-applet[1942] trap int3 ip:7ff2ec159167 sp:7ffd00992c60 error:0 in libglib-2.0.so.0.7200.4[7ff2ec115000+8f000]

Steps to reproduce the behaviour

MATE general version

1.26.0-1

Package version

$ apt show libwnck-3-0
Package: libwnck-3-0
Version: 40.1-1
$ apt show mate-panel
Package: mate-panel
Version: 1.26.2-1+11.0trisquel10

Linux Distribution

Trisquel 11

Link to bugreport of your Distribution (requirement)

Closely related: https://gitlab.trisquel.org/trisquel/package-helpers/-/issues/76

davidhedlund commented 1 year ago

Please give me the stack trace command that you want me to execute so I can update the log in the top post.

Jakko3 commented 1 year ago

I'm experiencing this on distribution postmarketOS on device samsung-serranove (armv7, SoC msm8916, display 540x960). Installed package versions: mate-desktop 1.26.1, mate-panel 1.27.2 (but issue also happened at 1.26.3), libwnck3 43.0.

It does not happen on devices bq-paella (aarch64, SoC msm8916, display 720x1280) and virtual machine qemu (amd64, virtual, display 640x480).

On device samsung-serranove I also tried rotating the display by 90 degrees to 960x540. It didn't solve the issue.

When starting e.g. application "Pluma" from the classic menu, the panel items "Show Desktop", "Window List" and "Workspace Switcher" are crashing, showing message "... has quit unexpectedly". All the three are part of the wnck-applet, which is part of mate-panel (https://github.com/mate-desktop/mate-panel/tree/v1.26.3/applets/wncklet).

Edit: When starting e.g. "Pluma" by command line in a terminal, the crashing doesn't happen.

When removing the "Window List" (right-click its "handle" that is e.g. three dots, uncheck "Lock To Panel", again right-click the three dots, "Remove from Panel") and starting e.g. "Pluma" from classic menu, no crash occurs anymore.

Right-click panel, "Reset All Panels" to continue debugging.

In a terminal, check with ps -A | grep -i wnck. When starting "Pluma" and the crashes occur, "/usr/libexec/wnck-applet" doesn't show up anymore; it has obviously crashed. After clicking "Reload" on the first of the messages, it shows up again in ps -A | grep -i wnck, so it was restarted.

Open a terminal (if needed click 3x "Reload"), type killall wnck-applet, don't touch the 3 error windows, focus the terminal again, type /usr/libexec/wnck-applet (over ssh prepend "DISPLAY=:0"), now click 3x "Reload" on the display (you may need to minimize the terminal to reach all error windows). Then start "Pluma" from the classic menu. The terminal shows the message "Illegal instruction".

Summarizing so far: wnck-applet crashes with message "Illegal instruction" when starting an application from classic menu (does not happen with all applications, only with some). "Window List" seems to cause the issue.

I don't know how to debug this further. Any advice?

Jakko3 commented 1 year ago

The issue was also reported for device asus-tf201 (armv7, SoC Nvidia Tegra 3, display 1280x800).

Could it be a 32-bit vs. 64-bit issue somewhere in the code? That's just a wild guess.

LongnoseRob commented 1 year ago

On my device (asus-grouper, Tegra 3 SoC) I attached strace to the PID of wnck-applet (strace -p <PID>). The output was as follows when starting Firefox (119.0, 32-bit), which caused the "Show Desktop" and "Window List" crash messages:

poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="\24\0\6\0\24\5\0\0\27\1\0\0\6\0\0\0\0\0\0\0\377\377\377\177", iov_len=24}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 24
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1 H]\1\0\0\0\6\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="\24\0\6\0\24\5\0\0\320\1\0\0\0\0\0\0\0\0\0\0\377\377\377\177", iov_len=24}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 24
poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="\1 I]\4\0\0\0\6\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 48
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(5, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0010\0\0\0\202\1\0\0w\0\0\0\1\1o\0\34\0\0\0/org/a11"..., iov_len=136}, {iov_base="\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\6(iiii)\0\0\0\0\0\0\0\0\0"..., iov_len=48}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 184
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=99341216}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=99641222}) = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb59d4000
munmap(0xb59d4000, 8192)                = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=109737410}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=110110417}) = 0
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 5, 0) = 0 (Timeout)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{iov_base="5 \4\0\274\23\200\4\7\0\200\4M\1\30\0\213\4\6\0\275\23\200\4\274\23\200\4%\0\0\0"..., iov_len=1140}, {iov_base=NULL, iov_len=0}, {iov_base="", iov_len=0}], 3) = 1140
recvmsg(3, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="A\0T]\277\23\200\4\3\0\202\0\16\0\200\4\0\20\7\0\0\0\0\0\0\0\0\0\0\0\0\0"..., iov_len=4096}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10353, tv_nsec=115666520}) = 0
recvmsg(3, {msg_namelen=0}, 0)          = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}], 5, 1503) = 0 (Timeout)
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=620595524}) = 0
clock_gettime64(CLOCK_REALTIME, {tv_sec=1698891777, tv_nsec=580876537}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=621849548}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=622561561}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=623294574}) = 0
clock_gettime64(CLOCK_MONOTONIC, {tv_sec=10354, tv_nsec=624337594}) = 0
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPC, si_addr=0xb6f6c962} ---
+++ killed by SIGILL +++

To me this looks like the mmap fails. I think we will need further instructions for debugging, probably including a debug build.

Jakko3 commented 1 year ago

I see a very similar issue in Xfce4 that also affects my 32-bit device samsung-serranove only but not the 64-bit devices bq-paella and qemu. Not yet fully sure if this is directly related (will investigate further). Still it might be a hint that something underlying causes a 32-bit issue.

cwendling commented 1 year ago

With the very little sample you're giving, could seem like an issue on armv7. Not sure if @davidhedlund has such a system though, and he's seeing X Window System error, which probably is a different issue.

Anyway, you could try and run the applet under GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet and otherwise following @Jakko3's instructions. (as I unfortunately don't see anything relevant in the @LongnoseRob's strace)

Anyway, not that I have any clue here nor armv7 hardware on which I could test, but maybe that could help track it down.

davidhedlund commented 1 year ago

With the very little sample you're giving, could seem like an issue on armv7. Not sure if @davidhedlund has such a system though, and he's seeing X Window System error, which probably is a different issue.

Intel 64-bit here.

LongnoseRob commented 1 year ago

Tried the proposed gdb session; it seems to point to a syscall issue:

$ GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.

 // GDB introduction cut for brevity

Reading symbols from /usr/libexec/wnck-applet...
(No debugging symbols found in /usr/libexec/wnck-applet)
Starting program: /usr/libexec/wnck-applet
[New LWP 29400]
[New LWP 29401]
[New LWP 29402]
[New LWP 29403]
// trigger the problem
[LWP 29402 exited]
^C
Thread 1 "wnck-applet" received signal SIGINT, Interrupt.
__cp_begin () at src/thread/arm/syscall_cp.s:23
23      src/thread/arm/syscall_cp.s: No such file or directory.
(gdb) bt
#0  __cp_begin () at src/thread/arm/syscall_cp.s:23
#1  0xb6fc72c2 in __syscall_cp_c (nr=-1233186104, u=<optimized out>, v=<optimized out>, w=0, x=-1249751712, y=-1237160064, z=-1233863275)
    at src/thread/pthread_cancel.c:33
#2  0x00000000 in  ()
(gdb)

Jakko3 commented 1 year ago

Intel 64-bit here.

Oh... then it's either not a 32-bit vs. 64-bit issue – or we have two different issues.

@LongnoseRob In your debug output there is a "^C" and at the end of the next line it says "Interrupt". You may have canceled the process? Edit: Start wnck-applet via gdb in terminal, then make it crash by starting Firefox or Pluma from menu.

My debug output isn't much help either. The crash is due to "Illegal instruction". However, when asking for the backtrace, gdb says the stack may be corrupted. A web search suggests this is not unusual on ARM... I didn't find a proper way to get the backtrace.

$ GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet
GNU gdb (GDB) 13.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7-alpine-linux-musleabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/libexec/wnck-applet...
(No debugging symbols found in /usr/libexec/wnck-applet)
Starting program: /usr/libexec/wnck-applet 
[New LWP 3972]
[New LWP 3973]
[New LWP 3974]
[New LWP 3975]
[New LWP 3977]
[LWP 3977 exited]
[New LWP 3978]
[New LWP 3979]
[LWP 3978 exited]
[LWP 3979 exited]
[New LWP 3980]
[New LWP 3981]
[LWP 3980 exited]
[LWP 3981 exited]
[New LWP 3983]
[LWP 3974 exited]

Thread 1 "wnck-applet" received signal SIGILL, Illegal instruction.
__stack_chk_fail () at src/env/__stack_chk_fail.c:26
26  src/env/__stack_chk_fail.c: No such file or directory.
(gdb) bt
#0  __stack_chk_fail () at src/env/__stack_chk_fail.c:26
#1  0xb6f5927a in ?? () from /usr/lib/libwnck-3.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)

LongnoseRob commented 1 year ago

@LongnoseRob In your debug output there is a "^C" and at the end of the next line it says "Interrupt". You may have canceled the process? Edit: Start wnck-applet via gdb in terminal, then make it crash by starting Firefox or Pluma from menu.

Yes, that was intentional to get the BT of the first process that exited.

I did some more digging and installed the musl-dbg package, as these calls to src/env/__stack_chk_fail.c:26 point to musl, but this did not provide any more information.

lukefromdc commented 1 year ago

I am guessing this is ARM only? I do not have any ARM machines bigger than a phone, so no surprise I have never seen this. This shows the value of testing on all supported architectures, as this was found only after release.

Do we have anyone on the team with an ARM laptop? If not, someone developing for ARM would be a valuable addition to the team given such laptops are likely to become more common in the future.

Jakko3 commented 1 year ago

I am guessing this is ARM only?

@LongnoseRob and I are on architecture armv7 or ~armhf~ edit: or likely that's armel, not fully sure (32-bit). @davidhedlund however is on architecture x86-64 or amd64 (64-bit), thus not ARM. Although possibly we have two different issues.

@davidhedlund: Could you try the "gdb" debugging steps and report what you get?

Yes, that was intentional to get the BT of the first process that exited.

The steps I did to get the debug output:

muktupavels commented 1 year ago

If you know how to build from source, you could try libwnck/test-tasklist to see if problem is in libwnck or mate-panel.

lukefromdc commented 1 year ago

Note that killall wnck-applet is only possible with the applet built out of process, so that's what you have if this works. Has anyone seen this crash with the applet in-process? You would see the entire panel crashing, and any reference to libwnck in the backtrace would imply the same or a similar crash.

Jakko3 commented 1 year ago

I chose a different approach to track it down. In postmarketOS v22.12 (based on Alpine Linux 3.17) it works without issues, in v23.06 (based on 3.18) there are issues. So I set up a pmOS v22.12 (3.17) installation, changed the repository URLs to v23.06 (3.18), systematically upgraded the packages and tried to test which one causes the issue.

It turned out that package "startup-notification" introduced the issue when upgrading in Alpine Linux from package version 0.12-r4 to 0.12-r5. The difference between the two build releases "r4" and "r5" is a patch to fix 32-bit time.

The patches (actually the one patch in Alpine consists of two patches) are also implemented in upstream "startup-notification" but are not yet part of an official release, they were implemented after the latest release 0.12.

At least within Alpine Linux that patch seems to do the contrary of what it's supposed to do. I created an issue report at Alpine to discuss the further procedure: https://gitlab.alpinelinux.org/alpine/aports/-/issues/15441

Now the next question is whether the issue of @davidhedlund is somehow related to this. It's unlikely, as it seems to be a 32-bit issue and he's on 64-bit.

Trisquel is based on Ubuntu (based on Debian). Both in Ubuntu and Debian the version of "startup-notification" wasn't changed for a long time. Also they don't have this 32-bit time patch implemented in their build repository, as far as I can see.

So it's quite clear to me that the issue of @davidhedlund is another one. Sorry for having chimed into your issue. You should try to debug your issue with preferably killall wnck-applet and GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet or alternatively maybe with ps -A | grep wnck-applet and strace -p <PID> where PID is the number from the command before.

davidhedlund commented 1 year ago

I chose a different approach to track it down. In postmarketOS v22.12 (based on Alpine Linux 3.17) it works without issues, in v23.06 (based on 3.18) there are issues. So I set up a pmOS v22.12 (3.17) installation, changed the repository URLs to v23.06 (3.18), systematically upgraded the packages and tried to test which one causes the issue.

It turned out that package "startup-notification" introduced the issue when upgrading in Alpine Linux from package version 0.12-r4 to 0.12-r5. The difference between the two build releases "r4" and "r5" is a patch to fix 32-bit time.

* https://gitlab.alpinelinux.org/alpine/aports/-/commit/e99c0e1ab1248b67d25763b493b9a9b8413dc74b

The patches (actually the one patch in Alpine consists of two patches) are also implemented in upstream "startup-notification" but are not yet part of an official release, they were implemented after the latest release 0.12.

* https://gitlab.freedesktop.org/xdg/startup-notification/-/commit/a7e49fefde18ea8d5bada8096d32f23bcfb5a6dc

* https://gitlab.freedesktop.org/xdg/startup-notification/-/commit/ea9f7e4cc6fd8c08d175ed7774ed2c5bd11c8ef0

At least within Alpine Linux that patch seems to do the contrary of what it's supposed to do. I created an issue report at Alpine to discuss the further procedure:

* https://gitlab.alpinelinux.org/alpine/aports/-/issues/15441

Now the next question is whether the issue of @davidhedlund is somehow related to this. It's unlikely, as it seems to be a 32-bit issue and he's on 64-bit.

Trisquel is based on Ubuntu (based on Debian). Both in Ubuntu and Debian the version of "startup-notification" wasn't changed for a long time. Also they don't have this 32-bit time patch implemented in their build repository, as far as I can see.

* Ubuntu "startup-notification" version: https://packages.ubuntu.com/search?keywords=startup-notification&searchon=names&suite=all&section=all

* Ubuntu "startup-notification" build: https://git.launchpad.net/ubuntu/+source/startup-notification/tree/?h=import/0.12-6build2

* Debian "startup-notification" version: https://tracker.debian.org/pkg/startup-notification

* Debian "startup-notification" build: https://salsa.debian.org/gnome-team/startup-notification/-/tree/debian/0.12-6/debian

So it's quite clear to me that the issue of @davidhedlund is another one. Sorry for having chimed into your issue. You should try to debug your issue with preferably killall wnck-applet and GDK_SYNCHRONIZE=1 gdb -ex run /usr/libexec/wnck-applet or alternatively maybe with ps -A | grep wnck-applet and strace -p <PID> where PID is the number from the command before.

I clicked on "Don't Reload"

Screenshot_trisquel_11 0_amd64 iso_2023-11-06_04:52:16

Screenshot_trisquel_11 0_amd64 iso_2023-11-06_04:53:05

Jakko3 commented 1 year ago

Oh, my bad. The path in Ubuntu is different: /usr/lib/mate-panel/wnck-applet

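The order of the first steps is:

* open terminal

* `killall wnck-applet`

* don't touch the error windows yet, instead go back to the terminal

* `GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet`

* now click "Reload" on the error windows (if you have more than one, click "Reload" on all of them)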

In the terminal gdb should now be running without (gdb) prompt.

Try to reproduce the crash that you reported. Or keep on working until the crash happens.

Once wnck-applet has crashed, there should be some text in the terminal saying Thread 1 "wnck-applet" received signal[...]. And there is now a (gdb) prompt. Type bt for a backtrace. Exit gdb by typing quit. Copy the text incl. the Thread 1[...] line and paste it here.

If it doesn't work or you have questions, feel free to ask.

cwendling commented 1 year ago

@Jakko3 well done! So now we know that what you see is actually an issue in Alpine's libstartup-notification, not anything else (yet), as this is more than an ABI break that a rebuild would solve. In practice, it's (more than likely) crashing libwnck in its sn_startup_sequence_get_last_active_time() call, because libwnck passes long pointers -- which used to be right before Alpine's patch.

The issue here is that the Alpine patch not only breaks ABI (somewhat), but breaks API as well in practice: no rebuild is gonna make the passed in pointers point to wider memory areas. Note that the rationale on the linked email is confusing because it quotes a piece of libstartup-notification code that is not vanilla, and ends up referencing itself so I don't exactly know why they patched this in OpenBSD in the first place… before the patch, callers passing long should have been fine, as the truncation would have happened in sn_startup_sequence_get_last_active_time() itself when e.g. setting *tv_sec, which is not perfect but safe (I think?).
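
To illustrate why no rebuild helps, here is a minimal simulation of a callee that stores 8 bytes into a location where the caller only reserved 4. It is a sketch under assumptions: store_seconds_64() and struct frame are hypothetical stand-ins, not the real libstartup-notification function or a real stack frame, and the guard bytes merely play the role of a stack protector canary. The simulated frame is deliberately 8 bytes large so the demo itself stays well-defined C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a callee built against patched headers:
 * it stores a 64-bit seconds value at the address it is given. */
void store_seconds_64(void *dst, int64_t seconds)
{
    memcpy(dst, &seconds, sizeof seconds);
}

/* Simulated caller frame: a 4-byte slot (the size of a 32-bit long)
 * directly followed by 4 guard bytes (standing in for a canary). */
struct frame {
    unsigned char slot[4];
    unsigned char guard[4];
};

/* Returns 1 if the 8-byte store overwrote the guard bytes, 0 otherwise. */
int guard_clobbered(int64_t seconds)
{
    static const unsigned char canary[4] = {0xde, 0xad, 0xbe, 0xef};
    struct frame f;

    /* The simulation relies on the two arrays being adjacent. */
    assert(sizeof(struct frame) == 8);

    memset(f.slot, 0, sizeof f.slot);
    memcpy(f.guard, canary, sizeof canary);

    /* The caller believes it hands over 4 bytes; the callee writes 8. */
    store_seconds_64(&f, seconds);

    return memcmp(f.guard, canary, sizeof canary) != 0;
}
```

With a timestamp such as 1698891777, guard_clobbered() reports that the guard bytes were overwritten. In a real frame the extra bytes would land on whatever sits next to the long variable, which would be consistent with the __stack_chk_fail frame in the gdb backtrace earlier in the thread, although that link is an inference rather than something verified here.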

Anyway, if they are going with this change, they also need to patch libwnck to use that new API/ABI:

diff --git a/libwnck/tasklist.c b/libwnck/tasklist.c
index ccc7427..2e79a36 100644
--- a/libwnck/tasklist.c
+++ b/libwnck/tasklist.c
@@ -4973,7 +4973,8 @@ sequence_timeout_callback (void *user_data)
   WnckTasklist *tasklist = user_data;
   GList *tmp;
   gint64 now;
-  long tv_sec, tv_usec;
+  time_t tv_sec;
+  suseconds_t tv_usec;
   double elapsed;

   now = g_get_real_time ();

Note that this is not something upstream libwnck can easily fix, because of the very hard to detect API/ABI break. I guess they could find a very convoluted way to check this, but given there has been no release of it in 9 years, I'm not entirely sure they will... although it might get problematic in 2038.

Anyway, note that xfwm4 also still uses the libsn 0.12 API (e.g. passes pointers to long), so that's gonna be a problem as well if they don't patch it in Alpine. Wanna laugh? xfwm4 actually reverted to passing long because it caused crashes :1st_place_medal: It got introduced in https://gitlab.xfce.org/xfce/xfwm4/-/commit/8448703965b25c1ee97410aa463d789a43a6c771.

Anyway, that's not mate-panel's bug; and I'd say not libwnck one either as there is no released version of the library causing the problem, and checking for this is terribly tricky. Also, it affects multiple (if not all) users of the API, so a reasonable new release of libstartup-notification would need to be incompatible -- or maybe better, add some API to use the 2038-safe values.
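
For context on the 2038 remark: a signed 32-bit seconds counter tops out at 2147483647, which corresponds to 2038-01-19 03:14:07 UTC. A minimal sketch of the range check follows; the function name is made up for illustration.

```c
#include <stdint.h>

/* Returns 1 if a seconds-since-epoch value fits in a signed 32-bit
 * integer (the width of long on 32-bit targets such as armv7), else 0. */
int seconds_fit_in_32bit(int64_t seconds)
{
    return seconds >= INT32_MIN && seconds <= INT32_MAX;
}
```

Current timestamps still pass this check, which is why callers that hand over a 32-bit long work with the unpatched 0.12 API today.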

cwendling commented 1 year ago

BTW, it's not the first time that libstartup-notification unexpectedly broke ABI :)

davidhedlund commented 1 year ago

The order of the first steps is:

* open terminal

* `killall wnck-applet`

* don't touch the error windows yet, instead go back to the terminal

* `GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet`

* now click "Reload" on the error windows (if you have more than one, click "Reload" on all of them)

I had to click reload twice, as you suggested. Also, there's no gdb prompt:

Screenshot_trisquel_11 0_amd64 iso_2023-11-08_14:13:42

x@x-Standard-PC-i440FX-PIIX-1996:~$ killall wnck-applet
x@x-Standard-PC-i440FX-PIIX-1996:~$ GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet
GNU gdb (Ubuntu 12.1-0ubuntu1~22.04) 12.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/mate-panel/wnck-applet...
(No debugging symbols found in /usr/lib/mate-panel/wnck-applet)
Starting program: /usr/lib/mate-panel/wnck-applet 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff55da640 (LWP 1738)]
[New Thread 0x7ffff4dd9640 (LWP 1739)]
[New Thread 0x7fffeffff640 (LWP 1740)]
[New Thread 0x7fffecf4c640 (LWP 1747)]
[Thread 0x7fffecf4c640 (LWP 1747) exited]
[New Thread 0x7fffecf4c640 (LWP 1748)]
[New Thread 0x7fffe3fff640 (LWP 1749)]
[Thread 0x7fffecf4c640 (LWP 1748) exited]
[New Thread 0x7fffecf4c640 (LWP 1750)]
[New Thread 0x7fffe37fe640 (LWP 1751)]
[Thread 0x7fffe3fff640 (LWP 1749) exited]
[Thread 0x7fffecf4c640 (LWP 1750) exited]
[Thread 0x7fffe37fe640 (LWP 1751) exited]
[New Thread 0x7fffe37fe640 (LWP 1752)]
[Thread 0x7fffeffff640 (LWP 1740) exited]

cwendling commented 1 year ago

@davidhedlund looks good, now you need the applet to crash :slightly_smiling_face:, and then output a backtrace (thread apply all bt full on the GDB prompt).

davidhedlund commented 1 year ago

In the terminal gdb should now be running without (gdb) prompt.

Try to reproduce the crash that you reported. Or keep on working until the crash happens.

Once wnck-applet has crashed, there should be some text in the terminal saying Thread 1 "wnck-applet" received signal[...]. And there is now a (gdb) prompt. Type bt for a backtrace. Exit gdb by typing quit. Copy the text incl. the Thread 1[...] line and paste it here.

If it doesn't work or you have questions, feel free to ask.

I'm lost here. Should the gdb prompt be triggered or manually opened? @cwendling

Jakko3 commented 1 year ago

Your picture is correct. This is what I described as In the terminal gdb should now be running without (gdb) prompt.

The next step is what I described as Try to reproduce the crash that you reported. Or keep on working until the crash happens. Or, as @cwendling put it, now you need the applet to crash. By this we mean: You created this issue here titled "wnck-applet crashes randomly since update to 1.26.2". We now need one of those crashes. If you have an idea how to "provoke" that issue, do it. Otherwise, if the issue seems to happen completely randomly, then use your computer as you usually do until the random crash of wnck-applet happens.

After that crash happened, return to the terminal window and do the next steps as described by me (and slightly extended by @cwendling).

Let us know if you're not sure or it's not clear. That's no problem.

Jakko3 commented 1 year ago

Should the gdb prompt be triggered or manually opened?

When wnck-applet crashes, the gdb prompt will be triggered by the crash.

davidhedlund commented 1 year ago

Should the gdb prompt be triggered or manually opened?

When wnck-applet crashes, the gdb prompt will be triggered by the crash.

If you have virt-manager, you could install Trisquel and try to reproduce it if you want, since you have better knowledge how to solve this issue.

Firefox is not available in the Trisquel repositories due to trademark issues. But Pluma is installed by default.

davidhedlund commented 1 year ago

When starting e.g. application "Pluma" from the classic menu, the panel items "Show Desktop", "Window List" and "Workspace Switcher" are crashing, showing message "... has quit unexpectedly". All the three are part of the wnck-applet, which is part of mate-panel (https://github.com/mate-desktop/mate-panel/tree/v1.26.3/applets/wncklet).

Nothing crashes (with or without killall wnck-applet && GDK_SYNCHRONIZE=1 gdb -ex run /usr/lib/mate-panel/wnck-applet executed first) when I open Pluma from the classic menu in Trisquel.

Jakko3 commented 1 year ago

when I open Pluma from the classic menu in Trisquel

The crashes when starting "Pluma" (or other applications) from the classic menu were the issue I ran into. We could find the cause and solve it.

Now we need to find the cause of your issue that you reported. It's a different issue and not related to mine.

I suggest that you set up a terminal like in your last picture – and then use your PC as you usually do. At some point the issue you reported would happen.

If your issue doesn't happen at all anymore, you might close this issue report.

If you have virt-manager, you could install Trisquel and try to reproduce it if you want

I downloaded Trisquel 11 and booted it by live USB on my amd64 PC. I tried different things but did not experience any crashes of wnck-applet so far.

davidhedlund commented 1 year ago

If your issue doesn't happen at all anymore, you might close this issue report.

I experienced this issue twice within a short period (perhaps a month) around the time I reported it. I have never experienced it since, but I have not closed the issue because someone else might be able to reproduce it.

I downloaded Trisquel 11 and booted it by live USB on my amd64 PC. I tried different things but did not experience any crashes of wnck-applet so far.

Thank you very much for your efforts and support!

davidhedlund commented 1 year ago

By coincidence, wnck-applet crashed again, for the first time in the 6 months since I opened this issue, while I was working in Trisquel as normal:

image

So I don't think this issue should be closed.

Same package versions as when I submitted this issue:

$ apt show libwnck-3-0
Package: libwnck-3-0
Version: 40.1-1
$ apt show mate-panel
Package: mate-panel
Version: 1.26.2-1+11.0trisquel10

raveit65 commented 1 year ago

You are using an old mate-panel release. Ubuntu should update to mate-panel-1.26.3 https://github.com/mate-desktop/mate-panel/releases/tag/v1.26.3, which has fixes for the wnck applets.

Jakko3 commented 1 year ago

Trisquel 11 is based on Ubuntu 22.04 LTS (Jammy Jellyfish). This one is on mate-panel 1.26.2.

Nonetheless a good hint that mate-panel 1.26.3 might solve the issue of @davidhedlund.

Jakko3 commented 1 year ago

I just noticed that the version is 1.26.2-1 +11.0trisquel10. Seems to be a Trisquel build then. I don't know how/where they build their packages, couldn't find it yet on a quick look at https://gitlab.trisquel.org/trisquel.

Ark74 commented 1 year ago

Trisquel takes the latest upstream (Ubuntu) source and rebuilds it using small bash scripts called "helpers"; for mate-panel, here is the one used for the 22.04 Jammy-based release.

Where,

I wonder what could break if we add 1.26.3 patches to the 1.26.2-1 release on this single package we have from upstream.

Update: ...or better yet only apply the commit(s) fixing the issue.