raboof / notion

Tiling tabbed window manager
https://notionwm.net/
GNU Lesser General Public License v2.1
268 stars 63 forks source link

tiling->unsplit from the context menu crashes notion #334

Closed wilhelmy closed 1 year ago

wilhelmy commented 3 years ago
Apr 14 19:15:11 fukushima systemd-coredump[2607]: Process 1905 (notion) of user 1000 dumped core.

                                                  Stack trace of thread 1905:
                                                  #0  0x000000000043c9d7 lookup_dynfun (notion + 0x3c9d7)
                                                  #1  0x000000000042465e region_rescue_clientwins (notion + 0x2465e)
                                                  #2  0x0000000000424c6d region_rescue_needed (notion + 0x24c6d)
                                                  #3  0x00007f3dbc8da150 mod_tiling_untile (mod_tiling.so + 0x12150)
                                                  #4  0x00007f3dbc8da21c l2chnd_b_o__WTiling (mod_tiling.so + 0x1221c)
                                                  #5  0x000000000043aa2a extl_l1_call_handler2 (notion + 0x3aa2a)
                                                  #6  0x00007f3dbcccd573 n/a (liblua5.4.so.5 + 0xf573)
                                                  #7  0x00007f3dbccce129 n/a (liblua5.4.so.5 + 0x10129)
                                                  #8  0x00007f3dbcccb6cb n/a (liblua5.4.so.5 + 0xd6cb)
                                                  #9  0x00007f3dbccd039e n/a (liblua5.4.so.5 + 0x1239e)
                                                  #10 0x00007f3dbccd0486 lua_pcallk (liblua5.4.so.5 + 0x12486)
                                                  #11 0x000000000043b505 extl_l1_call_handler (notion + 0x3b505)
                                                  #12 0x00007f3dbcccd573 n/a (liblua5.4.so.5 + 0xf573)
                                                  #13 0x00007f3dbccdceb9 n/a (liblua5.4.so.5 + 0x1eeb9)
                                                  #14 0x00007f3dbccce142 n/a (liblua5.4.so.5 + 0x10142)
                                                  #15 0x00007f3dbcccb6cb n/a (liblua5.4.so.5 + 0xd6cb)
                                                  #16 0x00007f3dbccd039e n/a (liblua5.4.so.5 + 0x1239e)
                                                  #17 0x00007f3dbccd0486 lua_pcallk (liblua5.4.so.5 + 0x12486)
                                                  #18 0x000000000043af8c extl_dodo_call_vararg (notion + 0x3af8c)
                                                  #19 0x000000000043a17c extl_docpcall (notion + 0x3a17c)
                                                  #20 0x00007f3dbcccd573 n/a (liblua5.4.so.5 + 0xf573)
                                                  #21 0x00007f3dbccce129 n/a (liblua5.4.so.5 + 0x10129)
                                                  #22 0x00007f3dbcccb6cb n/a (liblua5.4.so.5 + 0xd6cb)
                                                  #23 0x00007f3dbccd039e n/a (liblua5.4.so.5 + 0x1239e)
                                                  #24 0x00007f3dbccd0486 lua_pcallk (liblua5.4.so.5 + 0x12486)
                                                  #25 0x0000000000439341 extl_cpcall (notion + 0x39341)
                                                  #26 0x000000000043bc81 extl_cpcall_call (notion + 0x3bc81)
                                                  #27 0x000000000043bd66 extl_call (notion + 0x3bd66)
                                                  #28 0x00007f3dbc8fffbf menu_do_finish (mod_menu.so + 0x4fbf)
                                                  #29 0x000000000043765e do_execute (notion + 0x3765e)
                                                  #30 0x000000000041a200 ioncore_mainloop (notion + 0x1a200)
                                                  #31 0x00000000004183dd main (notion + 0x183dd)
                                                  #32 0x00007f3dbc9cab25 __libc_start_main (libc.so.6 + 0x27b25)
                                                  #33 0x000000000041850e _start (notion + 0x1850e)

Edit: can anyone reproduce?

knixeur commented 3 years ago

I didn't catch the stacktrace but I confirm it crashed :(

Btw: the menuitem is called Untile

kristopolous commented 1 year ago

This is super reproducible and I've been looking into it for a few days (I'm pretty lazy). The "bug" is in ops.c

FOR_ALL_MANAGED_BY_TILING(reg, tiling, tmp)

That for loop gets corrupted by this line:

reg2=group_do_attach(grp, &param, &data); 

Which frees the tiling->managed_list object from underneath it and thus corrupts the pointers.

Here's a really reproducible version.

  1. do rm -fr ~/.notion
  2. start notion in the default tiled mode with 1 application window (such as xterm)
  3. right click on the title bar, go to "Tiling" => "Untile"
  4. enjoy your core.

Somehow the reparenting has to be done without the rug pull here. It has been there since at least 2014. I didn't follow the full path down though. I could bisect it if we really care.

ops.c may not be the fix point, it's just the first common parent between the defect and crash.

Here's some more details:

(gdb) b ops.c:135
(gdb) c
(gdb) watch tiling->managed_list
(gdb) c

Notice how you'll get something like this:

#0  0x000055cd3c8e4e0a in free_node (ptrlist=0x55cd3dfdb2d0, node=0x55cd3e055210) at ptrlist.c:19
#1  0x000055cd3c8e53a0 in ptrlist_remove (ptrlist=0x55cd3dfdb2d0, ptr=0x55cd3df85180) at ptrlist.c:121
#2  0x00007f2656a257e1 in tiling_do_managed_remove (ws=0x55cd3dfdb210, reg=0x55cd3df85180) at tiling.c:673
#3  0x00007f2656a258f8 in tiling_managed_remove (ws=0x55cd3dfdb210, reg=0x55cd3df85180) at tiling.c:697
#4  0x000055cd3c8b33fe in region_managed_remove (mgr=0x55cd3dfdb210, reg=0x55cd3df85180) at region.c:234
#5  0x000055cd3c8b3b39 in region_detach_manager (reg=0x55cd3df85180) at region.c:576
#6  0x000055cd3c8b5641 in doit_reparent (mgr=0x55cd3e056ba0, par=0x55cd3df455a0, fp=0x7ffe1f188240, cont=0x55cd3c8ce580 <group_do_attach_final>, cont_param=0x7ffe1f1882b0, reg=0x55cd3df85180) at attach.c:86
#7  0x000055cd3c8b58f2 in region_attach_helper (mgr=0x55cd3e056ba0, par=0x55cd3df455a0, fp=0x7ffe1f188240, fn=0x55cd3c8ce580 <group_do_attach_final>, fn_param=0x7ffe1f1882b0, data=0x7ffe1f188290) at attach.c:171
#8  0x000055cd3c8ce96f in group_do_attach (ws=0x55cd3e056ba0, param=0x7ffe1f1882b0, data=0x7ffe1f188290) at group.c:729
#9  0x00007f2656a31cfb in mod_tiling_untile (tiling=0x55cd3dfdb210) at ops.c:148

That's where your memory corruption occurs. I don't know enough about the mechanics here. Maybe a simple copy of the ptr list and then a freeing at the end of the loop would do it?

kristopolous commented 1 year ago

back. did the git bisect. defect came in at 8d3f26280c0e3778e48ec33409c884d3f44c99d8, it's the rug pull as theorized. It's in tiling_managed_remove right at splittree_remove, author is @raboof .

I'll look into it more.

raboof commented 1 year ago

Ouch, sorry about that. Before https://github.com/raboof/notion/commit/8d3f26280c0e3778e48ec33409c884d3f44c99d8 that splittree_remove was already there, but perhaps triggered under different/fewer conditions. That if(!reused) that got removed looks potentially relevant I guess...

I'll look into it more

Thanks!

kristopolous commented 1 year ago

it's calling ptrlist_remove(&(ws->managed_list), reg); in tiling_do_managed_remove which we identified from above as the issue here.

Here's a really cheap solution

a/mod_tiling/ops.c b/mod_tiling/ops.c
index 0bd21912..bc5059d9 100644
--- a/mod_tiling/ops.c
+++ b/mod_tiling/ops.c
@@ -147,13 +147,20 @@ bool mod_tiling_untile(WTiling *tiling)

         reg2=group_do_attach(grp, &param, &data);

+        // See #334: tiling->unsplit from the context menu crashes notion
+        if(tiling->managed_list == NULL) {
+            break;
+        }
+
         if(reg2==NULL)
             warn(TR("Unable to move a region from tiling to group."));
     }

     tiling->batchop=FALSE;

-    region_dispose((WRegion*)tiling);
+    if(tiling->managed_list != NULL) {
+        region_dispose((WRegion*)tiling);
+    }

     return TRUE;
 }

we can probably do better than that

kristopolous commented 1 year ago

I'm also open to just using this and then opening up a more minor memory leak bug which I'm sure is sitting around somewhere in this hack and then just move on.

raboof commented 1 year ago

I merged #357 - do you want to keep this issue open to keep looking for a clearer solution or rather close it?

kristopolous commented 1 year ago

no ... let me handle that

kristopolous commented 1 year ago

apparently only you have the privilege of closing. Feel free to do that now that we have documented and opened up the more major and now less visible issue.