jun7 / rox-filer

ROX file manager
24 stars 6 forks source link

Crash when closing window of removed folder #196

Open step- opened 5 years ago

step- commented 5 years ago

Hi,

Sometimes rox can crash when the window of a removed folder gets closed. It doesn't happen all the times, but we have at least two different users that can reproduce this issue fairly consistently. Below is one line of commands that can trigger the problem. On my system I need to repeat the commands maybe 10-20 times until rox crashes. I'm attaching a script that loops repeating the line of commands automatically.

xdotool search --sync --onlyvisible --all --pid $(pidof ROX-Filer) --name "^!" getwindowname %@ windowclose %@ & mkdir /tmp/zzz; rox -d /tmp/zzz; rmdir /tmp/zzz
  1. xdotool search --sync: wait for a matching window title to appear
  2. --onlyvisible --all --pid $(pidof ROX-Filer): want a ROX-Filer window
  3. --name "^!": title starts with "!" - that's the title for windows that are open in removed folders
  4. getwindowname %@ - print the window title to the console
  5. windowclose %@ - close the window
  6. & - wait in the background so the next command can run immediately
  7. mkdir /tmp/zzz; rox -d /tmp/zzz; rmdir /tmp/zzz - make folder, open rox in that folder, remove that folder; this will trigger xdotool

This is what gets printed in my terminal when the crash occurs

** (ROX-Filer:15102): WARNING **: 09:55:42.803: The program 'ROX-Filer' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadDrawable (invalid Pixmap or Window parameter)'.
  (Details: serial 5066 error_code 9 request_code 53 minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger.)

** (ROX-Filer:15102): WARNING **: 09:55:42.804: We got a BadWindow error from the X server. This might be due to this GTK bug (during drag-and-drop?):
http://bugzilla.gnome.org/show_bug.cgi?id=152151
Trying to continue...

** (ROX-Filer:15102): WARNING **: 09:55:42.804: The program 'ROX-Filer' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadDrawable (invalid Pixmap or Window parameter)'.
  (Details: serial 5067 error_code 9 request_code 138 minor_code 4)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger.)

** (ROX-Filer:15102): WARNING **: 09:55:42.804: We got a BadWindow error from the X server. This might be due to this GTK bug (during drag-and-drop?):
http://bugzilla.gnome.org/show_bug.cgi?id=152151
Trying to continue...

** (ROX-Filer:15102): WARNING **: 09:55:42.804: The program 'ROX-Filer' received an X Window System error.
This probably reflects a bug in the program.
The error was 'RenderBadPicture (invalid Picture parameter)'.
  (Details: serial 5068 error_code 141 request_code 138 minor_code 26)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger.)

gdb backtrace

(gdb) thread apply all bt

Thread 2 (Thread 0x7f833dc62700 (LWP 26067)):
#0  0x00007f83439ab619 in poll () at /lib64/libc.so.6
#1  0x00007f834498ee47 in g_main_context_poll (priority=<optimized out>, n_fds=2, fds=0x188bcd0, timeout=1452, context=0x1614dd0) at gmain.c:4221
#2  0x00007f834498ee47 in g_main_context_iterate (context=context@entry=0x1614dd0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at gmain.c:3915
#3  0x00007f834498ef36 in g_main_context_iteration (context=0x1614dd0, may_block=may_block@entry=1) at gmain.c:3981
#4  0x00007f834498ef66 in glib_worker_main (data=<optimized out>) at gmain.c:5861
#5  0x00007f83449a9550 in g_thread_proxy (data=0x19a4230) at gthread.c:784
#6  0x00007f834225a5aa in start_thread () at /lib64/libpthread.so.0
#7  0x00007f83439b5c5f in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f83471be900 (LWP 25772)):
#0  0x00007f83438f3e9b in raise () at /lib64/libc.so.6
#1  0x00007f83438f5251 in abort () at /lib64/libc.so.6
#2  0x000000000045235b in rox_x_error (display=0x15859c0, error=0x7ffd5e716c60) at /tmp/rox-filer-master/ROX-Filer/src/main.c:246
#3  0x00007f8343cb7637 in _XError (dpy=dpy@entry=0x15859c0, rep=rep@entry=0x19c2190) at XlibInt.c:1444
#4  0x00007f8343cb5064 in handle_error (dpy=0x15859c0, err=0x19c2190, in_XReply=<optimized out>) at xcb_io.c:199
#5  0x00007f8343cb50f7 in handle_response (dpy=0x15859c0, response=0x19c2190, in_XReply=<optimized out>) at xcb_io.c:324
#6  0x00007f8343cb583a in _XEventsQueued (dpy=dpy@entry=0x15859c0, mode=mode@entry=2) at xcb_io.c:363
#7  0x00007f8343caac1e in XPending (dpy=dpy@entry=0x15859c0) at Pending.c:55
#8  0x00007f83467a5407 in _gdk_events_queue (display=display@entry=0x159b000 [GdkDisplayX11]) at gdkevents-x11.c:2335
#9  0x00007f83467a54df in gdk_event_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at gdkevents-x11.c:2419
#10 0x00007f834498ec75 in g_main_dispatch (context=0x15b8b70) at gmain.c:3182
#11 0x00007f834498ec75 in g_main_context_dispatch (context=context@entry=0x15b8b70) at gmain.c:3847
#12 0x00007f834498ee9d in g_main_context_iterate (context=0x15b8b70, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3920
#13 0x00007f834498f14a in g_main_loop_run (loop=0x154d810) at gmain.c:4116
#14 0x00007f8346b09a17 in IA__gtk_main () at gtkmain.c:1270
#15 0x0000000000452fef in main (argc=3, argv=0x7ffd5e717248) at /tmp/rox-filer-master/ROX-Filer/src/main.c:661
(gdb)

Maybe this issue is some sort of a race, somewhere inside rox. xdotool doesn't go though the window manager to close the window. It closes the window by brute-force calling XDestroyWindow. In theory this should cause GDK to be notified, and cause it to raise GDK events, which eventually are translated to GTK events, which eventually are passed to ROX; but somewhere in the chain this may be broken (or some sort of race problems happen in ROX trying to update something in the window before all the events have been propagated completely.

Thank you

Attached: make-crash.sh.zip

step- commented 5 years ago

Something more to ponder. I looked at wmctrl, which can close windows by asking the window manager to do it. It seems to be a more polite way of doing it than xdotool's brute-force XDestroyWindow call. So I modified the above script to use wmctrl and I can't make rox crash anymore on my hardware although once in while I can see an Xorg's message complaining about a bad window id. But like I said no crash so far. However, I'm not too confident that with wmctrl an eventual race can be exercised. xdotool alone is fast in detecting the window and immediately closing it. Using wmctrl we go through a slower pipe: xdotool(detect) > xargs > wmctrl > window-manager(close).

Modified script: make-crash-wmctrl.sh.zip

jun7 commented 5 years ago

Hi! thank you for reporting.

Not reproduced though I will check differences between removed dir and normal.

jun7 commented 5 years ago

Does it happen with icons view?

step- commented 5 years ago

It happens with both icons view and list view.

jun7 commented 5 years ago

I checked destroy process but there is no clue to solve it. Looks like the removed dir is only have a error string not so special. Hmm, is there any other conditions?

JakeSFR commented 5 years ago

The 'removed folder' requirement is irrelevant. I was also able to reproduce it with:

xdotool search --sync --onlyvisible --all --pid `pidof ROX-Filer` --name "/tmp" getwindowname %@ windowclose %@ & rox -d /tmp
jun7 commented 5 years ago

So it have not to be on a removed dir, does rox crash always?

JakeSFR commented 5 years ago

Not always, once in a 20-30 tries. Seems to be harder to reproduce than with removed dir, though.

JakeSFR commented 3 years ago

I came across this old issue, tried the wmctrl [EDIT: sorry, the xdotool one] "crasher" again and it still crashes ROX. However, I looked into src/main.c and found this:

https://github.com/jun7/rox-filer/blob/5cce187ae5eaee6eb21a3c06de2e649787f5dd92/ROX-Filer/src/main.c#L212-L245

As you can see, if the error is BadWindow or BadDrawable, ROX just continues. But in Step's output there's also RenderBadPicture (invalid Picture parameter) and I can add BadGC (invalid GC parameter) to the collection.

So, I just replaced abort(); with return 0; and can't reproduce the crash anymore.

@step-: can you add this to the recipe:

sed -i -e 's|abort();|return 0;|' ROX-Filer/src/main.c

rebuild ROX and verify?

step- commented 3 years ago

@step-: can you add this to the recipe:

sed -i -e 's|abort();|return 0;|' ROX-Filer/src/main.c

rebuild ROX and verify?

I will. Give me some time.

step- commented 3 years ago

Patch tested with this command, which repeats 10 times at random intervals opening and closing 15 windows in /tmp

mi=10 mj=15 rox=`pidof ROX-Filer` dir="/tmp"; foo () { xdotool search $1 --onlyvisible --all --pid $rox --name "$dir" getwindowname %@ windowclose %@ >/dev/null & }; eval 'for i in {1..$mi} ; do foo --sync; for j in {1..$mj}; do rox -d "$dir"; done; t=0.1$RANDOM; echo $t; sleep $t; done'; foo

ROX didn't crash therefore I think @JakeSFR's patch is good and valuable for ROX! Thank you. As a side comment, when mi=10 mj=15 above I got this warning message (only once):

** (ROX-Filer:1704): WARNING **: 10:18:02.724: Existing ROX-Filer process is not responding! Try with -n

ROX stopped responding for maybe 3-4 seconds (garbage collecting?), then it came back and the test loop could continue and finish without further warnings. For lower values of mj there were no warnings.

I suspect a memory leak somewhere because htop's RESident size column kept increasing a bit after each run of the above command: ... 30568, 30604, 30640, 30716, 30832, 30956.

JakeSFR commented 3 years ago

Ok, thanks for the confirmation.

I think the message indicates that ROX lost a connection to one of its windows due to X error. With the abort() it would have probably crashed instead. I suspected something like that may happen if we ignore all X errors, but IMHO it's still better than aborting everything (pinboard and all remaining ROX windows). Fortunately, ROX managed to free itself from that freeze; maybe gtk_main_quit() here did it:

https://github.com/jun7/rox-filer/blob/165579d562768836c4c9e65c7fef2f9991a8914e/ROX-Filer/src/remote.c#L1241-L1265

Btw, I remember the last time ROX crashed for me (it still happens, but only once in a few months), it was right after closing a window, so was most likely caused by X error + that abort().

@jun7: I know this is not a real solution and the actual problem might even lie on GTK or X side, but I propose to add this, right after the first g_warning in rox_x_error():

/* Continue on any error */
g_warning(_("Trying to continue..."));
return 0;

If a user doesn't experience crashes, it shouldn't make any difference, but for those who do, it will.

JakeSFR commented 3 years ago

Ok, I made a PR: https://github.com/jun7/rox-filer/pull/224