openSUSE / kmozillahelper

KDE mozilla integration
MIT License
17 stars 9 forks source link

kmozillahelper becomes unresponsive, killing the process does not fix #34

Closed biggestsonicfan closed 10 months ago

biggestsonicfan commented 2 years ago

Greetings (again),

I've recently reinstalled kmozillahelper and it no longer seems to lock up everything. However, it is now locking up itself. I can recover and continue using Firefox if I kill the kmozillahelper process, but I can no longer use save dialogs unless I completely close Firefox and start a new session.

Screenshot_20211214_164443

I'm unsure how memory management should be with kmozillahelper but this seems a little high for what it does?

Vogtinator commented 2 years ago

Is the memory usage constant or does it change over time (increase/decrease noticably)?

Can you get a backtrace of kmozillahelper when it's stuck?

biggestsonicfan commented 2 years ago

@Vogtinator Took a while for it to happen again. See attached log. kmozillahleper-strace-20220101.log

Vogtinator commented 2 years ago

The issue is most likely this call to accept:

[pid  3207] accept4(288,  <unfinished ...>

It doesn't return, which means there is somehow no connection pending. This can have a variety of causes, depending on what the connection is to.

Also, the FD number is unusually high, it might be kmozillahelper not closing some FDs correctly or inheriting many FDs from firefox.

A backtrace (not strace) of the stuck process would show what it's doing, and ls -l /proc/$(pidof kmozillahelper)/fd would show what the FDs are. With e.g. ss -np | grep 31284 you can find out what socket:[31284] is connected to.

biggestsonicfan commented 2 years ago

Oh whoops, I just googled "how to backtrace a linux process" and it took me to a "How to know where a program is stuck in linux?" stackexchange post. I'll generate a proper log next time!

EDIT: This is what I assume I was supposed to do. It also downloaded a lot of debugging packages to cache, but gdb is attached now. (Oh gdb also taking 800 Megs of RAM, wew)

EDIT2: I can only assume this was related, but my login session "crashed"? I was just listening to some music and navigating in firefox when my music stopped, my screen went black, then I was at my login screen (normally autologin on boot).

biggestsonicfan commented 2 years ago

Backtrace log: kmozillahelper-backtrace-20220101.log FD output: kmozillahelper-fd-20220101.log Connected sockets: kmozillahelper-sockets-20220101.log

I hope this is enough.

biggestsonicfan commented 2 years ago

Here's an odd one. gdb running and attached during a freeze... I log everything, then quit gdb... the save file dialog appears after... First time that has happened.

Backtrace log: kmozillahelper-backtrace-20220102.log FD output: kmozillahelper-fd-20220102.log Connected sockets: kmozillahelper-sockets-20220102.log

biggestsonicfan commented 2 years ago

The error is now occurring about as frequently as it used to in #31, so I'm assuming this might be the same issue, but I was trying to figure out what was going wrong incorrectly. I don't think I ever once closed kmozillahelper by itself prior to this issue and let firefox resume.

It's gone from 1/2000 images per freeze to maybe 1/30 images.

Vogtinator commented 2 years ago

Hm, in this case ss -lnp might've been required to also list unconnected/listening sockets.

I'll guess that it's the kio slave socket though, which is the only listening socket kmozillahelper opens here. The only way I can see that accept4 hangs is if the process aborts its connection attempt in between kmozillahelper getting POLLIN on the socket and calling accept4 on it. Maybe something kills the process just at that point. Is there any kioslave5 process with a kmozillahelper socket in its arguments running? If yes, can you get backtraces?

If you can reproduce it easier now, you could try strace -fetrace=listen,connect,accept4 -p $(pidof kmozillahelper) to get more info while the issue happens.

Here's an odd one. gdb running and attached during a freeze... I log everything, then quit gdb... the save file dialog appears after... First time that has happened.

That trace actually shows a properly working process, not stuck in accept4.

biggestsonicfan commented 2 years ago

I have 28 kioslave5 processes running right now and I don't even have a save file dialog open. To gdb each one of them would require more memory than I have installed in my computer. To individually look at the sockets of each one of them, I'm probably going to need to shell script this as doing this manually will not be fun at all.

Vogtinator commented 2 years ago

To clarify, you'd only need to do this while kmozillahelper is hanging, and only to the one or two kioslaves which have the kmozillahelper socket it its arguments.

biggestsonicfan commented 2 years ago

Oh I do understand that, but I would have absolutely no idea which kioslave would have them. I am up to 33 kioslave5 processes now.

Vogtinator commented 2 years ago

You could use pgrep -f "kmozillahelper.*slave-socket" to get their PIDs

biggestsonicfan commented 2 years ago

You could use pgrep -f "kmozillahelper.*slave-socket" to get their PIDs

Without having a filedialog open and kmozillahelper not frozen, of the 35 kioslave5, the above command produces 25 PIDs.

Vogtinator commented 2 years ago

I assume that most of them stay even after you quit firefox (and kmozillahelper)? If so, those might actually be leftovers from the past times kmozillahelper failed. Getting a backtrace of one or two of the remains might be helpful.

biggestsonicfan commented 2 years ago

Ah you're right. After I saw your comment just now I closed firefox to find 9 instances of kmozillahelper still running, so I have rebooted so I don't have to play detective at trying to piece which kioslave5 and kmozillahelper went together. On the next hang, I will provide

Vogtinator commented 2 years ago

Output of strace -fetrace=listen,connect,accept4 -p $(pidof kmozillahelper)

That's something you'd have to run before the hang happens. If that takes too long and the output becomes too much, you could only kepe the last ~1k or so lines before the hang by piping it into |& tail -n 1000 and killing strace from the outside once the hang happened.

biggestsonicfan commented 2 years ago

Oh okay, gotcha.

biggestsonicfan commented 2 years ago

Okay... so....

biggestsonicfan commented 2 years ago

Okay, it happened again on the same image but with the updated kio and kio-debuginfo installed... however running gdb on kioslave5 this time resulted in even more dependencies not up to date to have debuginfo packages installed: baloo5-kioslaves baloo5-kioslaves-debuginfo kfilemetadata5 kfilemetadata5-debuginfo libKF5Baloo5 libKF5Baloo5-debuginfo libKF5BalooEngine5 libKF5BalooEngine5-debuginfo liblmdb-0_9_29 liblmdb-0_9_29-debuginfo

Vogtinator commented 2 years ago

Looks a bit like a race condition. A slave is started as PID 24661 (24660 is an intermediate process for daemonizing), and use for a bit (first successfull accept). At some point later, various slaves are killed (SIGTERM), but until the five second delay until exit (SIGALRM), there is another call to accept.

strace: Process 24660 attached
strace: Process 24661 attached
[pid 24660] +++ exited with 1 +++
[pid 19739] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24660, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
[pid 24661] connect(4, {sa_family=AF_UNIX, sun_path="/run/user/1000/kmozillahelpertEAdvg.14.slave-socket"}, 110) = 0
[pid 19739] accept4(24, {sa_family=AF_UNIX}, [110 => 2], SOCK_CLOEXEC) = 22
[pid 24661] connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 24661] connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 24655] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24653] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24657] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24659] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24661] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 19739] accept4(24,  <unfinished ...>
[pid 24655] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24655] +++ killed by SIGALRM +++
[pid 24653] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24657] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24659] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24661] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
biggestsonicfan commented 2 years ago

but until the five second delay until exit

What exactly do you mean by this?

Vogtinator commented 2 years ago

kmozillahelper spawns ioslave processes for IO, which are killed after 3min of inactivity by sending SIGTERM. The SIGTERM handler sets a flag to exit and installs a 5s alarm to forcibly kill the process if it didn't exit until then. What apparently happens in your case is that after SIGTERM it still tries to connect to kmozillahelper, and before that completes SIGALRM kills the process. That leaves kmozillahelper waiting for that.

biggestsonicfan commented 2 years ago

Would this happen, say, if I download multiple images per second using the kmozillahelper dialog boxes?

Vogtinator commented 2 years ago

Would this happen, say, if I download multiple images per second using the kmozillahelper dialog boxes?

As long as there was at least 3min of idle time for some slaves, definitely possible. If it's a race condition, it also becomes more likely the more often the triggering action is repeated.

The main question is why the ioslaves are all stuck for more than 5s.

If you can still reproduce it, please try strace -fttketrace=execve -p $(pidof kmozillahelper) optionally with |& tail -n 1000 again.

biggestsonicfan commented 2 years ago

Alright, but if the gdb, fd, and socket logs aren't worth it anymore, I'd prefer not to take the time to make those logs anymore.

Vogtinator commented 2 years ago

Alright, but if the gdb, fd, and socket logs aren't worth it anymore, I'd prefer not to take the time to make those logs anymore.

The strace output is sufficient here. With some luck, that should pretty much directly show the cause.

biggestsonicfan commented 2 years ago

I haven't forgotten about this, but I meant to get back to it sooner than now. Unfortunately, I've made some config changes in Firefox and can no longer get kmozillahelper to hang. Most of the configs I have changed however are related to hardware acceleration and performance. A long time ago, I disabled all this thinking it was causing my system to lock up while I browsed the internet with Firefox and played a game at the same time. The true problem was I was out of RAM an Wine for some reason doesn't crash in that instance.

So I guess I will play with performance settings to try to get it to lock up again, but I just saved probably 1000+ images in a row after reinstalling kmozillahelper and it handled it fine, better than usual.

biggestsonicfan commented 2 years ago

Got it to crash. strace -fttketrace=execve -p $(pidof kmozillahelper) was unhelpful:

rob@DESKTOP-KLLQALU:~> strace -fttketrace=execve -p $(pidof kmozillahelper)
strace: Process 11313 attached
15:45:56.386458 +++ killed by SIGABRT (core dumped) +++
rob@DESKTOP-KLLQALU:~> strace -fttketrace=execve -p $(pidof kmozillahelper)
strace: option requires an argument -- 'p'
Try 'strace -h' for more information.
rob@DESKTOP-KLLQALU:~> strace
strace: must have PROG [ARGS] or -p PID
Try 'strace -h' for more information.
rob@DESKTOP-KLLQALU:~> strace -fttketrace=execve -p 11313
strace: attach: ptrace(PTRACE_SEIZE, 11313): Operation not permitted
rob@DESKTOP-KLLQALU:~> sudo strace -fttketrace=execve -p 11313
[sudo] password for root: 
strace: attach: ptrace(PTRACE_SEIZE, 11313): Operation not permitted
rob@DESKTOP-KLLQALU:~> sudo strace -fttketrace=execve -p 11313
strace: attach: ptrace(PTRACE_SEIZE, 11313): Operation not permitted
rob@DESKTOP-KLLQALU:~>
Vogtinator commented 2 years ago

That's indeed not helpful. The first crash looks like it's something different entirely. The permission denied could have various reasons (process from a different user, YAMA ptrace_scope)?

biggestsonicfan commented 2 years ago

Crashed again. This time strace -fttketrace=execve -p $(pidof kmozillahelper) generated no output, other than stating strace: Process 12574 attached with 5 threads... what exactly is -fttketrace=execve. Google shows zero results for "fttketrace" which is quite rare?

Also after force closing the process of Firefox, these were leftover: Screenshot_20220419_145811

That just seems like a lot of memory to me?

EDIT: I see -f is --follow-forks, -tt is --absolute-timestamps=precision:us, -k is --stack-traces, but -e trace is some kind of evaluation? Should there have been a space between e and trace? I also see -b syscall is --detach-on=syscall where it's defined as

If specified syscall is reached, detach from traced process. Currently, only execve(2) syscall is supported. This option is useful if you want to trace multi-threaded process and therefore require -f, but don't want to trace its (potentially very complex) children.

Perhaps that's what you wanted?

Vogtinator commented 2 years ago

Crashed again. This time strace -fttketrace=execve -p $(pidof kmozillahelper) generated no output, other than stating strace: Process 12574 attached with 5 threads...

Note that you have to run the command at least 10 minutes before the issue appears. So you basically have to run it each time you start firefox.

what exactly is -fttketrace=execve. Google shows zero results for "fttketrace" which is quite rare?

Also after force closing the process of Firefox, these were leftover: Screenshot_20220419_145811

That just seems like a lot of memory to me?

Yep.

EDIT: I see -f is --follow-forks, -tt is --absolute-timestamps=precision:us, -k is --stack-traces, but -e trace is some kind of evaluation? Should there have been a space between e and trace? I also see -b syscall is --detach-on=syscall where it's defined as

The long form is -etrace=execve, i.e. only print execve syscalls and their stacktrace.

If specified syscall is reached, detach from traced process. Currently, only execve(2) syscall is supported. This option is useful if you want to trace multi-threaded process and therefore require -f, but don't want to trace its (potentially very complex) children.

Perhaps that's what you wanted?

No. It shouldn't hurt, but it shouldn't make a difference either.

biggestsonicfan commented 2 years ago

Note that you have to run the command at least 10 minutes before the issue appears. So you basically have to run it each time you start firefox.

Ah, you know, I absolutely forgot about that part...

biggestsonicfan commented 2 years ago

:cold_sweat: Screenshot_20220422_093859

I really don't know what I'm doing wrong to get all these zombie processes and high usage. It hasn't crashed yet but now that there's two processes of it I don't know which one I should be stracing.

biggestsonicfan commented 2 years ago

Using strace -fttketrace=execve -p $(pidof kmozillahelper) |& tail -n 1000 > kmozillahelper-strace-20220509.log and kmozillahelper crashed produced a 0 byte file.

biggestsonicfan commented 2 years ago

I have once again installed kmozillahelper because I feel it's time to try to tackle this again. Will update if anything relevant happens.

biggestsonicfan commented 2 years ago

I was beginning to think something happened upstream which fixed the issue, as for over 2 weeks solid, I could not reproduce the issue, but a few minutes ago, it happened for the first time since reinstalling kmozillahelper.

I think I am encountering the issue less because I have changed how I rapidly image save files within Firefox. That, however, does not solve the problem at large.

biggestsonicfan commented 2 years ago

It has been another week, and it's taken this long to crash again. Same symptoms as before, nothing can be done to Firefox except minimize, maximize/restore, or close. What's drawn on the window pane is stuck and you cannot swap tabs. Killing kmozillahelper restores the use of Firefox but at the expense of no file dialogs for the rest of the session.

I can not just run an strace for a week at a time, hoping the same pid will be kmozillahelper by the time of the crash. I am always assuming it's the same race condition from back in January. Is there a mini daemon that can run to see if this happens it can restart kmozillahelper and attach to the already existing Firefox again?

I find it hard to believe I'm the only one in the world at this point who has been experiencing the problem for nearly two years now, unless everyone who uses SUSE either removes kmozillahelper or switches to another browser, but I find that unlikely.

biggestsonicfan commented 2 years ago

That's a new one... kmozillahelper died after the dialog box appeared. The dialog box could not save files or be navigated, and only the close button at the top would work. Trying to save another file froze firefox completely. It's also the first time I've seen kmozillahelper with the CPU% of stopped. drkonqi also came up when searching for kmozillahelper so I figure something went terribly wrong.

Screenshot_20220918_132855

I am assuming in over a year since it's last update, kmozillahelper isn't actively being looked at to be improved or fix any potential race conditions?

Vogtinator commented 2 years ago

drkonqi should show a (hopefully useful) backtrace.

I am assuming in over a year since it's last update, kmozillahelper isn't actively being looked at to be improved or fix any potential race conditions?

It's most likely a bug in one of the libraries used by kmozillahelper.

biggestsonicfan commented 2 years ago

drkonqi never finished, it just sat spinning, I had to force close it. :disappointed:

biggestsonicfan commented 2 years ago

Is there a daemon I can create like we did with valgrind to automatically capture these errors so I don't have to spend 30 minutes preparing to use firefox each time? I just had another crash at a very, very unfortunate time.

A mozillahelper-helper, as it were...

biggestsonicfan commented 2 years ago

Another crash, but new behavior. Firefox couldn't recover after killing kmozillahelper. It was unable to be closed but didn't become a zombie either. Tabs couldn't change but the window would redraw correctly when resized. Spamming close, which usually prompts a force close dialog box, resulted in nothing. Had to kill the parent process of firefox from ksysguard.

I think for the moment, once again, I will uninstall and blacklist kmozillahelper. I want to help and diagnose the root of this issue but when there are literally no leads on how to do so it just becomes frustrating as my daily driver.

biggestsonicfan commented 2 years ago

Brand new SUSE install on brand new drive. Still doing it.

biggestsonicfan commented 1 year ago

Closing as a won't fix. It's definitely a race condition of saving many consecutive files quickly through the context menu, so I've substituted the context menu for buttons on the images to open the dialog and have had no problems.

This bug has eaten entirely too much of my time and I can't dedicate any more time to trying to figure it out if literally no one in the last two years has encountered it and came here to say "me too".

biggestsonicfan commented 1 year ago

Interesting update, switching from Nvidia drivers (proprietary) to AMDGPU, I can no longer reproduce this...

biggestsonicfan commented 12 months ago

Ah yes, my favorite issue has returned, lovely.