Closed biggestsonicfan closed 10 months ago
Is the memory usage constant or does it change over time (increase/decrease noticably)?
Can you get a backtrace of kmozillahelper when it's stuck?
@Vogtinator Took a while for it to happen again. See attached log. kmozillahleper-strace-20220101.log
The issue is most likely this call to accept:
[pid 3207] accept4(288, <unfinished ...>
It doesn't return, which means there is somehow no connection pending. This can have a variety of causes, depending on what the connection is to.
Also, the FD number is unusually high, it might be kmozillahelper not closing some FDs correctly or inheriting many FDs from firefox.
A backtrace (not strace) of the stuck process would show what it's doing, and ls -l /proc/$(pidof kmozillahelper)/fd
would show what the FDs are. With e.g. ss -np | grep 31284
you can find out what socket:[31284]
is connected to.
Oh whoops, I just googled "how to backtrace a linux process" and it took me to a "How to know where a program is stuck in linux?" stackexchange post. I'll generate a proper log next time!
EDIT: This is what I assume I was supposed to do. It also downloaded a lot of debugging packages to cache, but gdb
is attached now. (Oh gdb
also taking 800 Megs of RAM, wew)
EDIT2: I can only assume this was related, but my login session "crashed"? I was just listening to some music and navigating in firefox when my music stopped, my screen went black, then I was at my login screen (normally autologin on boot).
Backtrace log: kmozillahelper-backtrace-20220101.log FD output: kmozillahelper-fd-20220101.log Connected sockets: kmozillahelper-sockets-20220101.log
I hope this is enough.
Here's an odd one. gdb
running and attached during a freeze... I log everything, then quit gdb
... the save file dialog appears after... First time that has happened.
Backtrace log: kmozillahelper-backtrace-20220102.log FD output: kmozillahelper-fd-20220102.log Connected sockets: kmozillahelper-sockets-20220102.log
The error is now occurring about as frequently as it used to in #31, so I'm assuming this might be the same issue, but I was trying to figure out what was going wrong incorrectly. I don't think I ever once closed kmozillahelper by itself prior to this issue and let firefox resume.
It's gone from 1/2000 images per freeze to maybe 1/30 images.
Hm, in this case ss -lnp
might've been required to also list unconnected/listening sockets.
I'll guess that it's the kio slave socket though, which is the only listening socket kmozillahelper
opens here. The only way I can see that accept4
hangs is if the process aborts its connection attempt in between kmozillahelper
getting POLLIN on the socket and calling accept4
on it. Maybe something kills the process just at that point. Is there any kioslave5
process with a kmozillahelper
socket in its arguments running? If yes, can you get backtraces?
If you can reproduce it easier now, you could try strace -fetrace=listen,connect,accept4 -p $(pidof kmozillahelper)
to get more info while the issue happens.
Here's an odd one. gdb running and attached during a freeze... I log everything, then quit gdb... the save file dialog appears after... First time that has happened.
That trace actually shows a properly working process, not stuck in accept4
.
I have 28 kioslave5
processes running right now and I don't even have a save file dialog open. To gdb
each one of them would require more memory than I have installed in my computer. To individually look at the sockets of each one of them, I'm probably going to need to shell script this as doing this manually will not be fun at all.
To clarify, you'd only need to do this while kmozillahelper
is hanging, and only to the one or two kioslaves which have the kmozillahelper socket it its arguments.
Oh I do understand that, but I would have absolutely no idea which kioslave would have them. I am up to 33 kioslave5
processes now.
You could use pgrep -f "kmozillahelper.*slave-socket"
to get their PIDs
You could use
pgrep -f "kmozillahelper.*slave-socket"
to get their PIDs
Without having a filedialog open and kmozillahelper not frozen, of the 35 kioslave5
, the above command produces 25 PIDs.
I assume that most of them stay even after you quit firefox (and kmozillahelper)? If so, those might actually be leftovers from the past times kmozillahelper failed. Getting a backtrace of one or two of the remains might be helpful.
Ah you're right. After I saw your comment just now I closed firefox to find 9 instances of kmozillahelper still running, so I have rebooted so I don't have to play detective at trying to piece which kioslave5 and kmozillahelper went together. On the next hang, I will provide
strace -fetrace=listen,connect,accept4 -p $(pidof kmozillahelper)
ls -l /proc/$(pidof kmozillahelper)/fd
ss -lnp
output of the fd PIDspgrep -f "kmozillahelper.*slave-socket"
Output of strace -fetrace=listen,connect,accept4 -p $(pidof kmozillahelper)
That's something you'd have to run before the hang happens. If that takes too long and the output becomes too much, you could only kepe the last ~1k or so lines before the hang by piping it into |& tail -n 1000
and killing strace
from the outside once the hang happened.
Oh okay, gotcha.
Okay... so....
strace -fetrace=listen,connect,accept4 -p $(pidof kmozillahelper)|& tail -n 1000
kmozillahelper-fetrace-20220105.logls -l /proc/$(pidof kmozillahelper)/fd
kmozillahelper-fd-20220105.logss -lnp
per socket (only one had output)
kmozillahelper-sockets-ss-lnp-20220105.logss -np
per socket
kmozillahelper-sockets-ss-np-20220105.logpgrep -f "kmozillahelper.*slave-socket"
[Note: kio-debuginfo could not be installed for my current version of kio as of the freeze]
kmozillahelper-pgrep-20220105.log
kioslave5-backtrace-20220105.logOkay, it happened again on the same image but with the updated kio and kio-debuginfo installed... however running gdb
on kioslave5
this time resulted in even more dependencies not up to date to have debuginfo packages installed: baloo5-kioslaves baloo5-kioslaves-debuginfo kfilemetadata5 kfilemetadata5-debuginfo libKF5Baloo5 libKF5Baloo5-debuginfo libKF5BalooEngine5 libKF5BalooEngine5-debuginfo liblmdb-0_9_29 liblmdb-0_9_29-debuginfo
Looks a bit like a race condition. A slave is started as PID 24661 (24660 is an intermediate process for daemonizing), and use for a bit (first successfull accept
). At some point later, various slaves are killed (SIGTERM
), but until the five second delay until exit (SIGALRM
), there is another call to accept
.
strace: Process 24660 attached
strace: Process 24661 attached
[pid 24660] +++ exited with 1 +++
[pid 19739] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=24660, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
[pid 24661] connect(4, {sa_family=AF_UNIX, sun_path="/run/user/1000/kmozillahelpertEAdvg.14.slave-socket"}, 110) = 0
[pid 19739] accept4(24, {sa_family=AF_UNIX}, [110 => 2], SOCK_CLOEXEC) = 22
[pid 24661] connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 24661] connect(5, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = 0
[pid 24655] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24653] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24657] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24659] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 24661] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=19739, si_uid=1000} ---
[pid 19739] accept4(24, <unfinished ...>
[pid 24655] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24655] +++ killed by SIGALRM +++
[pid 24653] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24657] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24659] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
[pid 24661] --- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
but until the five second delay until exit
What exactly do you mean by this?
kmozillahelper spawns ioslave processes for IO, which are killed after 3min of inactivity by sending SIGTERM. The SIGTERM handler sets a flag to exit and installs a 5s alarm to forcibly kill the process if it didn't exit until then. What apparently happens in your case is that after SIGTERM it still tries to connect to kmozillahelper, and before that completes SIGALRM kills the process. That leaves kmozillahelper waiting for that.
Would this happen, say, if I download multiple images per second using the kmozillahelper dialog boxes?
Would this happen, say, if I download multiple images per second using the kmozillahelper dialog boxes?
As long as there was at least 3min of idle time for some slaves, definitely possible. If it's a race condition, it also becomes more likely the more often the triggering action is repeated.
The main question is why the ioslaves are all stuck for more than 5s.
If you can still reproduce it, please try strace -fttketrace=execve -p $(pidof kmozillahelper)
optionally with |& tail -n 1000
again.
Alright, but if the gdb, fd, and socket logs aren't worth it anymore, I'd prefer not to take the time to make those logs anymore.
Alright, but if the gdb, fd, and socket logs aren't worth it anymore, I'd prefer not to take the time to make those logs anymore.
The strace output is sufficient here. With some luck, that should pretty much directly show the cause.
I haven't forgotten about this, but I meant to get back to it sooner than now. Unfortunately, I've made some config changes in Firefox and can no longer get kmozillahelper to hang. Most of the configs I have changed however are related to hardware acceleration and performance. A long time ago, I disabled all this thinking it was causing my system to lock up while I browsed the internet with Firefox and played a game at the same time. The true problem was I was out of RAM an Wine for some reason doesn't crash in that instance.
So I guess I will play with performance settings to try to get it to lock up again, but I just saved probably 1000+ images in a row after reinstalling kmozillahelper and it handled it fine, better than usual.
Got it to crash. strace -fttketrace=execve -p $(pidof kmozillahelper)
was unhelpful:
rob@DESKTOP-KLLQALU:~> strace -fttketrace=execve -p $(pidof kmozillahelper)
strace: Process 11313 attached
15:45:56.386458 +++ killed by SIGABRT (core dumped) +++
rob@DESKTOP-KLLQALU:~> strace -fttketrace=execve -p $(pidof kmozillahelper)
strace: option requires an argument -- 'p'
Try 'strace -h' for more information.
rob@DESKTOP-KLLQALU:~> strace
strace: must have PROG [ARGS] or -p PID
Try 'strace -h' for more information.
rob@DESKTOP-KLLQALU:~> strace -fttketrace=execve -p 11313
strace: attach: ptrace(PTRACE_SEIZE, 11313): Operation not permitted
rob@DESKTOP-KLLQALU:~> sudo strace -fttketrace=execve -p 11313
[sudo] password for root:
strace: attach: ptrace(PTRACE_SEIZE, 11313): Operation not permitted
rob@DESKTOP-KLLQALU:~> sudo strace -fttketrace=execve -p 11313
strace: attach: ptrace(PTRACE_SEIZE, 11313): Operation not permitted
rob@DESKTOP-KLLQALU:~>
That's indeed not helpful. The first crash looks like it's something different entirely. The permission denied could have various reasons (process from a different user, YAMA ptrace_scope)?
Crashed again. This time strace -fttketrace=execve -p $(pidof kmozillahelper)
generated no output, other than stating strace: Process 12574 attached with 5 threads
... what exactly is -fttketrace=execve
. Google shows zero results for "fttketrace" which is quite rare?
Also after force closing the process of Firefox, these were leftover:
That just seems like a lot of memory to me?
EDIT: I see -f
is --follow-forks
, -tt
is --absolute-timestamps=precision:us
, -k
is --stack-traces
, but -e trace
is some kind of evaluation? Should there have been a space between e
and trace
? I also see -b syscall
is --detach-on=syscall
where it's defined as
If specified syscall is reached, detach from traced process. Currently, only execve(2) syscall is supported. This option is useful if you want to trace multi-threaded process and therefore require -f, but don't want to trace its (potentially very complex) children.
Perhaps that's what you wanted?
Crashed again. This time
strace -fttketrace=execve -p $(pidof kmozillahelper)
generated no output, other than statingstrace: Process 12574 attached with 5 threads
...
Note that you have to run the command at least 10 minutes before the issue appears. So you basically have to run it each time you start firefox.
what exactly is
-fttketrace=execve
. Google shows zero results for "fttketrace" which is quite rare?Also after force closing the process of Firefox, these were leftover:
That just seems like a lot of memory to me?
Yep.
EDIT: I see
-f
is--follow-forks
,-tt
is--absolute-timestamps=precision:us
,-k
is--stack-traces
, but-e trace
is some kind of evaluation? Should there have been a space betweene
andtrace
? I also see-b syscall
is--detach-on=syscall
where it's defined as
The long form is -etrace=execve
, i.e. only print execve
syscalls and their stacktrace.
If specified syscall is reached, detach from traced process. Currently, only execve(2) syscall is supported. This option is useful if you want to trace multi-threaded process and therefore require -f, but don't want to trace its (potentially very complex) children.
Perhaps that's what you wanted?
No. It shouldn't hurt, but it shouldn't make a difference either.
Note that you have to run the command at least 10 minutes before the issue appears. So you basically have to run it each time you start firefox.
Ah, you know, I absolutely forgot about that part...
:cold_sweat:
I really don't know what I'm doing wrong to get all these zombie processes and high usage. It hasn't crashed yet but now that there's two processes of it I don't know which one I should be stracing.
Using strace -fttketrace=execve -p $(pidof kmozillahelper) |& tail -n 1000 > kmozillahelper-strace-20220509.log
and kmozillahelper crashed produced a 0 byte file.
I have once again installed kmozillahelper because I feel it's time to try to tackle this again. Will update if anything relevant happens.
I was beginning to think something happened upstream which fixed the issue, as for over 2 weeks solid, I could not reproduce the issue, but a few minutes ago, it happened for the first time since reinstalling kmozillahelper.
I think I am encountering the issue less because I have changed how I rapidly image save files within Firefox. That, however, does not solve the problem at large.
It has been another week, and it's taken this long to crash again. Same symptoms as before, nothing can be done to Firefox except minimize, maximize/restore, or close. What's drawn on the window pane is stuck and you cannot swap tabs. Killing kmozillahelper restores the use of Firefox but at the expense of no file dialogs for the rest of the session.
I can not just run an strace for a week at a time, hoping the same pid will be kmozillahelper by the time of the crash. I am always assuming it's the same race condition from back in January. Is there a mini daemon that can run to see if this happens it can restart kmozillahelper and attach to the already existing Firefox again?
I find it hard to believe I'm the only one in the world at this point who has been experiencing the problem for nearly two years now, unless everyone who uses SUSE either removes kmozillahelper or switches to another browser, but I find that unlikely.
That's a new one... kmozillahelper died after the dialog box appeared. The dialog box could not save files or be navigated, and only the close button at the top would work. Trying to save another file froze firefox completely. It's also the first time I've seen kmozillahelper
with the CPU% of stopped
. drkonqi
also came up when searching for kmozillahelper
so I figure something went terribly wrong.
I am assuming in over a year since it's last update, kmozillahelper isn't actively being looked at to be improved or fix any potential race conditions?
drkonqi should show a (hopefully useful) backtrace.
I am assuming in over a year since it's last update, kmozillahelper isn't actively being looked at to be improved or fix any potential race conditions?
It's most likely a bug in one of the libraries used by kmozillahelper
.
drkonqi never finished, it just sat spinning, I had to force close it. :disappointed:
Is there a daemon I can create like we did with valgrind to automatically capture these errors so I don't have to spend 30 minutes preparing to use firefox each time? I just had another crash at a very, very unfortunate time.
A mozillahelper-helper, as it were...
Another crash, but new behavior. Firefox couldn't recover after killing kmozillahelper. It was unable to be closed but didn't become a zombie either. Tabs couldn't change but the window would redraw correctly when resized. Spamming close, which usually prompts a force close dialog box, resulted in nothing. Had to kill the parent process of firefox from ksysguard.
I think for the moment, once again, I will uninstall and blacklist kmozillahelper. I want to help and diagnose the root of this issue but when there are literally no leads on how to do so it just becomes frustrating as my daily driver.
Brand new SUSE install on brand new drive. Still doing it.
Closing as a won't fix. It's definitely a race condition of saving many consecutive files quickly through the context menu, so I've substituted the context menu for buttons on the images to open the dialog and have had no problems.
This bug has eaten entirely too much of my time and I can't dedicate any more time to trying to figure it out if literally no one in the last two years has encountered it and came here to say "me too".
Interesting update, switching from Nvidia drivers (proprietary) to AMDGPU, I can no longer reproduce this...
Ah yes, my favorite issue has returned, lovely.
Greetings (again),
I've recently reinstalled kmozillahelper and it no longer seems to lock up everything. However, it is now locking up itself. I can recover and continue using Firefox if I kill the kmozillahelper process, but I can no longer use save dialogs unless I completely close Firefox and start a new session.
I'm unsure how memory management should be with kmozillahelper but this seems a little high for what it does?