Open GoogleCodeExporter opened 9 years ago
I forgot something essential:
This is only noticeable when trying to open a terminal. I don't notice this
buggy behavior when doing something else (switching workspaces, focus, shifting
windows etc). That means, everything else works on first press of a hotkey.
There has to be something special about how Xmonad handles Mod+Shift+Return,
i.e. about the case where a terminal application is to be opened.
It does not depend on which Mod key I use (reproducible with Mod1 and Mod4), and
it does not depend on which terminal application I use. It also does not depend
on whether a terminal application is in focus or not: sometimes I am using
Firefox or something else and the hotkey for the terminal is still lost.
I have really no idea how to diagnose it further.
Original comment by martin.s...@gmail.com
on 8 Aug 2014 at 8:56
Further information:
Key bindings that are not affected:
- Mod+<1...>
- Mod+Shift+<1...>
- Mod+Tab
- Mod+M
- Mod+Return
- Mod+<H,J,K,L>
- ...
Key bindings which are affected by bug above:
- Mod+Shift+Return
- Mod+Shift+Backspace
- Mod+P
- Mod+Shift+P
- ...
Original comment by martin.s...@gmail.com
on 9 Aug 2014 at 3:42
Note that, of the affected keybindings, all but one are "spawn"s — and the
exception isn't even an xmonad keybinding (see
http://www.haskell.org/wikiupload/b/b8/Xmbindings.png and
http://xmonad.org/xmonad-docs/xmonad/src/XMonad-Config.html#line-170) but an X
server internal binding that is usually disabled by default.
Original comment by allber...@gmail.com
on 9 Aug 2014 at 4:16
Mod+Shift+Backspace is my custom binding that starts a shell script:
((modm .|. shiftMask, xK_BackSpace), spawn "~/.xmonad/scripts/shutdown.sh")
That is a good hint you've given me. How can I diagnose it further from
here?
Original comment by martin.s...@gmail.com
on 9 Aug 2014 at 4:39
If you can get a terminal open (or switch to a virtual console), it might be
worth using `truss -f` on the running xmonad process to see if something is
going wrong with `spawn`. You should see xmonad fork, then the child fork again
and exit, and the grandchild exec `sh -c ...`.
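The fork, fork-again-and-exit, exec sequence described above can be sketched in plain Haskell as follows. This is a minimal illustration built only on System.Posix.Process from the unix package; the name doubleForkSpawn is hypothetical, and the code is not a copy of xmonad's own spawn, which differs between versions.

```haskell
import System.Exit (ExitCode (ExitSuccess))
import System.Posix.Process
    ( createSession
    , executeFile
    , exitImmediately
    , forkProcess
    , getProcessStatus
    )

-- | Hypothetical helper: run a shell command in a detached grandchild.
-- The parent forks a child, the child forks a grandchild and exits at
-- once, and the grandchild execs @/bin/sh -c cmd@.
doubleForkSpawn :: String -> IO ()
doubleForkSpawn cmd = do
    child <- forkProcess $ do
        _ <- forkProcess $ do
            -- Grandchild: start a new session, then replace the process image.
            _ <- createSession
            executeFile "/bin/sh" False ["-c", cmd] Nothing
        -- Child: nothing left to do once the grandchild exists.
        exitImmediately ExitSuccess
    -- Parent: reap the short-lived child so it does not linger as a zombie.
    _ <- getProcessStatus True False child
    return ()
```

In a trace of something like doubleForkSpawn "xterm", the grandchild's execve of /bin/sh is the step to look for; if it is missing or the process dies right after it, that is where spawn is going wrong.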
This would not actually be the first time we've had problems on FreeBSD; but
the past problems were due to various attempts to improve our process handling
which tripped over certain differences in *BSD's handling of "orphaning" child
processes, and showed up differently — in particular, it could not happen on
the *first* child spawned, but only after hitting the child process limit which
is usually fairly high (in the hundreds at least) these days.
Original comment by allber...@gmail.com
on 9 Aug 2014 at 6:41
One thing I can see from the truss output is that when it does NOT work, the
following happens:
1) fork()
2) child process executing
3) poll(); recvmsg; setitimer(); poll(); SIGALRM
When it works this happens:
1) setitimer(); setitimer();
2) fork()
3) setitimer(); (again)
4) child process executing (SIGALRM happening multiple times within the child
process)
Does it help or do you need the detailed truss output?
Original comment by martin.s...@gmail.com
on 9 Aug 2014 at 8:06
I would like to see the full truss output. You can, however, clean it up a bit
by building your custom xmonad manually (see
http://xmonad.org/xmonad-docs/xmonad/src/XMonad-Core.html#recompile) with the
ghc options:
-rtsopts -with-rtsopts -v0
which will disable the runtime system's GC timer (it will GC on all allocations
instead, which can slow programs down a bit) and remove the itimer and SIGALRM
from the trace.
Original comment by allber...@gmail.com
on 10 Aug 2014 at 12:30
Here are both xz-compressed truss output files (the one where the xterm
starts was quite hard to produce).
mod-shift-return-ignored.truss.txt -> Key binding did not work
mod-shift-return-ok.truss.txt -> Key binding worked (some noise at the end,
closing the window)
Original comment by martin.s...@gmail.com
on 10 Aug 2014 at 8:09
Attachments:
I'm also on FreeBSD, and while I haven't debugged the issue formally, I can say
that spawned commands execute more reliably (I haven't seen any issues since)
when they are spawned inside tcsh.
First I tried converting
spawn "dmenu_run"
to
spawn "tcsh -c 'dmenu_run'"
Then I decided to dig into the definition of spawn and made an alternate
version of it instead, like the following:
-- Imports needed in xmonad.hs, in addition to "import XMonad":
import Codec.Binary.UTF8.String (encodeString)
import System.Posix.Process (executeFile)
import System.Posix.Types (ProcessID)

-- | spawn'. Launch an external application. Like 'spawn', but runs the
-- 'String' you pass as a command to \/bin\/tcsh instead of \/bin\/sh.
--
-- Note this function assumes your locale uses utf8.
spawn' :: MonadIO m => String -> m ()
spawn' x = spawnPIDTCSH x >> return ()

-- | Like 'spawn'', but returns the 'ProcessID' of the launched application
spawnPIDTCSH :: MonadIO m => String -> m ProcessID
spawnPIDTCSH x =
    xfork $ executeFile "/bin/tcsh" False ["-c", encodeString x] Nothing
Now, when I execute something like
spawn' "dmenu_run"
It works reliably.
Hope this helps, although you might be experiencing a different issue.
Regards,
Tim
Original comment by beyer...@gmail.com
on 18 Aug 2014 at 7:26
Yes. Indeed this workaround using tcsh appears to help. I haven't tested it
very extensively, yet, but the main symptoms are gone.
I still suspect that it is some kind of race condition with timers/signals.
Maybe the startup of /bin/sh is very fast on FreeBSD. Cannot tell for sure
what's going on.
I also compiled with "-rtsopts -with-rtsopts -v0" (I checked the ps output
during compilation to confirm that the flags are really used). It did not
improve anything and did not make truss faster (it still lists setitimer,
alarms etc). I am getting just about the same output as above.
Btw, FreeBSD upgraded to GHC 7.8.3 a few days ago and the problem persists.
Tim's workaround still helps here.
Original comment by martin.s...@gmail.com
on 19 Aug 2014 at 2:37
I've had this problem for some time now
(https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=181049). Here are some of my
observations so far:
GHC 7.4 is not affected
GHC 7.6 and 7.8 both seem to produce this bug
xmonad 0.8 did not show that bug
xmonad 0.11 did show the bug
As a workaround I copied the definition of spawn and doubleFork from xmonad 0.8
into the xmonad 0.11 source code which then did not seem to exhibit this
problem.
AFAICT it is related to interval timers. Interval timers are supposed to be
disabled right before execve(). executeFile does disable the interval timers,
but in some cases the call to setitimer does not take place. The child process
is then terminated by a SIGALRM.
I am observing the same behaviour even when compiling with -rtsopts
-with-rtsopts -v0.
Sadly I have no idea how to fix or debug this. I started looking at the
generated Core for the affected/unaffected spawn/doubleFork versions but have
no clue yet.
Original comment by dernst...@gmail.com
on 13 Oct 2014 at 7:24
Have you or anyone reported this on the ghc bug tracker?
-v0 and -V0 are different things; the latter should disable itimers completely.
-v is part of the eventlog service and does nothing useful in this case because
0 is not a valid event type.
Original comment by allber...@gmail.com
on 13 Oct 2014 at 7:32
I have done some experiments, too. I copied the functions involved in spawn
into a simple program that forks from the IO monad. I could not reproduce the
problem with this setup, but maybe the race condition behind it does not
appear there at all, because it is a lightweight program that gets through the
fork a bit faster.
I wonder whether this has something to do with the fact that the spawn runs
inside an X action. Do I understand that right, or am I completely wrong? In
some parts Xmonad lacks function type descriptions.
Why is a SIGALRM handler installed there at all?
Original comment by martin.s...@gmail.com
on 13 Oct 2014 at 8:04
spawn is handing it upstream to IO via liftIO, as indicated by the MonadIO
constraint; this should be negligible overhead (nanoseconds if it's not
optimized away entirely which it should be).
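To illustrate the point about the MonadIO constraint: an action with that constraint runs unchanged both in plain IO and inside a monad stack built on top of IO, which is the situation of a key handler running in X. The sketch below uses a hypothetical Handler type (a ReaderT over IO) as a simplified stand-in for X, and a hypothetical record function in place of spawn; none of these names come from xmonad.

```haskell
import Control.Monad.IO.Class (MonadIO, liftIO)
import Control.Monad.Trans.Reader (ReaderT, asks, runReaderT)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Hypothetical stand-in for an action like spawn: it only needs a monad
-- that can embed IO, so liftIO hands the real work to IO.
record :: MonadIO m => IORef [String] -> String -> m ()
record ref msg = liftIO (modifyIORef' ref (msg :))

-- | Hypothetical, simplified stand-in for the X monad: a reader over IO.
newtype Env = Env { envLog :: IORef [String] }

type Handler = ReaderT Env IO

-- | A "key handler" living in the stack; it calls the same record function.
keyHandler :: Handler ()
keyHandler = do
    ref <- asks envLog
    record ref "from handler"

-- | Call record once directly in IO and once from inside the stack.
demo :: IO [String]
demo = do
    ref <- newIORef []
    record ref "from IO"
    runReaderT keyHandler (Env ref)
    reverse <$> readIORef ref
```

Both calls end up as the same IO operation; the only difference is the lifting through the reader layer, which is what makes it unlikely that running inside X by itself changes how the fork behaves.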
The periodic itimer and SIGALRM handler are used by GHC's runtime for thread
scheduling, profiler ticks, and IIRC determining when to do a full instead of
partial garbage collection, among other things.
Also, which function type descriptions are missing? I am looking at
http://xmonad.org/xmonad-docs/xmonad/XMonad-Core.html#v:spawn
Original comment by allber...@gmail.com
on 13 Oct 2014 at 8:27
I followed some of the code (I am still a beginner, so I need to look really
long at things and even learn new stuff). I copied some code over in an empty
project and made the effects disappear. This is exactly the same code for
spawn, just cut out of Xmonad. I just wanted to give some further insights, but
I am not sure why this happens. I thought it might help, but it seems it
doesn't. Sorry.
I shouldn't have said that Xmonad lacks type descriptions. What I meant is that,
to my eyes, there is not enough information to infer which types Xmonad key
actions operate on. I could not figure out whether I am dealing with an X action
within the key press handlers or with a plain IO action (as I said above). Many
things are abstract in these places and, as I said, I need to look quite long to
understand in which context the handler operates, and I need to learn some
Haskell concepts used in Xmonad that are still new to me.
Original comment by martin.s...@gmail.com
on 13 Oct 2014 at 9:14
OK, here's my truss output with -V0. The problem does not occur here anymore.
There are also no more calls to setitimer(0,{0.000000, 0.000000 },0x0) right
before execve("/bin/sh", ...) as would be desired in the normal case. But of
course, the setitimer() call in executeFile only takes place if timers are
currently enabled.
I did not report this problem to GHC yet and no one else has to my knowledge
done that. We first wanted to find out if it's a problem with the FreeBSD port,
but Gabor Pali who maintains this port, couldn't reproduce it on his systems.
I also added another truss log without -V0 but with the eventlog enabled
(xmonad.eventlog.truss.bz2). I tried starting a test program which outputs
whether the interval timers are still enabled. It worked for PID 2435 and 2437
and did not work for PID 2432, 2441 and 2444 (you can grep for the PID number
and execve to find your way around). I can't see a common theme, however. For
PID 2432 there were apparently two threads running before execve. For PID 2441
there was a GC run between fork and execve...
Do you think that's enough information to report to GHC?
Original comment by dernst...@gmail.com
on 14 Oct 2014 at 7:26
Attachments:
Since SIGALRM is operated by GHC itself when forking processes internally it
would be logical to report directly to the GHC project, I guess. Can you do
this? I think you tried more things than me to look at the problem. Please post
a link to the report here.
I cannot understand how the port maintainer cannot reproduce it. I wonder what
architecture and what customizations he is using, because I would like to have
a system on which I "cannot reproduce it". This is a pretty annoying behavior
on all systems I have (plain GENERIC FreeBSD/amd64; but also in a Virtualbox
environment you can reproduce it).
Original comment by martin.s...@gmail.com
on 24 Oct 2014 at 5:52
I can reproduce this issue under stock GENERIC FreeBSD/amd64 10.0 (running
directly on the hardware, i.e. no virtualization).
Original comment by reaper.t...@gmail.com
on 14 Nov 2014 at 7:56
Confirming the problem on FreeBSD/amd64 10.1-RELEASE.
Original comment by martin.s...@gmail.com
on 14 Jan 2015 at 7:03
I got a bit side-tracked and honestly forgot about this bug... sorry.
Anyhow, Gabor Pali said he might try to talk to a GHC dev. But he suspects it
might be specific to FreeBSD, in that some part of the OS/userland could be
implemented slightly differently from what GHC expects.
If anyone would like to, could you test the following patch?
It just adds another forkProcess call, which seemed to fix it for me.
Original comment by dernst...@gmail.com
on 20 Jan 2015 at 7:58
Attachments:
This workaround with two chained forkProcess calls also works properly.
Original comment by martin.s...@gmail.com
on 21 Jan 2015 at 8:36
I had exactly the same problem with the same bindings, and I confirm that patch
fixed it.
Original comment by olivier....@gmail.com
on 15 Feb 2015 at 9:04
Original issue reported on code.google.com by
martin.s...@gmail.com
on 8 Aug 2014 at 8:41