jparris / wmii

Automatically exported from code.google.com/p/wmii
MIT License

crash in libixp code while working in the 'mount -t 9p' fs #150

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. modprobe 9p
2. wmii -a $HOME/.wmii/socket -r $HOME/.wmii/rc, where $HOME/.wmii/rc contains:
    #!/bin/dash
    sudo mount -t 9p socket fs -o trans=unix,uname=$USER,noextend,dfltgid=$(id -g),dfltuid=$(id -u)

3. cat fs/event

Program received signal SIGSEGV, Segmentation fault.
0x000000000042c58b in ixp_pending_clunk (req=0x678100) at srv_util.c:328
328     pend_link->next->prev = pend_link->prev;
(gdb) backtrace
#0  0x000000000042c58b in ixp_pending_clunk (req=0x678100) at srv_util.c:328
#1  0x0000000000413fe7 in fs_clunk (r=0x678100) at fs.c:675
#2  0x000000000042a642 in handlereq (r=0x678100) at request.c:189
#3  0x000000000042a429 in handlefcall (c=0x66d2f0) at request.c:139
#4  0x000000000042b9cf in handle_conns (s=0x6495e0) at server.c:113
#5  0x000000000042bb37 in ixp_serverloop (s=0x6495e0) at server.c:161
#6  0x0000000000417dfd in main (argc=0, argv=0x7fffffffe2e0) at main.c:433

4. echo 'grabmod Mod4' > fs/ctl

Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) backtrace
#0  0x0000000000000000 in ?? ()
#1  0x000000000042add0 in handlereq (r=0x678100) at request.c:340
#2  0x000000000042a429 in handlefcall (c=0x66d2f0) at request.c:139
#3  0x000000000042b9cf in handle_conns (s=0x6495e0) at server.c:113
#4  0x000000000042bb37 in ixp_serverloop (s=0x6495e0) at server.c:161
#5  0x0000000000417dfd in main (argc=0, argv=0x7fffffffe2e0) at main.c:433
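
For context, the faulting statement in the first backtrace is one half of a doubly linked list unlink. It dereferences pend_link->next, so it faults if that pointer is NULL or stale. The following is a minimal sketch of a defensive unlink, not libixp's code: the Link type and the link_* helpers are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; libixp's real pending-request structures differ. */
typedef struct Link Link;
struct Link {
    Link *next;
    Link *prev;
};

/* Initialise a sentinel so that an empty ring points at itself. */
void link_init(Link *head) {
    assert(head != NULL);
    head->next = head;
    head->prev = head;
}

/* Insert n directly after head. */
void link_insert(Link *head, Link *n) {
    assert(head != NULL && n != NULL);
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink n from its ring. Returns 0 on success, or -1 if n is NULL or
 * already unlinked, so a second clunk of the same node is harmless. */
int link_remove(Link *n) {
    if (n == NULL || n->next == NULL || n->prev == NULL)
        return -1;
    n->next->prev = n->prev;
    n->prev->next = n->next;
    n->next = NULL;   /* mark as unlinked */
    n->prev = NULL;
    return 0;
}
```

Clearing both pointers after the unlink is the design choice that matters here: a repeated removal turns into a detectable error instead of a write through a dangling pointer.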

What version of the product are you using (wmii -v)? On what operating
system?
wmii-hg2579, Linux 2.6.31.5, x86_64

Original issue reported on code.google.com by rogu...@gmail.com on 29 Oct 2009 at 8:37

GoogleCodeExporter commented 9 years ago
Oh, and it should be -a unix\!$HOME/.wmii/socket in step 2.

Original comment by rogu...@gmail.com on 29 Oct 2009 at 8:39

GoogleCodeExporter commented 9 years ago
Hm.. And libixp is:
$ hg identify -n
113

Original comment by rogu...@gmail.com on 29 Oct 2009 at 8:41

GoogleCodeExporter commented 9 years ago
Can you run wmii under valgrind and report what output it produces?

    valgrind --tool=memcheck --num-callers=12 wmii

Thanks.

Original comment by maglion...@gmail.com on 29 Oct 2009 at 9:07

GoogleCodeExporter commented 9 years ago
'action_quit' contains the log which was generated after 'echo quit > ctl' (the 
same to be generated for any write).
'cat_event' was generated after 'cat event<CR><C-c>' (this one didn't crash 
under valgrind).

Original comment by rogu...@gmail.com on 29 Oct 2009 at 11:27

Attachments:

GoogleCodeExporter commented 9 years ago
That looks like an optimized libixp in action_quit, or otherwise with incorrect
debugging data. Can you try again with a clean debugging build?

Thanks.

Original comment by maglion...@gmail.com on 29 Oct 2009 at 11:51

GoogleCodeExporter commented 9 years ago
This issue was updated by revision 8dde838d46.

The second part of this issue should be resolved.
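
The second backtrace (frame #0 at address 0, called from handlereq) is what a call through a NULL function pointer typically looks like. Below is a minimal sketch of dispatching to an optional handler. It assumes hypothetical Req and Srv types and an illustrative error string, and is not libixp's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request type, for illustration only. */
typedef struct Req Req;
struct Req {
    const char *error;   /* set when the request is answered with an error */
    int handled;         /* nonzero once a response has been produced */
};

/* Hypothetical server callback table. */
typedef struct Srv Srv;
struct Srv {
    void (*read)(Req *);
    void (*wstat)(Req *);  /* optional: may be left NULL */
};

static void respond_error(Req *r, const char *msg) {
    r->error = msg;
    r->handled = 1;
}

/* Example handler that simply marks the request as handled. */
void accept_wstat(Req *r) {
    r->handled = 1;
}

/* Dispatch a wstat request; answer with an error when no handler is set
 * instead of jumping through a NULL pointer. */
void dispatch_wstat(const Srv *srv, Req *r) {
    assert(srv != NULL && r != NULL);
    if (srv->wstat == NULL) {
        respond_error(r, "function not implemented");
        return;
    }
    srv->wstat(r);
}
```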

Original comment by maglion...@gmail.com on 30 Oct 2009 at 12:03

GoogleCodeExporter commented 9 years ago
I should note that the revision referenced above refers to the libixp repo, but
unfortunately, Google's revision linkification algorithm is a bit flawed, even
despite the fact that it generated the entry itself.

Original comment by maglion...@gmail.com on 30 Oct 2009 at 12:17

GoogleCodeExporter commented 9 years ago
Also, sorry for the noise, but please make sure that wmii is linked against the
most current hg version of libixp. I fixed a problem very similar to the first
part of this issue recently, and I can't reproduce the problem myself.

Original comment by maglion...@gmail.com on 30 Oct 2009 at 12:22

GoogleCodeExporter commented 9 years ago
Yes, I was blindly trying to add some debug flags and the build could've been
mangled indeed, sorry (I didn't notice that the debug flags were already
defined in config.mk, via gcc.mk).

I've just updated from the googlecode repos and I'm sure I'm linking against it.

With the 'Make srv->wstat optional.' commit to libixp, 'cat fs/event' produces
very similar valgrind output and 'echo quit > fs/ctl' stops working with the
following error: 'zsh: unknown error 526: fs/ctl' (an exit code =526?). So I've
reverted the commit and attached the more informative log for
'echo quit > fs/ctl'.

(if it would help, I'm on FreeNode/OFTC as rogutes)

Original comment by rogu...@gmail.com on 30 Oct 2009 at 1:01

Attachments:

GoogleCodeExporter commented 9 years ago
It's getting noisy here... Anyway, 'action_quit_2' was run without the 'Make
srv->wstat optional.' commit to libixp, but with the '--leak-check=full' option
to valgrind, sorry. The log is basically the same as 'action_log' without the
option.

Original comment by rogu...@gmail.com on 30 Oct 2009 at 1:10

GoogleCodeExporter commented 9 years ago
There's no need to revert the commit. It's correct. echo >ctl generates a
truncate request which returns an error, rather than crashing, because wmii
doesn't accept them. Use >>ctl.
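
The difference between the two redirections comes down to open flags: a POSIX shell opens the target of > with O_TRUNC and the target of >> with O_APPEND. Here is a small sketch of the two cases, where redirect_flags and write_like_shell are illustrative helper names, not part of wmii or libixp:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Flags a POSIX shell uses for `>` (append == 0) and `>>` (append != 0). */
int redirect_flags(int append) {
    return O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
}

/* Write msg to path the way `echo msg > path` or `echo msg >> path` would.
 * Returns 0 on success, -1 on failure (errno is left set by open or write). */
int write_like_shell(const char *path, const char *msg, size_t len, int append) {
    assert(path != NULL && msg != NULL);
    int fd = open(path, redirect_flags(append), 0666);
    if (fd < 0)
        return -1;   /* on a server that rejects truncation, `>` fails here */
    ssize_t n = write(fd, msg, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

On an ordinary file both calls succeed and only the resulting contents differ. On the mounted ctl file, according to the comment above, only the appending form is accepted.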

Incidentally, the log still appears wrong. There should be another frame at the
top of the stack there. It looks rather like an optimized tail call, which
shouldn't happen in a debugging build. The GDB stack trace is correct, though.

Don't worry about the leak check. That's buried in the Xlib code somewhere. It's
irrelevant anyway, with everyone switching to xlib-xcb, which doesn't have the
problem.

Can you give me any other information regarding the first problem? Which wmiirc
are you running, with what alterations, and what commands do you run first?

Also, uname -a would be nice.

Original comment by maglion...@gmail.com on 30 Oct 2009 at 2:05

GoogleCodeExporter commented 9 years ago
Yes, the crash when trying to truncate is fixed (I guess I'm used to writing 
echo 1>some_path_under_sysfs) and the valgrind logs look clean.

As for the crash while reading 'event', I'll try to be very verbose, too
verbose:
hg clone http://wmii.googlecode.com/hg/ wmii
cp -r wmii wmii-build; cd wmii-build
hg clone http://libixp.googlecode.com/hg/ libixp
sed -i "s#PREFIX =.*#PREFIX = $PWD#" config.mk libixp/config.mk
cd libixp
make install
cd ..
make
cd cmd/wmii
mkdir fs
echo '#!/bin/dash' > rc; chmod +x rc
Xnest -geometry 1280x600 :1 &
export DISPLAY=:1
urxvt -cd $PWD &
valgrind --log-file=cat_event --tool=memcheck --num-callers=12 ./wmii.out -a unix\!$PWD/socket -r $PWD/rc
[[ focus urxvt in the Xnest window ]]
sudo umount fs; sudo mount -t 9p socket fs -o trans=unix,uname=u,noextend,dfltgid=1000,dfltuid=1000
cat fs/event
[[ <Control-c> ]]
[[ crashes when not under valgrind; 'cat_event' attached - it is suspiciously similar to the previous one ]]
[[  a similar log is produced when running with the following rc file
    #!/bin/dash
    wmiir -a unix\!socket read /event
]]
echo 'Start rc' >> fs/event
echo quit >> fs/ctl
[[ crashes when not under valgrind; 'wmiir_cat_event' attached ]]

My system is Arch Linux x86_64, with their 2.6.31.5 kernel patches (which are 
basically upstream + aufs2) and a custom config. 'uname -a' is less informative 
than this:
  Linux r 2.6.31_31 #1 SMP Thu Oct 29 16:57:25 EET 2009 x86_64 Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux

Distribution's binaries are stripped and compiled with:
CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe"
LDFLAGS="-Wl,--hash-style=gnu -Wl,--as-needed" 
Some of my packages, especially xorg, are compiled with
CFLAGS="-march=native -O2 -pipe"

Thanks for helping me try wmii.

Original comment by rogu...@gmail.com on 30 Oct 2009 at 12:31

Attachments:

GoogleCodeExporter commented 9 years ago
Have you tried the Arch package, or an AUR build? I have nearly the exact same
setup as you, only kernel 2.6.30 and an AMD processor, and those exact steps
(substituting Xephyr for Xnest) work fine for me.

Well, let's try this. I'll attach an Arch package and the PKGBUILD I used to
make it. You can try the package, and use the PKGBUILD to send me back a package
to test here.

You also might want to use config.local.mk rather than editing config.mk, too.
It makes it a lot easier to work from hg.

Original comment by maglion...@gmail.com on 30 Oct 2009 at 3:06

Attachments:

GoogleCodeExporter commented 9 years ago
I've just reproduced this on another machine, a basically uncustomized 32bit 
Arch (latest -ARCH kernel etc.). This is similar to what I've run there, 
following my steps above:
[[ make, run in Xephyr, mount 9p ]]
cat fs/ctl
echo 'grabmod Mod4' >> fs/ctl
cat fs/event
[[ <C-c> ]]
cat fs/event &
echo 'Start rc' >> fs/event
echo 'Start rc' >> fs/event
echo 'Start rc1' >> fs/event
echo 'quit' >> fs/ctl
[[ and valgrind made the PC choke, swapping ]]

The wmii binary from the package you've attached segfaulted as well. And I'm
not sure why we are talking about PKGBUILDs here: makepkg doesn't do any
isolation, so a 'manually' compiled wmii binary should be enough for testing,
no? I've attached the standalone wmii binaries and the log where valgrind runs
out of memory.

Btw., I've tried extra/wmii 3.6 and it worked.

Original comment by rogu...@gmail.com on 30 Oct 2009 at 4:59

Attachments:

GoogleCodeExporter commented 9 years ago
There are actually a few reasons to use a package. 1) it includes support files,
namely wmiir. 2) makepkg sets standard build variables, and makepkg builds on
Arch are known to work.

Anyway, since your binary works for me and mine doesn't work for you, I'm going
to need more debugging info. Please update libixp to tip and apply the attached
patch to wmii, and then run:

    wmii -r true -D 9p 2>&1 | tee 9p.log

and then run wmiir read /event. Please also capture a separate log for a 9p.ko
mount followed by cat /event.

Thanks.

Original comment by maglion...@gmail.com on 30 Oct 2009 at 9:58

Attachments:

GoogleCodeExporter commented 9 years ago
wmiir crashes in print.c, after this line:
fcall = va_arg(f->args, Fcall*);

Original comment by rogu...@gmail.com on 31 Oct 2009 at 12:52

GoogleCodeExporter commented 9 years ago
i.e. in the following lines that access the fcall structure

Original comment by rogu...@gmail.com on 31 Oct 2009 at 12:57

GoogleCodeExporter commented 9 years ago
Sorry, I'm not sure why it worked for me. A fixed version is attached.

Original comment by maglion...@gmail.com on 31 Oct 2009 at 2:05

Attachments:

GoogleCodeExporter commented 9 years ago
Logs attached (segfault after 'cat event'). Hope they help.

P.S. Is wmiir more efficient than the mounted 9p fs?

Original comment by rogu...@gmail.com on 31 Oct 2009 at 12:37

Attachments:

GoogleCodeExporter commented 9 years ago
Ok, thanks, that's what I needed.

As for efficiency, it's more or less a wash. 9p.ko definitely has some
advantages, in that it's in kernel space and maintains a constant connection.
wmiir, on the other hand, is lightweight and very direct in its operations. At
any rate, I tend to use wmiir on the commandline and pyxp for my scripts
(although you could just mount via 9p.ko and use standard IO functions instead).
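
As a rough illustration of that last option, assuming the file system is mounted via 9p.ko as in the steps earlier in this thread, events can be read with plain stdio. read_events, EventFn and stop_at are hypothetical names, and the 512-byte line buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Callback invoked once per event line (newline stripped).
 * Return nonzero to stop reading. */
typedef int (*EventFn)(const char *line, void *ctx);

/* Read newline-terminated events from path until EOF or until the callback
 * asks to stop. Returns the number of lines delivered, or -1 if the file
 * cannot be opened. Lines longer than the buffer arrive in several pieces. */
long read_events(const char *path, EventFn fn, void *ctx) {
    assert(path != NULL && fn != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char buf[512];
    long count = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
        count++;
        if (fn(buf, ctx) != 0)
            break;
    }
    fclose(f);
    return count;
}

/* Example callback: stop at the first line equal to the string in ctx. */
int stop_at(const char *line, void *ctx) {
    return strcmp(line, (const char *)ctx) == 0;
}
```

A call such as read_events("fs/event", stop_at, "Start rc") would then return once the 'Start rc' line written in the earlier steps shows up. The fs/event path is the mount point used in this thread, not something the sketch requires.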

Original comment by maglion...@gmail.com on 31 Oct 2009 at 2:07

GoogleCodeExporter commented 9 years ago
Ok, I've attached a patch to libixp which should fix the problem.

Original comment by maglion...@gmail.com on 31 Oct 2009 at 2:21

Attachments:

GoogleCodeExporter commented 9 years ago
The patch fixes the problem I was seeing, thank you.
Moreover, 'echo quit > ctl' now says 'operation not permitted' instead of 
'unknown error 526: ctl' (or crashing)...

Any ideas why you were unable to reproduce this?

P.S. wmii.suckless.org contains two dead links, namely:
        http://wmii.suckless.org/libs/libixp.html | libixp-0.4
        http://suckless.org/irc/ | irc log 

Original comment by rogu...@gmail.com on 31 Oct 2009 at 4:35

GoogleCodeExporter commented 9 years ago
Ok, I'll push it, then. Thanks.

I couldn't reproduce this because I was chdiring to the mount directory, which
made the negotiation slightly different.

Thanks. Unfortunately, suckless.org changes rather like the wind, and the
people who do the changing aren't always so scrupulous as to make sure that it
goes smoothly.

Original comment by maglion...@gmail.com on 31 Oct 2009 at 4:51

GoogleCodeExporter commented 9 years ago
This issue was closed by revision 58a562fd2d.

Original comment by maglion...@gmail.com on 31 Oct 2009 at 10:08