Open gconnell opened 9 years ago
@gconnell any indication what the root cause of this might be?
Pretty sure it's my inability to program bug-free code ;D
@gconnell as it is for us all :) I've noticed periodic watchdog failures on quiet links. Thought this might be related. Will update this issue if I end up finding what the problem is...
Not sure if this is related... I'm trying to test stenotype
by itself on my laptop. Instead of running the integration_test/test.sh
script, I pulled the relevant parts out of it to replay some traffic on a dummy interface.
stenotype
./stenographer/{index,packets}
directoriessudo ./stenotype --threads=1 --iface=dummy0 --dir=./stenographer -vvv
sudo tcpreplay -i dummy0 --topspeed test.pcap
(from different terminal)Now, after starting stenotype
, but before I could even run tcpreplay
, I got the following output.
2018-05-16T21:41:06.152862Z T:8061b7 [stenotype.cc:545] Stenotype running with these arguments:
2018-05-16T21:41:06.152920Z T:8061b7 [stenotype.cc:547] 0: "./stenotype"
2018-05-16T21:41:06.152924Z T:8061b7 [stenotype.cc:547] 1: "--threads=1"
2018-05-16T21:41:06.152926Z T:8061b7 [stenotype.cc:547] 2: "--iface=dummy0"
2018-05-16T21:41:06.152932Z T:8061b7 [stenotype.cc:547] 3: "--dir=./stenographer"
2018-05-16T21:41:06.152936Z T:8061b7 [stenotype.cc:547] 4: "-vvv"
2018-05-16T21:41:06.152940Z T:8061b7 [stenotype.cc:549] Starting, page size is 4096
2018-05-16T21:41:06.152945Z T:8061b7 [stenotype.cc:576] Setting up AF_PACKET sockets for packet reading
2018-05-16T21:41:06.396509Z T:8061b7 [packets.cc:229] Setting promiscuous mode for dummy0
2018-05-16T21:41:06.396532Z T:8061b7 [packets.cc:237] Interface dummy0 already in promisc mode
2018-05-16T21:41:06.396582Z T:8061b7 [stenotype.cc:248] Dropping privileges
2018-05-16T21:41:06.396588Z T:8061b7 [stenotype.cc:253] Dropping priviledges from 0 to GID nogroup
2018-05-16T21:41:06.396929Z T:8061b7 [stenotype.cc:265] Dropping priviledges from 0 to UID nobody
2018-05-16T21:41:06.397141Z T:8061b7 [stenotype.cc:652] Starting writing threads
2018-05-16T21:41:06.397150Z T:8061b7 [stenotype.cc:655] Starting thread 0
2018-05-16T21:41:06.397146Z T:fdf9e7 [stenotype.cc:427] Handling signals
2018-05-16T21:41:06.397183Z T:8061b7 [stenotype.cc:666] Starting indexing threads
2018-05-16T21:41:06.397259Z T:fd79d7 [stenotype.cc:395] Starting WriteIndexes thread 0
2018-05-16T21:41:06.397472Z T:8061b7 [stenotype.cc:680] ===============Waiting for thread==============
2018-05-16T21:41:06.397521Z T:f7fff7 [stenotype.cc:450] Thread 0 starting to process packets
2018-05-16T21:41:06.397585Z T:fd79d7 [stenotype.cc:403] Waiting for index
2018-05-16T21:41:06.397668Z T:8061b7 [stenotype.cc:683] Thread finished
2018-05-16T21:41:06.397678Z T:8061b7 [stenotype.cc:686] Finished all threads
2018-05-16T21:41:06.397683Z T:8061b7 [stenotype.cc:689] Closing write index queue 0, waiting for thread
2018-05-16T21:41:06.397698Z T:fd79d7 [stenotype.cc:405] Got index 0
2018-05-16T21:41:06.397711Z T:fd79d7 [stenotype.cc:414] Exiting write index thread
2018-05-16T21:41:07.397793Z T:8061b7 [stenotype.cc:693] Index thread finished
2018-05-16T21:41:07.397816Z T:8061b7 [stenotype.cc:698] Process exiting successfully
2018-05-16T21:41:07.397832Z T:fdf9e7 [stenotype.cc:436] Signal handling done
Eventually, the following was also output.
2018-05-16T21:43:06.418603Z T:fcf9c7 [util.cc:117] WATCHDOG FAILURE: Thread 0
ABORTABORTABORT
./stenotype(+0x70a1) [0x556e51dc40a1]
./stenotype(+0x5240) [0x556e51dc2240]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbe733) [0x7f297f68d733]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f297fba96db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f297f0e788f]
[1] 10728 abort sudo ./stenotype --threads=1 --iface=dummy0 --dir=./stenographer -vvv
As you can see from my commands above, in this particular case I ran stenotype
as root, but I have also tried running it as the stenographer
user w/ appropriate file/dir permissions and capabilities set as per the documentation. Either way I get the same result.
Any ideas? I'm really needing to get this working for a demo I have coming up.
Hey, sure:
Notice these lines:
2018-05-16T21:41:06.396582Z T:8061b7 [stenotype.cc:248] Dropping privileges
2018-05-16T21:41:06.396588Z T:8061b7 [stenotype.cc:253] Dropping priviledges from 0 to GID nogroup
2018-05-16T21:41:06.396929Z T:8061b7 [stenotype.cc:265] Dropping priviledges from 0 to UID nobody
My guess is that user nobody
doesn't have the access you're looking for ;)
Try setting --uid=X --gid=Y
flags
Alternatively, try running stenographer, then do a ps auxww | grep stenotype
and see what flags it sets :D. Note that it only drops those privileges if it's running as root. If it's NOT running as root, it needs the capability to do AF_PACKET stuff, which can be set with the command here: https://github.com/google/stenographer/blob/master/lib.sh#L67
You can also try single-stepping through the install.sh
script to see if there's anything there that might be missing.
For me, it's currently running as user stenographer
on my machine with these flags:
/usr/bin/stenotype -v --blocks=128 --blocksize_kb=256 --threads=1 --iface=eth0 --dir=/tmp/stenographer978481575
Reach out if you're still having issues!
@gconnell thanks for responding! So I get similar output when I run it as a non-root user. The user w/ uid=1000 gid=1000
has full permissions on the ./stenographer
directory and I've set the correct capabilities on the stenotype
executable as per the stuff in lib.sh
that you referenced.
➜ ls -nh .
drwxrwxr-x 4 1000 1000 4.0K May 16 15:23 stenographer
-rwxrwxr-x 1 1000 1000 2.4M May 16 15:22 stenotype
➜ ls -nh ./stenographer
drwxrwxr-x 2 1000 1000 4.0K May 16 15:23 index
drwxrwxr-x 2 1000 1000 4.0K May 16 15:23 packets
➜ getcap ./stenotype
./stenotype = cap_net_admin,cap_net_raw,cap_ipc_lock+ep
When I run it I get the following output.
➜ ./stenotype -vvv --threads=1 --iface=wlp3s0 --dir ./stenographer
2018-05-16T23:04:47.534030Z T:473aa7 [stenotype.cc:545] Stenotype running with these arguments:
2018-05-16T23:04:47.534191Z T:473aa7 [stenotype.cc:547] 0: "./stenotype"
2018-05-16T23:04:47.534223Z T:473aa7 [stenotype.cc:547] 1: "-vvv"
2018-05-16T23:04:47.534244Z T:473aa7 [stenotype.cc:547] 2: "--threads=1"
2018-05-16T23:04:47.534261Z T:473aa7 [stenotype.cc:547] 3: "--iface=wlp3s0"
2018-05-16T23:04:47.534278Z T:473aa7 [stenotype.cc:547] 4: "--dir"
2018-05-16T23:04:47.534294Z T:473aa7 [stenotype.cc:547] 5: "./stenographer"
2018-05-16T23:04:47.534310Z T:473aa7 [stenotype.cc:549] Starting, page size is 4096
2018-05-16T23:04:47.534333Z T:473aa7 [stenotype.cc:576] Setting up AF_PACKET sockets for packet reading
2018-05-16T23:04:47.780045Z T:473aa7 [packets.cc:229] Setting promiscuous mode for wlp3s0
2018-05-16T23:04:47.780239Z T:473aa7 [stenotype.cc:248] Dropping privileges
2018-05-16T23:04:47.780250Z T:473aa7 [stenotype.cc:259] Staying with GID=1000
2018-05-16T23:04:47.780254Z T:473aa7 [stenotype.cc:273] Staying with UID=1000
2018-05-16T23:04:47.780313Z T:473aa7 [stenotype.cc:652] Starting writing threads
2018-05-16T23:04:47.780319Z T:473aa7 [stenotype.cc:655] Starting thread 0
2018-05-16T23:04:47.780395Z T:473aa7 [stenotype.cc:666] Starting indexing threads
2018-05-16T23:04:47.780627Z T:473aa7 [stenotype.cc:680] ===============Waiting for thread==============
2018-05-16T23:04:47.780631Z T:c4fad7 [stenotype.cc:450] Thread 0 starting to process packets
2018-05-16T23:04:47.780727Z T:c57ae7 [stenotype.cc:427] Handling signals
2018-05-16T23:04:47.780892Z T:c47ac7 [stenotype.cc:395] Starting WriteIndexes thread 0
2018-05-16T23:04:47.781006Z T:473aa7 [stenotype.cc:683] Thread finished
2018-05-16T23:04:47.781096Z T:473aa7 [stenotype.cc:686] Finished all threads
2018-05-16T23:04:47.781073Z T:c47ac7 [stenotype.cc:403] Waiting for index
2018-05-16T23:04:47.781202Z T:473aa7 [stenotype.cc:689] Closing write index queue 0, waiting for thread
2018-05-16T23:04:47.781514Z T:c47ac7 [stenotype.cc:405] Got index 0
2018-05-16T23:04:47.781524Z T:c47ac7 [stenotype.cc:414] Exiting write index thread
2018-05-16T23:04:48.781656Z T:473aa7 [stenotype.cc:693] Index thread finished
2018-05-16T23:04:48.781712Z T:473aa7 [stenotype.cc:698] Process exiting successfully
2018-05-16T23:04:48.781770Z T:c57ae7 [stenotype.cc:436] Signal handling done
2018-05-16T23:06:47.809726Z T:c3fab7 [util.cc:117] WATCHDOG FAILURE: Thread 0
ABORTABORTABORT
./stenotype(+0x70a1) [0x560238a340a1]
./stenotype(+0x5240) [0x560238a32240]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbe733) [0x7fda4641c733]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fda469386db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fda45e7688f]
[1] 12690 abort (core dumped) ./stenotype -vvv --threads=1 --iface=wlp3s0 --dir ./stenographer
I'm at a loss for what to try next!
Go for the nuclear option: --no_watchdogs
While you're at it, you might also try --seccomp=none
, but it doesn't look like seccomp's your issue.
OK I’ll give that a shot. I’m curious though if it’s normal for the process to exit that quickly? I end up with a zombie process pretty much immediately.
Could it be due to low traffic?
On May 16, 2018, at 17:29, Graeme Connell notifications@github.com wrote:
Go for the nuclear option: --no_watchdogs
While you're at it, you might also try --seccomp=none, but it doesn't look like seccomp's your issue.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.
Looks like it was a seccomp issue... --seccomp=none
has it working now. Thanks!
Any idea why seccomp
was the issue? What's the default value?
One of the treads is calling a syscall we don't expect.
All three thread types (main thread, each indexing thread, and each packet processing thread) have a compiled-in whitelist of which syscalls they're able to do, and will crash hard (seccomp=kill) if an unexpected syscall happens. However, unfortunately sometimes standard libraries and the like change which syscalls they use (for example, switching from mmap to mmap2), so when you compiled you picked up a library that uses a syscall we weren't expecting.
If you wouldn't mind, it'd be great to figure out which call this is. To do so, try running stenotype
as root with --uid=root --gid=root --seccomp=trace
inside of an strace -f
, like this:
# as root:
strace -f -o /tmp/strace.out /path/to/stenotype --uid=root --gid=root --seccomp=trace --all-your-other-flags
Then, look for EINVAL
in the /tmp/strace.out
file created by the strace run. If we can figure out which syscall it is, we can add it to the list of acceptable ones.
OK, so as root I ran the following (on Ubuntu 18.04 LTS):
# strace -f -o /tmp/strace.out ./stenotype --uid=root --gid=root --seccomp=trace -vvv --threads=1 --dir=./stenographer --iface=dummy0
2018-05-17T02:50:26.088009Z T:5ad0d7 [stenotype.cc:545] Stenotype running with these arguments:
2018-05-17T02:50:26.088221Z T:5ad0d7 [stenotype.cc:547] 0: "./stenotype"
2018-05-17T02:50:26.088271Z T:5ad0d7 [stenotype.cc:547] 1: "--uid=root"
2018-05-17T02:50:26.088310Z T:5ad0d7 [stenotype.cc:547] 2: "--gid=root"
2018-05-17T02:50:26.088341Z T:5ad0d7 [stenotype.cc:547] 3: "--seccomp=trace"
2018-05-17T02:50:26.088370Z T:5ad0d7 [stenotype.cc:547] 4: "-vvv"
2018-05-17T02:50:26.088423Z T:5ad0d7 [stenotype.cc:547] 5: "--threads=1"
2018-05-17T02:50:26.088465Z T:5ad0d7 [stenotype.cc:547] 6: "--dir=./stenographer"
2018-05-17T02:50:26.088507Z T:5ad0d7 [stenotype.cc:547] 7: "--iface=dummy0"
2018-05-17T02:50:26.088549Z T:5ad0d7 [stenotype.cc:549] Starting, page size is 4096
2018-05-17T02:50:26.088573Z T:5ad0d7 [stenotype.cc:576] Setting up AF_PACKET sockets for packet reading
2018-05-17T02:50:31.776284Z T:5ad0d7 [packets.cc:229] Setting promiscuous mode for dummy0
2018-05-17T02:50:31.776362Z T:5ad0d7 [packets.cc:237] Interface dummy0 already in promisc mode
2018-05-17T02:50:31.776561Z T:5ad0d7 [stenotype.cc:248] Dropping privileges
2018-05-17T02:50:31.776613Z T:5ad0d7 [stenotype.cc:253] Dropping priviledges from 0 to GID root
2018-05-17T02:50:31.779845Z T:5ad0d7 [stenotype.cc:265] Dropping priviledges from 0 to UID root
2018-05-17T02:50:31.780856Z T:5ad0d7 [stenotype.cc:652] Starting writing threads
2018-05-17T02:50:31.780896Z T:5ad0d7 [stenotype.cc:655] Starting thread 0
2018-05-17T02:50:31.780850Z T:d86907 [stenotype.cc:427] Handling signals
2018-05-17T02:50:31.781497Z T:5ad0d7 [stenotype.cc:666] Starting indexing threads
2018-05-17T02:50:31.781641Z T:d768e7 [stenotype.cc:395] Starting WriteIndexes thread 0
2018-05-17T02:50:31.781833Z T:5ad0d7 [stenotype.cc:680] ===============Waiting for thread==============
2018-05-17T02:50:31.782252Z T:d7e8f7 [stenotype.cc:450] Thread 0 starting to process packets
2018-05-17T02:50:31.782342Z T:d768e7 [stenotype.cc:403] Waiting for index
2018-05-17T02:50:31.782456Z T:d7e8f7 [aio.cc:190] Opening packet file ./stenographer/PKT0/.1526525431782203: -1
2018-05-17T02:50:31.782522Z T:d7e8f7 [stenotype.cc:462] CHECK(SUCCEEDED(__check_success_error__)) output.Rotate(file_dirname, micros, flag_preallocate_file_mb << 20): Function not implemented <- open
ABORTABORTABORT
./stenotype(+0x70a1) [0x555c999250a1]
./stenotype(+0x24931) [0x555c99942931]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbe733) [0x7f2d59d7f733]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f2d5a29b6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f2d597d988f]
Segmentation fault (core dumped)
I then grepped the resulting /tmp/strace.out
file for EINVAL
:
# cat /tmp/strace.out | grep EINVAL
28420 <... seccomp resumed> ) = -1 EINVAL (Invalid argument)
28421 tgkill(-38, -38, SIGABRT) = -1 EINVAL (Invalid argument)
28421 tgkill(-38, -38, SIGABRT) = -1 EINVAL (Invalid argument)
If you need to see the full contents of /tmp/strace.out
let me know.
@gconnell any ideas on this? I have a Dockerfile I'd like to submit as a pull request, but I don't want to submit it until we can leave seccomp=kill
(the default).
Stenotype still has issues with zero-packet interfaces. The integration test is a good example of this: often, stenotype must be
kill -9
'd, because it's unresponsive unless it gets some non-zero number of packets regularly.