google / stenographer

Stenographer is a packet capture solution which aims to quickly spool all packets to disk, then provide simple, fast access to subsets of those packets. Discussion/announcements at stenographer@googlegroups.com
Apache License 2.0
1.79k stars 238 forks source link

Steno still behaves poorly on zero-packet interfaces #94

Open gconnell opened 9 years ago

gconnell commented 9 years ago

Stenotype still has issues with zero-packet interfaces. The integration test is a good example of this: often, stenotype must be kill -9'd, because it's unresponsive unless it gets some non-zero number of packets regularly.

rizard commented 6 years ago

@gconnell any indication what the root cause of this might be?

gconnell commented 6 years ago

Pretty sure it's my inability to program bug-free code ;D

rizard commented 6 years ago

@gconnell as it is for us all :) I've noticed periodic watchdog failures on quiet links. Thought this might be related. Will update this issue if I end up finding what the problem is...

activeshadow commented 6 years ago

Not sure if this is related... I'm trying to test stenotype by itself on my laptop. Instead of running the integration_test/test.sh script, I pulled the relevant parts out of it to replay some traffic on a dummy interface.

  1. Compiled stenotype
  2. Created ./stenographer/{index,packets} directories
  3. Ran sudo ./stenotype --threads=1 --iface=dummy0 --dir=./stenographer -vvv
  4. Ran sudo tcpreplay -i dummy0 --topspeed test.pcap (from different terminal)

Now, after starting stenotype, but before I could even run tcpreplay, I got the following output.

2018-05-16T21:41:06.152862Z T:8061b7 [stenotype.cc:545] Stenotype running with these arguments:
2018-05-16T21:41:06.152920Z T:8061b7 [stenotype.cc:547] 0:      "./stenotype"
2018-05-16T21:41:06.152924Z T:8061b7 [stenotype.cc:547] 1:      "--threads=1"
2018-05-16T21:41:06.152926Z T:8061b7 [stenotype.cc:547] 2:      "--iface=dummy0"
2018-05-16T21:41:06.152932Z T:8061b7 [stenotype.cc:547] 3:      "--dir=./stenographer"
2018-05-16T21:41:06.152936Z T:8061b7 [stenotype.cc:547] 4:      "-vvv"
2018-05-16T21:41:06.152940Z T:8061b7 [stenotype.cc:549] Starting, page size is 4096
2018-05-16T21:41:06.152945Z T:8061b7 [stenotype.cc:576] Setting up AF_PACKET sockets for packet reading
2018-05-16T21:41:06.396509Z T:8061b7 [packets.cc:229] Setting promiscuous mode for dummy0
2018-05-16T21:41:06.396532Z T:8061b7 [packets.cc:237] Interface dummy0 already in promisc mode
2018-05-16T21:41:06.396582Z T:8061b7 [stenotype.cc:248] Dropping privileges
2018-05-16T21:41:06.396588Z T:8061b7 [stenotype.cc:253] Dropping priviledges from 0 to GID nogroup
2018-05-16T21:41:06.396929Z T:8061b7 [stenotype.cc:265] Dropping priviledges from 0 to UID nobody
2018-05-16T21:41:06.397141Z T:8061b7 [stenotype.cc:652] Starting writing threads
2018-05-16T21:41:06.397150Z T:8061b7 [stenotype.cc:655] Starting thread 0
2018-05-16T21:41:06.397146Z T:fdf9e7 [stenotype.cc:427] Handling signals
2018-05-16T21:41:06.397183Z T:8061b7 [stenotype.cc:666] Starting indexing threads
2018-05-16T21:41:06.397259Z T:fd79d7 [stenotype.cc:395] Starting WriteIndexes thread 0
2018-05-16T21:41:06.397472Z T:8061b7 [stenotype.cc:680] ===============Waiting for thread==============
2018-05-16T21:41:06.397521Z T:f7fff7 [stenotype.cc:450] Thread 0 starting to process packets
2018-05-16T21:41:06.397585Z T:fd79d7 [stenotype.cc:403] Waiting for index
2018-05-16T21:41:06.397668Z T:8061b7 [stenotype.cc:683] Thread finished
2018-05-16T21:41:06.397678Z T:8061b7 [stenotype.cc:686] Finished all threads
2018-05-16T21:41:06.397683Z T:8061b7 [stenotype.cc:689] Closing write index queue 0, waiting for thread
2018-05-16T21:41:06.397698Z T:fd79d7 [stenotype.cc:405] Got index 0
2018-05-16T21:41:06.397711Z T:fd79d7 [stenotype.cc:414] Exiting write index thread
2018-05-16T21:41:07.397793Z T:8061b7 [stenotype.cc:693] Index thread finished
2018-05-16T21:41:07.397816Z T:8061b7 [stenotype.cc:698] Process exiting successfully
2018-05-16T21:41:07.397832Z T:fdf9e7 [stenotype.cc:436] Signal handling done

Eventually, the following was also output.

2018-05-16T21:43:06.418603Z T:fcf9c7 [util.cc:117] WATCHDOG FAILURE: Thread 0
ABORTABORTABORT
./stenotype(+0x70a1) [0x556e51dc40a1]
./stenotype(+0x5240) [0x556e51dc2240]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbe733) [0x7f297f68d733]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f297fba96db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f297f0e788f]
[1]    10728 abort      sudo ./stenotype --threads=1 --iface=dummy0 --dir=./stenographer -vvv

As you can see from my commands above, in this particular case I ran stenotype as root, but I have also tried running it as the stenographer user w/ appropriate file/dir permissions and capabilities set as per the documentation. Either way I get the same result.

Any ideas? I'm really needing to get this working for a demo I have coming up.

gconnell commented 6 years ago

Hey, sure:

Notice these lines:

2018-05-16T21:41:06.396582Z T:8061b7 [stenotype.cc:248] Dropping privileges
2018-05-16T21:41:06.396588Z T:8061b7 [stenotype.cc:253] Dropping priviledges from 0 to GID nogroup
2018-05-16T21:41:06.396929Z T:8061b7 [stenotype.cc:265] Dropping priviledges from 0 to UID nobody

My guess is that user nobody doesn't have the access you're looking for ;)

Try setting --uid=X --gid=Y flags

Alternatively, try running stenographer, then do a ps auxww | grep stenotype and see what flags it sets :D. Note that it only drops those privileges if it's running as root. If it's NOT running as root, it needs the capability to do AF_PACKET stuff, which can be set with the command here: https://github.com/google/stenographer/blob/master/lib.sh#L67

You can also try single-stepping through the install.sh script to see if there's anything there that might be missing.

For me, it's currently running as user stenographer on my machine with these flags:

/usr/bin/stenotype -v --blocks=128 --blocksize_kb=256 --threads=1 --iface=eth0 --dir=/tmp/stenographer978481575

Reach out if you're still having issues!

activeshadow commented 6 years ago

@gconnell thanks for responding! So I get similar output when I run it as a non-root user. The user w/ uid=1000 gid=1000 has full permissions on the ./stenographer directory and I've set the correct capabilities on the stenotype executable as per the stuff in lib.sh that you referenced.

➜ ls -nh .
drwxrwxr-x 4 1000 1000 4.0K May 16 15:23 stenographer
-rwxrwxr-x 1 1000 1000 2.4M May 16 15:22 stenotype
➜ ls -nh ./stenographer 
drwxrwxr-x 2 1000 1000 4.0K May 16 15:23 index
drwxrwxr-x 2 1000 1000 4.0K May 16 15:23 packets
➜ getcap ./stenotype
./stenotype = cap_net_admin,cap_net_raw,cap_ipc_lock+ep

When I run it I get the following output.

➜ ./stenotype -vvv --threads=1 --iface=wlp3s0 --dir ./stenographer 
2018-05-16T23:04:47.534030Z T:473aa7 [stenotype.cc:545] Stenotype running with these arguments:
2018-05-16T23:04:47.534191Z T:473aa7 [stenotype.cc:547] 0:      "./stenotype"
2018-05-16T23:04:47.534223Z T:473aa7 [stenotype.cc:547] 1:      "-vvv"
2018-05-16T23:04:47.534244Z T:473aa7 [stenotype.cc:547] 2:      "--threads=1"
2018-05-16T23:04:47.534261Z T:473aa7 [stenotype.cc:547] 3:      "--iface=wlp3s0"
2018-05-16T23:04:47.534278Z T:473aa7 [stenotype.cc:547] 4:      "--dir"
2018-05-16T23:04:47.534294Z T:473aa7 [stenotype.cc:547] 5:      "./stenographer"
2018-05-16T23:04:47.534310Z T:473aa7 [stenotype.cc:549] Starting, page size is 4096
2018-05-16T23:04:47.534333Z T:473aa7 [stenotype.cc:576] Setting up AF_PACKET sockets for packet reading
2018-05-16T23:04:47.780045Z T:473aa7 [packets.cc:229] Setting promiscuous mode for wlp3s0
2018-05-16T23:04:47.780239Z T:473aa7 [stenotype.cc:248] Dropping privileges
2018-05-16T23:04:47.780250Z T:473aa7 [stenotype.cc:259] Staying with GID=1000
2018-05-16T23:04:47.780254Z T:473aa7 [stenotype.cc:273] Staying with UID=1000
2018-05-16T23:04:47.780313Z T:473aa7 [stenotype.cc:652] Starting writing threads
2018-05-16T23:04:47.780319Z T:473aa7 [stenotype.cc:655] Starting thread 0
2018-05-16T23:04:47.780395Z T:473aa7 [stenotype.cc:666] Starting indexing threads
2018-05-16T23:04:47.780627Z T:473aa7 [stenotype.cc:680] ===============Waiting for thread==============
2018-05-16T23:04:47.780631Z T:c4fad7 [stenotype.cc:450] Thread 0 starting to process packets
2018-05-16T23:04:47.780727Z T:c57ae7 [stenotype.cc:427] Handling signals
2018-05-16T23:04:47.780892Z T:c47ac7 [stenotype.cc:395] Starting WriteIndexes thread 0
2018-05-16T23:04:47.781006Z T:473aa7 [stenotype.cc:683] Thread finished
2018-05-16T23:04:47.781096Z T:473aa7 [stenotype.cc:686] Finished all threads
2018-05-16T23:04:47.781073Z T:c47ac7 [stenotype.cc:403] Waiting for index
2018-05-16T23:04:47.781202Z T:473aa7 [stenotype.cc:689] Closing write index queue 0, waiting for thread
2018-05-16T23:04:47.781514Z T:c47ac7 [stenotype.cc:405] Got index 0
2018-05-16T23:04:47.781524Z T:c47ac7 [stenotype.cc:414] Exiting write index thread
2018-05-16T23:04:48.781656Z T:473aa7 [stenotype.cc:693] Index thread finished
2018-05-16T23:04:48.781712Z T:473aa7 [stenotype.cc:698] Process exiting successfully
2018-05-16T23:04:48.781770Z T:c57ae7 [stenotype.cc:436] Signal handling done
2018-05-16T23:06:47.809726Z T:c3fab7 [util.cc:117] WATCHDOG FAILURE: Thread 0
ABORTABORTABORT
./stenotype(+0x70a1) [0x560238a340a1]
./stenotype(+0x5240) [0x560238a32240]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbe733) [0x7fda4641c733]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7fda469386db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fda45e7688f]
[1]    12690 abort (core dumped)  ./stenotype -vvv --threads=1 --iface=wlp3s0 --dir ./stenographer

I'm at a loss for what to try next!

gconnell commented 6 years ago

Go for the nuclear option: --no_watchdogs

While you're at it, you might also try --seccomp=none, but it doesn't look like seccomp's your issue.

activeshadow commented 6 years ago

OK I’ll give that a shot. I’m curious though if it’s normal for the process to exit that quickly? I end up with a zombie process pretty much immediately.

Could it be due to low traffic?

On May 16, 2018, at 17:29, Graeme Connell notifications@github.com wrote:

Go for the nuclear option: --no_watchdogs

While you're at it, you might also try --seccomp=none, but it doesn't look like seccomp's your issue.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

activeshadow commented 6 years ago

Looks like it was a seccomp issue... --seccomp=none has it working now. Thanks!

Any idea why seccomp was the issue? What's the default value?

gconnell commented 6 years ago

One of the treads is calling a syscall we don't expect.

All three thread types (main thread, each indexing thread, and each packet processing thread) have a compiled-in whitelist of which syscalls they're able to do, and will crash hard (seccomp=kill) if an unexpected syscall happens. However, unfortunately sometimes standard libraries and the like change which syscalls they use (for example, switching from mmap to mmap2), so when you compiled you picked up a library that uses a syscall we weren't expecting.

If you wouldn't mind, it'd be great to figure out which call this is. To do so, try running stenotype as root with --uid=root --gid=root --seccomp=trace inside of an strace -f, like this:

# as root:
strace -f -o /tmp/strace.out /path/to/stenotype --uid=root --gid=root --seccomp=trace --all-your-other-flags

Then, look for EINVAL in the /tmp/strace.out file created by the strace run. If we can figure out which syscall it is, we can add it to the list of acceptable ones.

activeshadow commented 6 years ago

OK, so as root I ran the following (on Ubuntu 18.04 LTS):

# strace -f -o /tmp/strace.out ./stenotype --uid=root --gid=root --seccomp=trace -vvv --threads=1 --dir=./stenographer --iface=dummy0
2018-05-17T02:50:26.088009Z T:5ad0d7 [stenotype.cc:545] Stenotype running with these arguments:
2018-05-17T02:50:26.088221Z T:5ad0d7 [stenotype.cc:547] 0:      "./stenotype"
2018-05-17T02:50:26.088271Z T:5ad0d7 [stenotype.cc:547] 1:      "--uid=root"
2018-05-17T02:50:26.088310Z T:5ad0d7 [stenotype.cc:547] 2:      "--gid=root"
2018-05-17T02:50:26.088341Z T:5ad0d7 [stenotype.cc:547] 3:      "--seccomp=trace"
2018-05-17T02:50:26.088370Z T:5ad0d7 [stenotype.cc:547] 4:      "-vvv"
2018-05-17T02:50:26.088423Z T:5ad0d7 [stenotype.cc:547] 5:      "--threads=1"
2018-05-17T02:50:26.088465Z T:5ad0d7 [stenotype.cc:547] 6:      "--dir=./stenographer"
2018-05-17T02:50:26.088507Z T:5ad0d7 [stenotype.cc:547] 7:      "--iface=dummy0"
2018-05-17T02:50:26.088549Z T:5ad0d7 [stenotype.cc:549] Starting, page size is 4096
2018-05-17T02:50:26.088573Z T:5ad0d7 [stenotype.cc:576] Setting up AF_PACKET sockets for packet reading
2018-05-17T02:50:31.776284Z T:5ad0d7 [packets.cc:229] Setting promiscuous mode for dummy0
2018-05-17T02:50:31.776362Z T:5ad0d7 [packets.cc:237] Interface dummy0 already in promisc mode
2018-05-17T02:50:31.776561Z T:5ad0d7 [stenotype.cc:248] Dropping privileges
2018-05-17T02:50:31.776613Z T:5ad0d7 [stenotype.cc:253] Dropping priviledges from 0 to GID root
2018-05-17T02:50:31.779845Z T:5ad0d7 [stenotype.cc:265] Dropping priviledges from 0 to UID root
2018-05-17T02:50:31.780856Z T:5ad0d7 [stenotype.cc:652] Starting writing threads
2018-05-17T02:50:31.780896Z T:5ad0d7 [stenotype.cc:655] Starting thread 0
2018-05-17T02:50:31.780850Z T:d86907 [stenotype.cc:427] Handling signals
2018-05-17T02:50:31.781497Z T:5ad0d7 [stenotype.cc:666] Starting indexing threads
2018-05-17T02:50:31.781641Z T:d768e7 [stenotype.cc:395] Starting WriteIndexes thread 0
2018-05-17T02:50:31.781833Z T:5ad0d7 [stenotype.cc:680] ===============Waiting for thread==============
2018-05-17T02:50:31.782252Z T:d7e8f7 [stenotype.cc:450] Thread 0 starting to process packets
2018-05-17T02:50:31.782342Z T:d768e7 [stenotype.cc:403] Waiting for index
2018-05-17T02:50:31.782456Z T:d7e8f7 [aio.cc:190] Opening packet file ./stenographer/PKT0/.1526525431782203: -1
2018-05-17T02:50:31.782522Z T:d7e8f7 [stenotype.cc:462] CHECK(SUCCEEDED(__check_success_error__)) output.Rotate(file_dirname, micros, flag_preallocate_file_mb << 20): Function not implemented <- open
ABORTABORTABORT
./stenotype(+0x70a1) [0x555c999250a1]
./stenotype(+0x24931) [0x555c99942931]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbe733) [0x7f2d59d7f733]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f2d5a29b6db]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f2d597d988f]
Segmentation fault (core dumped)

I then grepped the resulting /tmp/strace.out file for EINVAL:

# cat /tmp/strace.out | grep EINVAL
28420 <... seccomp resumed> )           = -1 EINVAL (Invalid argument)
28421 tgkill(-38, -38, SIGABRT)         = -1 EINVAL (Invalid argument)
28421 tgkill(-38, -38, SIGABRT)         = -1 EINVAL (Invalid argument)

If you need to see the full contents of /tmp/strace.out let me know.

activeshadow commented 6 years ago

@gconnell any ideas on this? I have a Dockerfile I'd like to submit as a pull request, but I don't want to submit it until we can leave seccomp=kill (the default).