mozilla-services / Dockerflow

Cloud Services Dockerflow specification

Logging to stdout in the mozlog JSON schema is harmful for pre-forking servers in some situations #31

Closed dylanwh closed 6 years ago

dylanwh commented 6 years ago

If your application isn't single-threaded/single-process, the optional recommendation will result in corrupted log entries when there are frequent writes over 4k in size.

references

  1. https://www.gnu.org/software/libc/manual/html_node/Pipe-Atomicity.html
  2. https://www.notthewizard.com/2014/06/17/are-files-appends-really-atomic/
  3. minimal gunicorn test case in https://gist.github.com/dylanwh/dbb7acc97151503ef061f0895ec0d137

work-arounds

  1. Prevent writes over 4k (or log additional warnings)
  2. Use something to mux the input: each worker thread/process logs to a socket, and a single process multiplexes over that socket and writes to the container's stdout (see the sketch after this list)
  3. Run only a single process per container (viable for smaller services?)
  4. Write your service in Go, which seems to handle multiple goroutines writing to stdout just fine.
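
For the record, work-around 2 could look something like this (a minimal Python sketch; the socket path and function names are made up). Datagram sockets preserve message boundaries, which is exactly the property a byte-stream stdout lacks:

    import os
    import socket

    SOCK_PATH = "/tmp/log-mux.sock"  # hypothetical path

    def run_mux():
        # Single owner of the container's stdout. SOCK_DGRAM preserves
        # message boundaries, so records from any number of workers can't
        # interleave (oversized records get truncated, not mixed).
        if os.path.exists(SOCK_PATH):
            os.unlink(SOCK_PATH)
        mux = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        mux.bind(SOCK_PATH)
        while True:
            record = mux.recv(65536)
            os.write(1, record)  # fd 1 == the container's stdout

    def worker_log(record: bytes):
        # Called from each worker: one datagram per log record,
        # instead of writing to stdout itself.
        s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        s.sendto(record, SOCK_PATH)
        s.close()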
dylanwh commented 6 years ago

Additionally, if you log to a file and open that file with O_APPEND, each call to write() will be atomic for much larger sizes because this is traditionally how log files work. You could log to files and run tail -f to stdout, probably. I haven't tested this scenario.
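
A minimal sketch of that approach, assuming each record goes out in a single write() (the log path is hypothetical):

    import json
    import os

    # O_APPEND makes the kernel position and write atomically, so records
    # from forked workers don't interleave (for reasonable record sizes).
    fd = os.open("app.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def log(record):
        os.write(fd, (json.dumps(record) + "\n").encode())  # one write() per record

    log({"msg": "hello", "pid": os.getpid()})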

mostlygeek commented 6 years ago

So first thought, is sending logs to stdout harmful?

No. A lot of back history and discussion was ultimately boiled down to "send text logs to stdout or stderr". The whole point of stdout/stderr is to be able to decouple and glue apps together.

Looking at your example, I think it's a problem to use sys.stdout directly. stdout is just a stream of bytes. Writing to it directly in parallel will result in a bizarre mix of data if the library doing the writing doesn't handle it correctly. No surprise the JSON is broken. To log messages, try something like the python logging module.
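
Something like this (a minimal sketch; note that the lock inside logging only serializes threads within one process):

    import logging
    import sys

    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(message)s"))
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info('{"msg": "hello"}')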

edit: I'm not great with python. Maybe using print directly is not high level enough, maybe it's gunicorn, maybe docker. Something is not fully buffering and handling it correctly.

I wrote a golang example to test parallel writers going to stdout and using jq to parse and handle the output. Any broken JSON in the stream should result in jq exiting early.

dylanwh commented 6 years ago

python logging to stdout will have the same behavior. :-) @ckolos can perhaps explain this a bit more?
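
A quick way to see it (a sketch: the forked workers share one pipe, and logging's lock exists separately in each process, so it can't help):

    import logging
    import os

    r, w = os.pipe()
    for worker in range(4):
        if os.fork() == 0:
            os.close(r)
            stream = os.fdopen(w, "w")
            logging.basicConfig(stream=stream, format="%(message)s", level=logging.INFO)
            big = str(worker) * 8192  # one record larger than PIPE_BUF
            for _ in range(25):
                logging.info(big)
            stream.flush()
            os._exit(0)

    os.close(w)
    corrupt = 0
    with os.fdopen(r) as pipe:
        for line in pipe:
            line = line.rstrip("\n")
            if line and len(set(line)) > 1:  # bytes from two workers on one line
                corrupt += 1
    for _ in range(4):
        os.wait()
    print(corrupt, "corrupted records")  # usually nonzero on a multi-CPU box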

mostlygeek commented 6 years ago

I just tested it with docker / docker logs on OSX and the output is the same, no messed up JSON. Maybe something deeper in python/gunicorn?

Either way I don't think there is an issue with the "use stdout/stderr" recommendation.

dylanwh commented 6 years ago

In a single go process, I believe this is safe due to a futex being held on stdout.

However, try this (with your program named bad.go)

go build bad.go; mv bad /bin;
( bad & bad & bad & bad & bad & bad ) | jq -rc '"id:\(.id) -> \(.data | length) bytes"'

This is despite Go smartly using a single write() per 10k message.
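(The 4k figure is PIPE_BUF: POSIX only guarantees atomicity for pipe writes of at most PIPE_BUF bytes, so a single 10k write can be split mid-message. A quick check, sketched in Python:)

    import os

    r, w = os.pipe()
    print(os.fpathconf(w, "PC_PIPE_BUF"))  # 4096 on Linux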

The dockerflow instructions are not safe for any multiprocess workload, but can be safe for a multithreaded program.

dylanwh commented 6 years ago

docker run --rm -i dylanwh/badgo bash -c '( bad & bad & bad )' | jq -rc '"id:\(.id) -> \(.data | length) bytes"'
id:7 -> 10240 bytes
id:7 -> 10240 bytes
id:0 -> 10240 bytes
id:5 -> 10240 bytes
id:6 -> 10240 bytes
id:7 -> 10240 bytes
id:3 -> 10240 bytes
id:2 -> 10240 bytes
id:4 -> 10240 bytes
id:1 -> 10240 bytes
id:0 -> 10240 bytes
id:0 -> 10240 bytes
id:3 -> 10240 bytes
id:1 -> 10240 bytes
id:2 -> 10240 bytes
id:6 -> 10240 bytes
id:4 -> 10240 bytes
id:2 -> 10240 bytes
id:5 -> 10240 bytes
id:3 -> 10240 bytes
id:4 -> 10240 bytes
parse error: Invalid numeric literal at line 22, column 4101

dylanwh commented 6 years ago

When you tested this, did you have multiple CPUs? Even with a single bad.go running I get errors, unless GOMAXPROCS is set to 1.

milescrabill commented 6 years ago

My impression is that logs from containers with more than one process should be directed to syslog or similar instead of stdout/stderr.
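
In Python that could be as simple as (a sketch, using the stock SysLogHandler):

    import logging
    import os
    from logging.handlers import SysLogHandler

    # /dev/log is the local syslog datagram socket on Linux; each record is
    # one datagram, so worker processes can't interleave mid-record.
    handler = SysLogHandler(address="/dev/log")
    log = logging.getLogger("app")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("worker %d says hi", os.getpid())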

dylanwh commented 6 years ago

👍 but that has a problem too -- syslog has a max message size as well, and it defaults to truncation. It would probably be useful if the logging pipeline treated invalid JSON as a potential security problem and alerted on it?

dylanwh commented 6 years ago

Anyway, what I would like out of this is to work my (now 4) work-arounds into the doc. #2 is basically "use syslog" but you still need to be aware of huge messages.

Additionally, when we talk about logging it's probably useful to give specific limits to anything that is logged, such as user_agent or path.
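
Something like this per field (a sketch; the actual limit is hypothetical and up for discussion):

    MAX_FIELD = 1024  # hypothetical cap

    def clamp(value: str, limit: int = MAX_FIELD) -> str:
        # Truncate user-supplied fields (user_agent, path, ...) so a hostile
        # client can't push a log record past the atomic-write size.
        return value if len(value) <= limit else value[:limit] + "...[truncated]"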

mostlygeek commented 6 years ago

I think (bad & bad & bad) | jq not working is due to how the shell (bash) is handling the stdout from each of those processes. It really doesn't show us anything ... for example, no problems:

(bad & bad & bad &) > output.txt
jq -rc '"id:\(.id) -> \(.data | length) bytes"' output.txt

I'm really curious as to what the difference is between > and |. :)

The parent process (bash, gunicorn, whatever) is responsible for how it handles the multiple stdouts of its children. It's a little beyond my systems programming knowledge. Looking at the read manpage, I would guess (in pseudocode) this is how stdout is handled from each child process with >:

   data = ""
   n = 0
   do {
       n = read(child_stdout, buf, 1024)
       data += buf
   } while (n > 0)
   // do something with data ... 

This would be how | is handled:

    // just pass it through as a stream...
    while ((n = read(child_stdout, buf, 1024)) > 0)
        write(stdout, buf, n)   // forward only the bytes actually read

shrug

Fun thing to guess at though. :)

dylanwh commented 6 years ago

I've implemented a shell before :-) tl;dr: the shell doesn't control writing between processes at all -- it only wires up file descriptors before exec -- and neither do pre-forking web servers.

https://brandonwamboldt.ca/how-linux-pipes-work-under-the-hood-1518/
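
Roughly what the shell does, sketched in Python (with bad standing in for the test binary): it wires up file descriptors and then gets out of the way; it never reads or rewrites the children's output.

    import os

    def spawn(argv, out_fd):
        # What the shell does for each command in "( bad & bad & bad ) > output.txt".
        pid = os.fork()
        if pid == 0:
            os.dup2(out_fd, 1)        # point the child's stdout at the shared fd
            os.execvp(argv[0], argv)  # from here on the shell never sees the data
        return pid

    out = os.open("output.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    pids = [spawn(["bad"], out) for _ in range(3)]
    os.close(out)
    for pid in pids:
        os.waitpid(pid, 0)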

Now, as to gunicorn (which is commonly used enough that we should care about it).

So let's run gunicorn with 4 workers and inspect it:

gunicorn --preload -w 4 app > /app.log
[2018-02-21 02:46:15 +0000] [41] [INFO] Starting gunicorn 19.4.5
[2018-02-21 02:46:15 +0000] [41] [INFO] Listening at: http://127.0.0.1:8000 (41)
[2018-02-21 02:46:15 +0000] [41] [INFO] Using worker: sync
[2018-02-21 02:46:15 +0000] [45] [INFO] Booting worker with pid: 45
[2018-02-21 02:46:15 +0000] [46] [INFO] Booting worker with pid: 46
[2018-02-21 02:46:16 +0000] [47] [INFO] Booting worker with pid: 47
[2018-02-21 02:46:16 +0000] [48] [INFO] Booting worker with pid: 48

We can spy on process 45 and see what its stdout is. stdout is by definition file descriptor 1, so let's spy on that one:

root@331a71c50a3e:~# ls -l /proc/45/fd/1
l-wx------ 1 root root 64 Feb 21 02:47 /proc/45/fd/1 -> /app.log

So process 45 is pointing to /app.log.

The same is true for each of the gunicorn workers:

root@331a71c50a3e:~# ls -l /proc/4{6,7,8}/fd/1
l-wx------ 1 root root 64 Feb 21 02:51 /proc/46/fd/1 -> /app.log
l-wx------ 1 root root 64 Feb 21 02:51 /proc/47/fd/1 -> /app.log
l-wx------ 1 root root 64 Feb 21 02:47 /proc/48/fd/1 -> /app.log

dylanwh commented 6 years ago

the master process for gunicorn doesn't do anything to stdout at all. And nobody usually has to care about this, because if stdout is a file the Linux kernel will guarantee the writes happen in the right order (because this is how log files have typically worked). But if your stdout is a pipe, then you're limited to 4k.
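
Which situation you're in depends on what fd 1 actually is; a quick check (sketch):

    import os
    import stat

    mode = os.fstat(1).st_mode  # fd 1 == stdout
    if stat.S_ISFIFO(mode):
        print("stdout is a pipe: writes atomic only up to PIPE_BUF")
    elif stat.S_ISREG(mode):
        print("stdout is a regular file: appends stay ordered")
    else:
        print("stdout is something else (tty, socket, ...)")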

If your logs have a structure and take user-supplied data (such as user agent strings), you need to do one of the things I outlined as a work-around. Otherwise you're allowing someone to get their requests dropped from the logs by corrupting adjacent entries, which is beneficial for an attacker.

Note that FastCGI works the way @mostlygeek expects -- FastCGI has worker processes that each get their own stdout and stderr socket, managed by the FastCGI master process.

dylanwh commented 6 years ago

gps was kind enough to answer some questions about python, most relevant to here:

Since multithreaded Python should be OK logging to sys.stdout, I suspect your problem is that the stdout file descriptor is getting inherited by multiple processes. There's no cross-process locking. So different processes all write to the shared pipe at the same time and badness ensues.

Perhaps the bug here belongs in https://github.com/mozilla-services/python-dockerflow? Though the same problem will repeat in any pre-forking environment.

dylanwh commented 6 years ago

Further research (this time, diving into the guts of uWSGI): if you run with uWSGI, you can pass --log-master. With that option, uWSGI routes stdout/stderr for each of its worker processes through the master.

a brief search for "uwsgi --log-master" found this: https://stackoverflow.com/questions/18101566/logging-with-uwsgi-multiprocess-flask

Really, I think we can do a service to the rest of Mozilla by adding more text to the README and possibly adding a section dedicated to logging. I'll make a PR to show what I mean.