Additionally, if you log to a file and open that file with O_APPEND, each call to write() will be atomic for much larger sizes because this is traditionally how log files work. You could log to files and run tail -f to stdout, probably. I haven't tested this scenario.
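A minimal Go sketch of that approach (untested in this scenario, as noted above; the path and JSON shape are only illustrative):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// With O_APPEND the kernel moves the offset and writes in one atomic step
	// per write() call, so concurrent appenders don't interleave mid-line.
	f, err := os.OpenFile("/app.log", os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0644)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// One write() per log line (path and message shape are made up here).
	line := fmt.Sprintf(`{"pid": %d, "msg": "hello"}`+"\n", os.Getpid())
	if _, err := f.Write([]byte(line)); err != nil {
		panic(err)
	}
}
```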
So first thought, is sending logs to stdout harmful?
No. A lot of back history and discussion ultimately boiled down to "send text logs to stdout or stderr". The whole point of them is to be able to decouple and glue apps together.
Looking at your example I think it's a problem to use sys.stdout directly. stdout is just a stream of bytes. Writing to it directly, in parallel, will result in a bizarre mix of data if the library doing it doesn't handle it correctly. No surprise the JSON is broken. To log messages, try something like the python logging module.
edit: I'm not great with python. Maybe using print directly is not high level enough, maybe it's gunicorn, maybe docker. Something is not fully buffering and handling it correctly.
I wrote a golang example to test parallel writers going to stdout and using jq to parse and handle the output. Any broken JSON in the stream should result in jq exiting early.
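A rough sketch of what such a test program might look like (this is a reconstruction, not the actual bad.go from this thread; the number of writers, field names, and 10240-byte payload are guessed from the output shown further down):

```go
package main

import (
	"encoding/json"
	"os"
	"strings"
	"sync"
)

type msg struct {
	ID   int    `json:"id"`
	Data string `json:"data"`
}

func main() {
	var wg sync.WaitGroup
	for id := 0; id < 8; id++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			// ~10KB of payload so each JSON line is well over PIPE_BUF (4k on Linux).
			line, _ := json.Marshal(msg{ID: id, Data: strings.Repeat("x", 10240)})
			// One write() call per line; whether concurrent writes over PIPE_BUF
			// stay intact is exactly what the thread is testing.
			os.Stdout.Write(append(line, '\n'))
		}(id)
	}
	wg.Wait()
}
```

Running a few copies of it concurrently and piping the combined output through jq (as in the commands later in this thread) is what surfaces any interleaving.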
python logging to stdout will have the same behavior. :-) @ckolos can perhaps explain this a bit more?
I just tested it with docker / docker logs on OSX and the output is the same, no messed up JSON. Maybe something deeper in python/gunicorn?
Either way I don't think there is an issue with the "use stdout/stderr" recommendation.
In a single go process, I believe this is safe due to a futex being held on stdout.
However, try this (with your program named bad.go)
go build bad.go; mv bad /bin
( bad & bad & bad & bad & bad & bad ) | jq -rc '"id:\(.id) -> \(.data | length) bytes"'
This is despite Go smartly using a single write() per 10k message.
The dockerflow instructions are not safe for any multiprocess workload, but can be safe for a multithreaded program.
docker run --rm -i dylanwh/badgo bash -c '( bad & bad & bad )' | jq -rc '"id:\(.id) -> \(.data | length) bytes"'
id:7 -> 10240 bytes
id:7 -> 10240 bytes
id:0 -> 10240 bytes
id:5 -> 10240 bytes
id:6 -> 10240 bytes
id:7 -> 10240 bytes
id:3 -> 10240 bytes
id:2 -> 10240 bytes
id:4 -> 10240 bytes
id:1 -> 10240 bytes
id:0 -> 10240 bytes
id:0 -> 10240 bytes
id:3 -> 10240 bytes
id:1 -> 10240 bytes
id:2 -> 10240 bytes
id:6 -> 10240 bytes
id:4 -> 10240 bytes
id:2 -> 10240 bytes
id:5 -> 10240 bytes
id:3 -> 10240 bytes
id:4 -> 10240 bytes
parse error: Invalid numeric literal at line 22, column 4101
When you tested this, did you have multiple CPUs? Even with a single bad.go running I get errors, unless GOMAXPROCS is set to 1.
My impression is that logs from containers with more than one process should be directed to syslog or similar instead of stdout/stderr.
👍 but that has a problem too -- syslog has a max message size as well, and it defaults to truncation. Probably useful if the logging pipeline treats invalid JSON as a potential security problem and alerts?
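For reference, sending logs to syslog from a Go process can be done with the stdlib log/syslog package; a minimal sketch (the tag and priority are arbitrary, and the truncation caveat above still applies to very large messages):

```go
package main

import (
	"log"
	"log/syslog"
)

func main() {
	// Arbitrary facility/severity and tag, purely for illustration.
	w, err := syslog.New(syslog.LOG_INFO|syslog.LOG_DAEMON, "myapp")
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	logger := log.New(w, "", 0)
	// Each call becomes one syslog message, so entries from different
	// processes can't interleave mid-line the way shared-pipe writes can.
	logger.Println(`{"msg": "hello from a worker", "id": 1}`)
}
```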
Anyway, what I would like out of this is to work my (now 4) work-arounds into the doc. #2 is basically "use syslog" but you still need to be aware of huge messages.
Additionally, when we talk about logging it's probably useful to give specific limits to anything that is logged, such as user_agent or path.
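A minimal sketch of that kind of limit (the 512-byte cap and the field names are only illustrative):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxFieldLen is an arbitrary illustrative cap, not a value from this thread.
const maxFieldLen = 512

func truncate(s string) string {
	if len(s) > maxFieldLen {
		return s[:maxFieldLen]
	}
	return s
}

func main() {
	// Clamp user-supplied fields before they go into a structured log line,
	// so a hostile client can't blow a log entry past the pipe/syslog limits.
	entry := map[string]string{
		"user_agent": truncate("Mozilla/5.0 (X11; Linux x86_64) ..."),
		"path":       truncate("/requested/path"),
	}
	b, _ := json.Marshal(entry)
	fmt.Println(string(b))
}
```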
I think (bad & bad & bad) | jq not working is due to how the shell (bash) is handling the stdout from each of those processes. It really doesn't show us anything ... for example, no problems:
(bad & bad & bad &) > output.txt
jq -rc '"id:\(.id) -> \(.data | length) bytes"' output.txt
I'm really curious as to what the difference is between > and |. :)
The parent process (bash, gunicorn, whatever) is responsible for how it handles the multiple stdout of its children. It's a little beyond my systems programming knowledge. Looking at the read manpage, I would guess with some pseudocode this is how stdout is handled from each child process with >:
data = ""
n = 0
do {
n = read(child_stdout, buf, 1024)
data += buf
} while (n > 0)
// do something with data ...
This would be how | is handled:
// just pass it through as a stream...
read(child_stdout, buf, 1024)
write(stdout, buf, 1024)
shrug
Fun thing to guess at though. :)
I've implemented a shell before :-) tl;dr: the shell doesn't control writing between processes at all, and neither do pre-forking web servers.
https://brandonwamboldt.ca/how-linux-pipes-work-under-the-hood-1518/
Now, as to gunicorn (which is a commonly-used enough thing that we should care about it).
So let's run gunicorn with 4 workers and inspect it:
gunicorn --preload -w 4 app > /app.log
[2018-02-21 02:46:15 +0000] [41] [INFO] Starting gunicorn 19.4.5
[2018-02-21 02:46:15 +0000] [41] [INFO] Listening at: http://127.0.0.1:8000 (41)
[2018-02-21 02:46:15 +0000] [41] [INFO] Using worker: sync
[2018-02-21 02:46:15 +0000] [45] [INFO] Booting worker with pid: 45
[2018-02-21 02:46:15 +0000] [46] [INFO] Booting worker with pid: 46
[2018-02-21 02:46:16 +0000] [47] [INFO] Booting worker with pid: 47
[2018-02-21 02:46:16 +0000] [48] [INFO] Booting worker with pid: 48
We can spy on process 45 and see what its stdout is. stdout is by definition file descriptor 1, so let's spy on that one:
root@331a71c50a3e:~# ls -l /proc/45/fd/1
l-wx------ 1 root root 64 Feb 21 02:47 /proc/45/fd/1 -> /app.log
So process 45 is pointing to /app.log.
The same is true for each of the gunicorn workers:
root@331a71c50a3e:~# ls -l /proc/4{6,7,8}/fd/1
l-wx------ 1 root root 64 Feb 21 02:51 /proc/46/fd/1 -> /app.log
l-wx------ 1 root root 64 Feb 21 02:51 /proc/47/fd/1 -> /app.log
l-wx------ 1 root root 64 Feb 21 02:47 /proc/48/fd/1 -> /app.log
The master process for gunicorn doesn't do anything to stdout at all. And nobody usually has to care about this, because if stdout is a file the Linux kernel will guarantee the writes happen in the right order (because this is how log files have typically worked). But if your stdout is a pipe, then atomic writes are limited to PIPE_BUF, which is 4k on Linux.
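A small Go sketch that makes the distinction visible at runtime (it assumes Go reports os.ModeNamedPipe for the anonymous pipe set up by the shell, which is the case on Linux):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	info, err := os.Stdout.Stat()
	if err != nil {
		panic(err)
	}
	switch {
	case info.Mode()&os.ModeNamedPipe != 0:
		// Inherited pipe: writes over PIPE_BUF (4k on Linux) can interleave.
		fmt.Fprintln(os.Stderr, "stdout is a pipe: large writes may interleave")
	case info.Mode().IsRegular():
		// Redirected to a file: the interleaving problem described above goes away.
		fmt.Fprintln(os.Stderr, "stdout is a regular file")
	default:
		fmt.Fprintln(os.Stderr, "stdout is something else (tty, socket, ...)")
	}
}
```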
If your logs have a structure and take user-supplied data (such as user agent strings), you need to do one of the things I outlined as a work-around. Otherwise you're allowing someone to have their requests dropped from the logs, which is beneficial for an attacker.
Note that FastCGI works the way @mostlygeek expects -- FastCGI has worker processes that each get their own stdout and stderr socket, managed by the FastCGI master process.
gps was kind enough to answer some questions about python; the part most relevant here:
Since multithreaded Python should be OK logging to sys.stdout. I suspect your problem is that the stdout file descriptor is getting inherited by multiple processes. There's no cross-process locking. So different processes all write to the shared pipe at the same time and badness ensues.
Perhaps the bug here belongs in https://github.com/mozilla-services/python-dockerflow? Though the same problem will repeat in any pre-forking environment.
Further research (this time, diving into the guts of uwsgi): if you run with uWSGI, you can pass --log-master. With that option, uwsgi routes stdout/stderr for each of its worker processes through the master process, so the workers aren't writing to the shared pipe directly.
a brief search for "uwsgi --log-master" found this: https://stackoverflow.com/questions/18101566/logging-with-uwsgi-multiprocess-flask
Really I think we can do a service for the rest of mozilla by adding more text to the README and possibly adding a section dedicated to logging. I'll make a PR to show what I mean.
If your application isn't single-threaded/single-process, the "use stdout/stderr" recommendation will result in corrupted log entries when there are frequent writes over 4k in size.
references
work arounds