ryandotsmith / log-shuttle

New Repository: https://github.com/heroku/log-shuttle

Consider datagram-type unix sockets #12

Closed fabiokung closed 11 years ago

fabiokung commented 12 years ago

The unix connection type in net.Listen(...) means a unix socket of type SOCK_STREAM (source code here).

By definition, SOCK_STREAM prevents duplication and loss and guarantees message ordering. It can generate SIGPIPE, ETIMEDOUT and other errors described here.

SOCK_DGRAM is a simpler type of socket (more similar to a simple message queue). It doesn't provide any strong guarantees (messages can be delivered out of order, dropped or duplicated), but it seems to be good enough for our logging purposes. Plus, it would keep problems in the log pipeline from affecting or blocking dynos. It's analogous to UDP sockets: fire and forget.

In practice, it seems that the implementation on Linux will not deliver messages out of order anyway.

More info: unix(7) and socket(2).
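To make the difference concrete, here is a minimal sketch in Go of a datagram-type unix socket round trip. The helper name `roundTrip`, the temporary socket path and the buffer size are all made up for illustration; this is not how log-shuttle is written, only what the `unixgram` network type looks like next to `unix`.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// roundTrip (hypothetical helper) binds a datagram-type unix socket at a
// temporary path, sends one message to it, and returns what the receiver read.
func roundTrip(msg string) (string, error) {
	dir, err := os.MkdirTemp("", "shuttle-sketch")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	addr := &net.UnixAddr{Name: filepath.Join(dir, "log.sock"), Net: "unixgram"}

	// "unixgram" selects SOCK_DGRAM; "unix" would select SOCK_STREAM.
	recv, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return "", err
	}
	defer recv.Close()

	// The sender does not bind a local address; it only needs the path.
	send, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return "", err
	}
	defer send.Close()

	// One Write is one datagram: there is no stream to keep in sync.
	if _, err := send.Write([]byte(msg)); err != nil {
		return "", err
	}

	// Guard against blocking forever if the datagram never arrives.
	if err := recv.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return "", err
	}
	buf := make([]byte, 64*1024) // arbitrary size chosen for this sketch
	n, _, err := recv.ReadFromUnix(buf)
	if err != nil {
		return "", err
	}
	return string(buf[:n]), nil
}

func main() {
	got, err := roundTrip("hello from a dyno")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(got)
}
```

The receiver side is the only part that changes shape: there is no `Accept` loop, just reads that each return one whole datagram.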

fabiokung commented 12 years ago

/cc @bgentry @JacobVorreuter @archaelus

fabiokung commented 12 years ago

Sorry, the PR is not ready yet. Reading the code more, I realized that this would require some big changes to the way log lines are being read.

Right now it is line oriented (which requires a stream of potentially multiple packets). Very long lines can easily fill buffers though.
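As a rough illustration of the buffer concern (not log-shuttle's actual reader), the sketch below uses `bufio.Scanner` with a deliberately small buffer cap. The function name `scanLines` and the 16-byte limit are assumptions picked to keep the example short.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanLines (hypothetical helper) reads newline-delimited log lines from
// input and stops with an error at the first line that does not fit in a
// buffer of maxLine bytes. A real reader would scan the socket instead of
// a string.
func scanLines(input string, maxLine int) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	// Cap both the initial and the maximum buffer size at maxLine.
	sc.Buffer(make([]byte, 0, maxLine), maxLine)

	var lines []string
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	// sc.Err() is bufio.ErrTooLong when a line exceeded the buffer.
	return lines, sc.Err()
}

func main() {
	input := "short\n" + strings.Repeat("x", 40) + "\n"
	lines, err := scanLines(input, 16)
	fmt.Println(lines, err)
}
```

With a stream, the reader has to pick some limit like this and decide what to do when a line exceeds it; the short line is returned and the long one surfaces as an error.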

With unixgram, each log would be limited to the size of a datagram, but logs could potentially be truncated. What would be better?

archaelus commented 12 years ago

Why not make this a different program?

archaelus commented 12 years ago

I think the log-shuttle project is ideally about three different things:
1) a logger replacement (stdio<->logplex)
2) a syslog<->logplex gateway (to be a companion to syslog-ng on kernel instances)
3) a logplex<->syslog gateway (for clients to run on their own machines to integrate with their existing logging infrastructure)

I would build these as three different programs maybe sharing some libraries.

fabiokung commented 12 years ago

Apparently SOCK_DGRAM sockets will yield ENOBUFS for large messages, which will potentially crash processes trying to send very long log datagrams. That seems to be desirable.

Another thing I just realized is that if we switch to be datagram based, we can easily support multi-line logs. Processes can just send datagrams with multiple lines to log-shuttle.
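A small sketch of that idea in Go, assuming one datagram equals one log message: the datagram boundary is the frame, so embedded newlines survive untouched. The helper `exchange` and the sample stack trace are invented for the example.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"path/filepath"
	"time"
)

// exchange (hypothetical helper) sends each message as its own datagram over
// a unix datagram socket and returns the payloads in the order they were read.
func exchange(msgs ...string) ([]string, error) {
	dir, err := os.MkdirTemp("", "dgram-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	addr := &net.UnixAddr{Name: filepath.Join(dir, "log.sock"), Net: "unixgram"}
	recv, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return nil, err
	}
	defer recv.Close()

	send, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return nil, err
	}
	defer send.Close()

	for _, m := range msgs {
		// Each Write becomes exactly one datagram, newlines included.
		if _, err := send.Write([]byte(m)); err != nil {
			return nil, err
		}
	}

	if err := recv.SetReadDeadline(time.Now().Add(2 * time.Second)); err != nil {
		return nil, err
	}
	buf := make([]byte, 64*1024) // arbitrary size chosen for this sketch
	out := make([]string, 0, len(msgs))
	for range msgs {
		// Each read returns one whole datagram, never a partial line.
		n, _, err := recv.ReadFromUnix(buf)
		if err != nil {
			return out, err
		}
		out = append(out, string(buf[:n]))
	}
	return out, nil
}

func main() {
	got, err := exchange("one line", "panic: boom\n\tat main.go:10\n\tat main.go:20")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for i, m := range got {
		fmt.Printf("message %d: %q\n", i, m)
	}
}
```

The receiver never has to search for a delimiter, which is what makes multi-line messages cheap here. The cost is the datagram size limit discussed earlier in the thread.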

fabiokung commented 12 years ago

> I would build these as three different programs maybe sharing some libraries.

I am sure that @ryandotsmith is thinking about it, but this seems to be a different problem (or am I missing something?). Even if they were different programs, we would still need to decide how we read log lines from sockets/pipes. Datagram or stream/line based?

archaelus commented 12 years ago

The difference is kinda the input format. The logger program reads messages delimited by newlines (you could also have a byte-count-framed syslog input mode I guess), while the datagram thing reads one message per datagram.

If you are connecting syslog-ng to this program, then I would ask for byte-count-framed messages in a stream: it lets you do multi-line messages (something hermes uses). (And multi-line messages are the future)
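For reference, a sketch of what byte-count framing over a stream could look like in Go. It assumes the "LEN SP MSG" layout (a decimal byte count, one space, then exactly that many bytes), which is the octet-counting style from RFC 6587; the names `readFrame`, `readAllFrames` and `maxFrame` are made up here, and the exact format hermes or syslog-ng would use is not confirmed in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// maxFrame is an arbitrary cap for this sketch so that a bogus length cannot
// trigger a huge allocation.
const maxFrame = 1 << 20

// readFrame (hypothetical helper) reads one "LEN SP MSG" frame from r.
// It returns io.EOF only when the stream ends cleanly between frames.
func readFrame(r *bufio.Reader) (string, error) {
	head, err := r.ReadString(' ')
	if err == io.EOF && head != "" {
		return "", io.ErrUnexpectedEOF // stream ended inside the length prefix
	}
	if err != nil {
		return "", err
	}
	n, err := strconv.Atoi(strings.TrimSuffix(head, " "))
	if err != nil || n < 0 || n > maxFrame {
		return "", fmt.Errorf("bad frame length %q", head)
	}
	body := make([]byte, n)
	if _, err := io.ReadFull(r, body); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // length promised bytes that never came
		}
		return "", err
	}
	// Newlines inside body are just payload; the count is the only delimiter.
	return string(body), nil
}

// readAllFrames (hypothetical helper) decodes every frame in input until EOF.
func readAllFrames(input string) ([]string, error) {
	r := bufio.NewReader(strings.NewReader(input))
	var out []string
	for {
		msg, err := readFrame(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, msg)
	}
}

func main() {
	frames, err := readAllFrames("5 hello11 multi\nline!")
	fmt.Printf("%q %v\n", frames, err)
}
```

This keeps the ordering and reliability of a stream while still allowing multi-line messages, at the price of a slightly more involved sender.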

ryandotsmith commented 11 years ago

I am going to close this PR for now. Not saying that the idea is dead, just not going to move on it in the short term.