Define a canonical log format for misbehaving peers

MarcoPolo commented 2 years ago

(inspired by https://github.com/libp2p/go-libp2p/issues/1598)

We should agree on a canonical log format for identifying misbehaving peers across libp2p implementations.

My recommendation is a single-line string that contains at least the marker CANONICAL_MISBEHAVING_PEER:, addr=<multiaddr>, and a timestamp, plus optionally more key=value pairs. I chose key=value because it's easy to parse and easy to generate. (Most common timestamp formats will work fine with fail2ban and similar tools.)
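To make the proposed shape concrete, here is a minimal sketch in Python of emitting and parsing such a line. The function names, the `reason` key, the field order, and the ISO 8601 timestamp are assumptions for illustration only; the proposal itself fixes just the marker, `addr=<multiaddr>`, and the presence of a timestamp. The sketch also assumes values contain no whitespace.

```python
from datetime import datetime, timezone

# Marker string taken from the proposal above.
MARKER = "CANONICAL_MISBEHAVING_PEER:"


def format_misbehaving_peer(addr, when=None, **extra):
    """Build one log line: <timestamp> <marker> addr=<multiaddr> [key=value ...].

    Hypothetical helper: the timestamp placement and format are assumptions.
    """
    when = when or datetime.now(timezone.utc)
    fields = {"addr": addr, **extra}
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{when.isoformat()} {MARKER} {pairs}"


def parse_misbehaving_peer(line):
    """Return the key=value pairs after the marker, or None if the marker is absent."""
    _, found, rest = line.partition(MARKER)
    if not found:
        return None
    # Split on whitespace, then on the first '=' only, so values may contain '='.
    return dict(token.split("=", 1) for token in rest.split() if "=" in token)
```

For example, `format_misbehaving_peer("/ip4/1.2.3.4/tcp/4001", reason="failed_handshake")` yields a line that `parse_misbehaving_peer` turns back into a dictionary with `addr` and `reason` keys.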

Why

We want to leverage existing tools and methods as much as possible when dealing with misbehaving peers. There are off-the-shelf solutions that automatically block misbehaving peers from interacting with a node. Tools like fail2ban do this by scanning log files and adding firewall rules to block malicious peers. Operators may also want to check the logs themselves for unusual activity when investigating abnormal states (e.g. latency on the server went way up; was there any unusual peer activity?). These logs can also be aggregated and recorded in a dashboard to provide observability to node operators independent of the specific libp2p application.

In order to make use of this ecosystem of tools, libp2p needs to highlight when a peer misbehaves. Only libp2p can do this since we're talking about protocol-level semantics (i.e. a tool that just reads packets won't be able to give us this insight).

Why a spec

This is a relatively small thing (defining what the log line should roughly look like) but would allow us to reuse fail2ban rules across libp2p implementations. It also allows operators and users to use this common knowledge to debug issues across libp2p implementations.

What to log

This spec doesn't define what should be logged; that's up to the implementation. Some suggestions:

Failed security handshake.
A peer sending nonsensical data.

Note that one of these events alone is probably not enough to block a peer with fail2ban (depends on the user's fail2ban configuration). A peer will only be blocked if it does this many times in a short period of time (exact parameters are up to the user, but we can make recommendations if we publish a fail2ban libp2p config).
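As a rough illustration of "many times in a short period of time", the sketch below counts events per address inside a sliding time window. The parameter names mirror fail2ban's `maxretry` and `findtime` options, but the class and its default values are hypothetical and this is not how fail2ban itself is implemented.

```python
from collections import defaultdict, deque


class OffenseWindow:
    """Track misbehavior events per address within a sliding time window."""

    def __init__(self, max_retry=5, find_time=600.0):
        self.max_retry = max_retry  # events needed before a block is warranted
        self.find_time = find_time  # window length in seconds
        self._events = defaultdict(deque)

    def record(self, addr, timestamp):
        """Record one event; return True once addr reaches max_retry events in the window."""
        events = self._events[addr]
        events.append(timestamp)
        # Drop events that are older than the window relative to the newest event.
        while events and timestamp - events[0] > self.find_time:
            events.popleft()
        return len(events) >= self.max_retry
```

With `max_retry=3` and `find_time=10`, a single failed handshake never triggers a block; three events from the same address within ten seconds do.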

Expose this to protocol developers

Only protocol implementations know what a misbehaving or suspicious peer looks like. libp2p implementations should allow protocol implementations to log this suspicious activity following this canonical log format. The benefit here is that this automatically hooks into a fail2ban setup.

It's up to each implementation how to expose this to developers.
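One possible shape for such a hook, sketched in Python on top of the standard `logging` module. `MisbehaviorReporter`, its `report` method, and the `protocol` and `reason` keys are hypothetical names for illustration, not an API of any libp2p implementation.

```python
import logging

# Marker string taken from the proposal above.
MARKER = "CANONICAL_MISBEHAVING_PEER:"


class MisbehaviorReporter:
    """Hypothetical handle a libp2p implementation could give to a protocol."""

    def __init__(self, logger, protocol_id):
        self._logger = logger
        self._protocol_id = protocol_id

    def report(self, addr, reason):
        # The timestamp is left to the logging formatter, so every line
        # shares whatever timestamp format the operator has configured.
        self._logger.warning(
            "%s addr=%s protocol=%s reason=%s",
            MARKER, addr, self._protocol_id, reason,
        )
```

A protocol would call `reporter.report(addr, "nonsensical_data")` without knowing the log format; configuring the logger with a format such as `"%(asctime)s %(message)s"` adds the timestamp.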

Prior art

https://stripe.com/blog/canonical-log-lines is a technical blog post on how something like this is used in practice.

MarcoPolo commented 2 years ago

@mxinden could you take a look here? What do you think about doing something similar in rust-libp2p?

mxinden commented 2 years ago

> could you take a look here?

The proposal above looks good to me.

> What do you think about doing something similar in rust-libp2p?

I am fine with including this in rust-libp2p, though I am not sure how to prioritize it among other efforts.

I am curious what rust-libp2p users think (//CC @AgeManning (Lighthouse) @dignifiedquire (iroh) @melekes (Polkadot)).

Unless there is concrete demand for this by a rust-libp2p user, I would continue to not prioritize it.

AgeManning commented 2 years ago

Seems like a decent idea.

We wouldn't use it, however, as we handle banning and misbehaving peers internally in code. By default Lighthouse doesn't show rust-libp2p logs, so we wouldn't be using this explicitly.
