Closed: mattsse closed this 1 year ago
Looking into this, taking inspiration from https://github.com/libp2p/rust-libp2p/blob/76abab9e20c29d03f787f39565de38dff42b0f80/src/bandwidth.rs#L135-L141
Update on this: taking some time to read about futures and async programming in Rust generally, here: https://rust-lang.github.io/async-book/01_getting_started/01_chapter.html
In rust-libp2p, they have a `StreamMuxer` trait which enables opening "substreams." Instrumenting the `StreamMuxer` with bandwidth monitoring is, I believe, how they get total bandwidth monitoring across all streams.
Do we want to pursue a similar pattern to get a sense of total bandwidth usage by the node? I'm not sure how much refactoring this implies; something to the tune of attaching all of the network's streams to such a "muxer" struct.
If not, and we're okay with instrumenting individual streams with bandwidth monitoring, how do we envision these metrics getting consumed? Output via Prometheus? Log? Should they be summed somewhere?
> Do we want to pursue a similar pattern to get a sense of total bandwidth usage by the node? I'm not sure how much refactoring this implies; something to the tune of attaching all of the network's streams to such a "muxer" struct.

I think for total bandwidth usage we can use common network debugging tools like `iftop`.
> If not, and we're okay with instrumenting individual streams with bandwidth monitoring, how do we envision these metrics getting consumed? Output via Prometheus? Log? Should they be summed somewhere?

I think probably Prometheus, right? We'd like to basically get outputs similar to `iftop` but on a per-stream basis? It's also possible that if we can observe the bandwidth inside of our code, we can also modify peers' reputation that way?
> I think probably Prometheus, right? We'd like to basically get outputs similar to `iftop` but on a per-stream basis? It's also possible that if we can observe the bandwidth inside of our code, we can also modify peers' reputation that way?

Sounds good, I'll integrate my current code with usage of the `metrics` crate / derives, etc.
Hmm, @mattsse does this mean the struct definition should look like:

```rust
#[derive(Metrics)]
#[metrics(scope = "bandwidth")]
pub struct BandwidthMetrics {
    inbound: Counter,
    outbound: Counter,
}
```
Seems like how we use metrics, but I have two concerns: `CounterFn` is implemented for `AtomicU64`, but how do we make sure that's used under the hood by `Counter`?
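For reference, a minimal sketch of how a `Counter` could be backed explicitly by a shared `AtomicU64` (this assumes the `metrics` crate's `Counter::from_arc` constructor; the names here are illustrative only):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};

use metrics::Counter;

fn shared_bandwidth_counter() {
    // Keep a handle to the atomic so session code can read the raw running total...
    let total_inbound = Arc::new(AtomicU64::new(0));

    // ...while the metrics `Counter` writes into the same atomic, relying on the
    // `CounterFn` impl for `AtomicU64`.
    let inbound: Counter = Counter::from_arc(total_inbound.clone());

    inbound.increment(1024);
    assert_eq!(total_inbound.load(Ordering::Relaxed), 1024);
}
```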
The metrics will be separate and should be `Gauge`s that we then periodically set based on the total bytes read/written. So all sessions update the bytes read/written in the shared Atomic, and the metrics are updated with these numbers.

Current throughput: `updated = (total_bytes_now - total_bytes_last) / interval`
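A rough sketch of that periodic update, assuming a tokio timer and abstracting the actual `Gauge` handles behind closures (the type and function names here are illustrative, not reth's actual API):

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Running totals that every session bumps as bytes are read/written.
#[derive(Default)]
struct BandwidthTotals {
    total_ingress: AtomicU64,
    total_egress: AtomicU64,
}

/// Periodically derives throughput from the shared totals:
/// updated = (total_bytes_now - total_bytes_last) / interval
async fn report_throughput(
    totals: Arc<BandwidthTotals>,
    interval: Duration,
    mut set_ingress_gauge: impl FnMut(f64),
    mut set_egress_gauge: impl FnMut(f64),
) {
    let (mut last_in, mut last_out) = (0u64, 0u64);
    loop {
        tokio::time::sleep(interval).await;
        let now_in = totals.total_ingress.load(Ordering::Relaxed);
        let now_out = totals.total_egress.load(Ordering::Relaxed);
        set_ingress_gauge((now_in - last_in) as f64 / interval.as_secs_f64());
        set_egress_gauge((now_out - last_out) as f64 / interval.as_secs_f64());
        last_in = now_in;
        last_out = now_out;
    }
}
```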
Gotcha, that can be handled separately in #140 then (thanks for referencing that!)
Would we need a separate `Gauge` for each stream we want to meter (as we will have separate `BandwidthMeter`s for each stream), or would this all bubble up to a top-level `Gauge` that meters the throughput across the whole node?

If it's separate `Gauge`s, we need a way to `describe()` each instance differently, right?
Either way, drafting an approach in #726
Also assigning @onbjerg to work on this with you, since he was going to pick this up early next week once done with the PM scoping work.
### Describe the feature
In order to provide bandwidth metrics we need to monitor all incoming and outgoing packets, something like https://github.com/libp2p/rust-libp2p/blob/76abab9e20c29d03f787f39565de38dff42b0f80/src/bandwidth.rs#L135-L141; it's probably the easiest way to instrument the `TcpStream`, or rather over generic `AsyncRead` and `AsyncWrite` bounds, and simply increment the counters.
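A minimal sketch of that kind of instrumentation, loosely modeled on libp2p's wrapper: a type generic over tokio's `AsyncRead`/`AsyncWrite` that bumps shared atomic counters on every poll. This assumes `tokio` and `pin-project-lite`; the names are illustrative, not reth's actual types:

```rust
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use std::task::{Context, Poll};

use pin_project_lite::pin_project;
use tokio::io::{AsyncRead, AsyncWrite, ReadBuf};

pin_project! {
    /// Wraps any stream and counts the bytes flowing through it.
    pub struct MeteredStream<S> {
        #[pin]
        inner: S,
        ingress: Arc<AtomicU64>,
        egress: Arc<AtomicU64>,
    }
}

impl<S> MeteredStream<S> {
    pub fn new(inner: S, ingress: Arc<AtomicU64>, egress: Arc<AtomicU64>) -> Self {
        Self { inner, ingress, egress }
    }
}

impl<S: AsyncRead> AsyncRead for MeteredStream<S> {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        let this = self.project();
        let before = buf.filled().len();
        let res = this.inner.poll_read(cx, buf);
        // Count however many bytes this poll actually produced.
        let read = buf.filled().len() - before;
        this.ingress.fetch_add(read as u64, Ordering::Relaxed);
        res
    }
}

impl<S: AsyncWrite> AsyncWrite for MeteredStream<S> {
    fn poll_write(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<io::Result<usize>> {
        let this = self.project();
        match this.inner.poll_write(cx, buf) {
            // Only bytes the inner stream actually accepted count as egress.
            Poll::Ready(Ok(n)) => {
                this.egress.fetch_add(n as u64, Ordering::Relaxed);
                Poll::Ready(Ok(n))
            }
            other => other,
        }
    }

    fn poll_flush(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        self.project().inner.poll_flush(cx)
    }

    fn poll_shutdown(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        self.project().inner.poll_shutdown(cx)
    }
}
```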
### Additional context

No response