kaspanet / rusty-kaspa

Kaspa full-node and related libraries in the Rust programming language. This is a stable version at the initial rollout phases.
ISC License
376 stars 119 forks source link

Bandwidth Tracking #334

Closed biryukovmaxim closed 7 months ago

biryukovmaxim commented 7 months ago

this PR introduces two tower middlewares and applies them to measure incoming/outgoing traffic based on requests/responses.

aspect commented 7 months ago

tx_bytes and rx_bytes should be separate for each service and named within the context of the service:

This is what we currently have in another branch:

pub struct BandwidthMetrics {
    pub borsh_bytes_tx: u64,
    pub borsh_bytes_rx: u64,
    pub json_bytes_tx: u64,
    pub json_bytes_rx: u64,
    pub grpc_p2p_bytes_tx: u64,
    pub grpc_p2p_bytes_rx: u64,
    pub grpc_user_bytes_tx: u64,
    pub grpc_user_bytes_rx: u64,
}

I suppose we should create a global BandwidthMetricsCounters like this:

pub struct AtomicBandwidthMetrics {
    pub borsh_bytes_tx: Arc<AtomicU64>,
    ...
}

and pass this around as Arc<AtomicBandwidthMetrics> allowing us to clone specific counters when passing them to specific services as (Arc<AtomicU64>,Arc<AtomicU64>).

aspect commented 7 months ago

I guess internal messages are usize, which is why you ended up with AtomicUsize... I am a bit uneasy about this but I guess this is fine. The reason I am uneasy is that in WASM32 AtomicUsize is 32-bit.. but it is better to cast on aggregation than on accumulation... so that's fine. Just bringing that up here for awareness.

aspect commented 7 months ago

The modification of create_core and top-level API functions is not acceptable. Please revert. Any changes to external API should be discussed before made and if made, they should be made without breaking existing API calls. This breaks external repositories/projects that use these functions.

Why are the counters passed from the top? This removes self-containment from the daemon. I understand that fd_limits are passed from the top (even tho technically that should be a part of the Runtime struct, but that is a separate discussion) but tx/rx counters need to be enclosed within the daemon creation fn. I.e. they should be created within create_core_with_runtime() and destroyed when this function exits.

Also changes to (for example)rothschild are quite intrusive. You should avoid changing constructors or API calls. The way this should be done is by creating GrpcClient::connect_with_counters that accepts counters while keeping GrpcClient::connect without any changes and have GrpcClient::connect call connect_with_counters where the default counters are created.

Also, we can keep passing the counters to gRPC and wRPC services so that they can act as a breadcrumb trail when I integrate this, but please remove any changes from .proto and related conversions. We don't want these changes exposed to public RPC until the integration work on my end is complete.

biryukovmaxim commented 7 months ago

Why are the counters passed from the top?

This is the only reason why counters are on top level link

There are two ways how to expose it to there: create on top level or return from create_core function. Both ways are breaking change

aspect commented 7 months ago

Why are the counters passed from the top?

This is the only reason why counters are on top level link

So that merits 2 fns, new_client and new_client_with_bandwith_counters. Rothchild can use new_client without any changes to it.

Do we need counters in the Daemon? I think they should be ignored for now. If they should be present there, then this should be discussed and looked at separately and potentially the entire data structure containing all counters elevated. I don't think this is currently needed. (This is also what the Runtime struct was made for - it was for this type of containment, but I have no bandwidth to examine this right now).

Let's remove counters from integration tests for now.

biryukovmaxim commented 7 months ago

Do we need counters in the Daemon?

Even if we don't have test functionality related to these counters, we should test the same code as regular. Before I changed while pattern in middleware implementation(we discussed in dm about that) only integration testes showed the real issue. That's why it should be there from the beginning. Maybe we can change design of integration tests to fix that

aspect commented 7 months ago

I don't believe your assertion above is correct. Integration tests would have exposed the problem you've encountered without any top-level API changes. It was a performance problem so default counters that are being created internally without any API changes or created at the top-level should not make any difference.

I have originally created the Runtime struct that is meant to carry that type of data. I believe that perhaps this should be looked at as a separate PR where we may want to consider elevating other metrics to the Runtime as well.