near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0
2.33k stars 628 forks source link

Get rid of NetworkMetrics #6704

Closed mina86 closed 2 years ago

mina86 commented 2 years ago

We’re currently using NetworkMetrics structure which holds a broadcast_messages: IntCounterVec. This should be replaced with a pub static BROADCAST_MESSAGES: Lazy<IntCounterVer> just like we do with all other metrics. This would get rid of the NetworkMetrics structure and do_create_int_counter_vec function while simplifying some of the code and bringing in consistency.

The one complication is that the metric is read through in repeated_announce_accounts test. The way that read happens needs to be changed. Perhaps read directly from the static? Or change GetMetrics message to return a hash map or accept name of the metric and return only that.

mm-near commented 2 years ago

The test must not depend on the static value - as rust can run multiple tests in parallel, which will cause the test flakiness.

That's also the reason why I've added the broadcast messages to this class (and pass the instance around) rather than creating a static object.

mina86 commented 2 years ago

Seems like the proper way to implement this would be to mock network and count what is being sent over it.

But I think at least we should have a static Lazy metric and NetworkMetrics have a HashMap with AtomicInts which is compiled only when #[cfg(feature = "test_features")]. This would get rid of do_create_int_counter_vec which is only used in NetworkMetrics and change GetMetrics request to return a more convenient HashMap rather than a metric.

abhijeetbhagat commented 2 years ago

i can work on this. are you saying that NetworkMetrics should have a hashmap?

mina86 commented 2 years ago

This is a bit involved issue and a few things need to happen:

Firstly, the broadcast_messages: IntCounterVec defined in NetworkMetrics structs needs to be converted to a pub static similar to all other metrics in metrics.rs file. With that, inc_broadcast needs to be modified to increment that static metric rather than broadcast_messages field.

Secondly, a broadcast_messages: HashMap<&'static str, usize> needs to be added to the struct. This field should only exist if test_features feature is enabled. inc_broadcast needs to increment value in that hash map.

Lastly, GetMetrics request in test_utils.rs needs to be changed. Currently we have a single test that uses it so the simplest way could be to change it to take type: &'static str argument and then return a single usize value. Maybe also rename it from GetMetrics to something like GetBroadcastMessageCount.

To make things work with &'static str you’ll need to change PeerMessage::msg_variant to return &'static str of course. This can be done by adding strum::IntoStaticStr derive to PeerMessage and then using into instead of as_ref inside of the function.

abhijeetbhagat commented 2 years ago

Ok, on it.

mina86 commented 2 years ago

I was looking into feasibility of this and ended up just implementing the last paragraph (i.e. conversion to IntoStaticStr). So just develop changes on top of #6741

abhijeetbhagat commented 2 years ago

I was looking into feasibility of this and ended up just implementing the last paragraph (i.e. conversion to IntoStaticStr). So just develop changes on top of #6741

yes, saw your changes related to the last paragraph. will add the suggested changes on top of them.

abhijeetbhagat commented 2 years ago

@mina86 wanted clarity on the last requirement related to changing the test case. Should the test case send the new GetBroadcastMessageCount struct or should it send a &'static str - "SyncRoutingTable"? Should the GetBroadcastMessageCount handler in PeerManagerActor change to accept a &'static str instead of a GetBroadcastMessageCount or should a new &'static str handler be added? With a global BROADCAST_MESSAGES instance and the new broadcast_messages hashmap in NetworkMetrics in place, which one of these two should be use to get the count from?

mina86 commented 2 years ago

The handler should use the hashmap. The hashmap should only if test_features cargo feature is defined.

I don’t have strong feelings as to what the message would actually be. It could be GetBroadcastMessageCount with no arguments (just like GetMetrics is now) and then the return type would be HashMap<&'static str> which would be copy of the hash map from NetworkMetrics. Alternatively, it could be GetBroadcastMessageCount { type: &'static str } and then the return type would just be u64. The test would then either send GetBroadcastMessageCount or GetBroadcastMessageCount { type: "SyncRoutingTable" }. The former option (with an empty GetBroadcastMessageCount) is somewhat simpler at this point since you just need to rename the GetMetrics struct and change its return type. The latter option is perhaps more optimised but it doesn’t really matter for tests.

One thing I didn’t mention is that we should also move Arc around. Currently structs which hold NetworkMetrics hold an Arc<NetworkMetrics>. It’s better if they just have network_metrics: NetworkMetrics and the struct would have broadcast_messages: Arc<HashMap<&'static str, u64>>. If they are issues with synchronisation the hash map might need to be wrapped in a Mutex.