stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

Stackerdb Benchmark testing #4120

Open wileyj opened 9 months ago

wileyj commented 9 months ago

StackerDB needs a benchmark function to determine how long it takes for all nodes to see an update. This is very similar to how ping/traceroute works: the end result is an idea of how long it takes an object to be shared across the network.

stackerdb instance (publish contract with 2 slots)

wileyj commented 9 months ago

@jcnelson @CAGS295

jferrant commented 9 months ago

Before I get deep in the weeds, this seems to be a request to create a new testing binary that implements StackerDB solely for the purpose of measuring its time to disseminate messages, yes? I.e., a binary that writes a message to StackerDB and waits for every other binary that is subscribed to, or part of, the system to receive it and respond back to the originator?

I will see if I can make an even more generic solution that adds a trait to libstackerdb to implement a ping function.

CAGS295 commented 9 months ago

@wileyj I still need to be assigned this task. My involvement in this is not clear to me. Are we both working on this ticket together?

My line of thought follows what @jferrant said. My a priori guess is to implement pinging and a broadcast scan. I need to take a look at libstackerdb first to get more context.

wileyj commented 9 months ago

> Before I get deep in the weeds, this seems to be a request to create a new testing binary that implements StackerDB solely for the purpose of measuring its time to disseminate messages, yes? I.e., a binary that writes a message to StackerDB and waits for every other binary that is subscribed to, or part of, the system to receive it and respond back to the originator?
>
> I will see if I can make an even more generic solution that adds a trait to libstackerdb to implement a ping function.

i think the implementation is up to you - the goal is that it's something that we can invoke as needed. what you proposed as a question is the gist of it - a ping tool for stackerdb to see how long it takes for each node to see a change. @CAGS295 pinged you since this might be something you can dig into as part of sending you more blockchain work. If you think you're up to it, i would encourage you to get in touch with @AshtonStephens and @jferrant, but you're more than welcome to take ownership of this if you'd like.

ideally, this is something we want to have functional in the next few weeks (i.e. prior to Jan 2024).

CAGS295 commented 9 months ago

@wileyj What degree of control/coop over the nodes do we want to have for the benchmark? Will we deploy a private network for the benchmark, or would you like to ping public nodes anytime?

If node operators cooperate, we can have them POST to our observer to process responses instead of sending them through StackerDB.

If you want pongs to travel through StackerDB, would that be an all-to-all broadcast (which overestimates latency and wastes bandwidth)? Is it possible to single out the origin when sending the response chunk?

wileyj commented 9 months ago

> @wileyj What degree of control/coop over the nodes do we want to have for the benchmark? Will we deploy a private network for the benchmark, or would you like to ping public nodes anytime?
>
> If node operators cooperate, we can have them Post to our observer to process responses instead of sending them through stackerDB.
>
> If you want pongs to travel through stackerDB, would that be an all-to-all broadcast (overestimate latency + bandwidth inefficiency)? Is it possible to single out the origin when sending the response chunk?

fair questions. i would default to assuming that node operators won't want to post data to a centralized server, so it should be something only used for testing purposes on our end.

to your other questions, i would defer to @jferrant and others who are more familiar with stackerdb

CAGS295 commented 9 months ago
  • 1 node posts to their slot a random value (waits for everyone to see that value)

Does this mean node operators will subscribe to a 'ping' contract or that 'instrumented contracts' will have a dedicated slot to request pongs? A separate contract would be cleaner but requires operators to cooperate.

After some thought, I have yet to find a way to use StackerDB in the conventional way to broadcast the 'pongs' efficiently. Even if we disregard the bandwidth issue from an all-to-all broadcast (n pong broadcasts), it can add a lot of variance to the benchmark. We could instead use the 'write-time' upon arrival, but then we would add clock-skew error to the equation.
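To make the clock-skew concern concrete, here is a minimal sketch. It assumes each node's clock differs from true time by a fixed offset. `SkewedClock`, `one_way_estimate_ms`, and `round_trip_ms` are hypothetical names for illustration, not part of libstackerdb. A one-way estimate subtracts timestamps taken on two different clocks, so the skew leaks into the result. A round trip timed entirely on the originator's clock does not have that problem.

```rust
/// Hypothetical model of a node clock that is off from true time by a fixed offset.
pub struct SkewedClock {
    pub offset_ms: i64,
}

impl SkewedClock {
    /// What this clock shows when the true time is `true_time_ms`.
    pub fn read(&self, true_time_ms: i64) -> i64 {
        true_time_ms + self.offset_ms
    }
}

/// One-way estimate: receiver's arrival timestamp minus sender's send timestamp.
/// The two timestamps come from different clocks, so the result equals
/// latency + (receiver offset - sender offset).
pub fn one_way_estimate_ms(
    sender: &SkewedClock,
    receiver: &SkewedClock,
    sent_true_ms: i64,
    latency_ms: i64,
) -> i64 {
    let stamped_at_send = sender.read(sent_true_ms);
    let stamped_at_arrival = receiver.read(sent_true_ms + latency_ms);
    stamped_at_arrival - stamped_at_send
}

/// Round trip: both timestamps come from the sender's clock, so its offset cancels.
pub fn round_trip_ms(
    sender: &SkewedClock,
    sent_true_ms: i64,
    ping_latency_ms: i64,
    pong_latency_ms: i64,
) -> i64 {
    let start = sender.read(sent_true_ms);
    let end = sender.read(sent_true_ms + ping_latency_ms + pong_latency_ms);
    end - start
}
```

With a 40 ms offset on the receiver and a true latency of 100 ms, the one-way estimate comes out as 140 ms, while the round trip is still the sum of the two legs.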

It makes me want to have the node ops subscribe to a centralized observer, but I noticed from the logs that the node seems to stall if the observer connection is refused, which could introduce a DoS vector; I'm not sure.

Another option I want to explore is whether 'sampling' from neighbors/replicas makes it easier. It does not sound like it will fix anything, but it might make the problem more manageable.

CAGS295 commented 9 months ago

Update: After the last sync, we decided to add a special ping mode to stacks-signer. Once signers see a 'ping' slot written, instead of running DKG, they will write back the wall time diff to a slot.
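A rough sketch of what the follower side of such a ping mode might compute is below. The `Ping` and `Pong` types, their fields, and `handle_ping` are assumptions for illustration only; the actual slot encoding and the stacks-signer and libstackerdb APIs are not shown here. The diff is signed because clock skew between the two machines can make it negative.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical payload the originator writes to the 'ping' slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Ping {
    pub id: u64,
    /// Originator's wall-clock time at write, in ms since the Unix epoch.
    pub sent_at_ms: u64,
}

/// Hypothetical payload a signer writes back to its own slot.
#[derive(Debug, Clone, PartialEq)]
pub struct Pong {
    pub ping_id: u64,
    pub signer_id: u32,
    /// Signer's observation time minus the ping's send time.
    /// Signed: clock skew can make this negative.
    pub wall_time_diff_ms: i64,
}

/// Current wall-clock time in ms since the Unix epoch (0 if the clock is before the epoch).
pub fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

/// Build the pong for a ping that this signer observed at `observed_at_ms`.
pub fn handle_ping(ping: &Ping, signer_id: u32, observed_at_ms: u64) -> Pong {
    Pong {
        ping_id: ping.id,
        signer_id,
        wall_time_diff_ms: observed_at_ms as i64 - ping.sent_at_ms as i64,
    }
}
```

In a real signer, the observation time would presumably come from `now_ms()` at the moment the ping slot update is seen; passing it in as a parameter keeps the sketch deterministic.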

jcnelson commented 9 months ago

We need to alter the signer binary to have a "benchmark" mode. We want to measure the distribution of round-trip times (RTTs) for signers. To do this, we'll need the following:

To deploy this, the benchmark-runner and each benchmark-follower need to run a Stacks node that is subscribed to the benchmark StackerDB instance. Every so often, the benchmark-runner will send a ping, and then gather the RTTs for each benchmark-follower's pong. It will then record these in a way that we can consume later (e.g. a logfile, a Prometheus metric, whatever).

A minimum viable testbed would just be to run a single benchmark-runner and a single benchmark-follower, where their respective Stacks nodes each treat each other as seed nodes. From there we can set up increasingly elaborate scenarios, such as:
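The runner-side bookkeeping described above could look roughly like the following sketch. `PingRound` and its methods are hypothetical names, and how pongs actually arrive from StackerDB is left out. The sketch times each round on the runner's monotonic clock, keeps the first pong per follower, and produces logfile-style lines plus a min/median/max summary.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical record of one ping round on the benchmark-runner.
pub struct PingRound {
    sent_at: Instant,
    rtts: BTreeMap<String, Duration>,
}

impl PingRound {
    /// Start a round for a ping sent at `sent_at` (runner's monotonic clock).
    pub fn start(sent_at: Instant) -> Self {
        PingRound { sent_at, rtts: BTreeMap::new() }
    }

    /// Record a follower's pong; only the first pong per follower is kept.
    pub fn record_pong(&mut self, follower: &str, received_at: Instant) {
        let rtt = received_at.saturating_duration_since(self.sent_at);
        self.rtts.entry(follower.to_string()).or_insert(rtt);
    }

    /// (min, median, max) RTT for this round, or None if no pongs arrived.
    /// For an even count, the upper of the two middle values is used.
    pub fn summary(&self) -> Option<(Duration, Duration, Duration)> {
        let mut sorted: Vec<Duration> = self.rtts.values().copied().collect();
        if sorted.is_empty() {
            return None;
        }
        sorted.sort();
        Some((sorted[0], sorted[sorted.len() / 2], sorted[sorted.len() - 1]))
    }

    /// One logfile-style line per follower, ordered by follower name.
    pub fn log_lines(&self) -> Vec<String> {
        self.rtts
            .iter()
            .map(|(follower, rtt)| format!("follower={} rtt_ms={}", follower, rtt.as_millis()))
            .collect()
    }
}
```

Using `Instant` on the runner alone means follower clocks never enter the measurement. The same per-follower durations could be fed into a metrics exporter instead of log lines.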