stacks-network / stacks-core

The Stacks blockchain implementation
https://docs.stacks.co
GNU General Public License v3.0

[Network] StackerDB decoherence #5193

Closed jcnelson closed 1 month ago

jcnelson commented 2 months ago

For reasons that are not yet clear, Nakamoto testnet and mainnet StackerDB replicas will eventually lose coherence. Writes to one replica do not find their way to others -- neither via push, nor via sync. This needs investigation, and may be partially fixed by #5191.

diwakergupta commented 2 months ago

I've observed something on a signer node that might be related. Noting here for tracking, happy to move to a new issue if that's more appropriate. Note that I've already sought input and debugging help from @hstove and @jferrant on this.

The setup:

I'm running the binaries directly, co-located on the same machine. There's also a dedicated bitcoind. This setup has been running for several months at this point, without any problems.

Symptoms:

jcnelson commented 2 months ago

I think I know the reason for this now. The network pruner starts removing new connections after 10 outbound peers have been found (this is the default limit). Network subsystems have a way of "pinning" connections so they won't get pruned while they're in use, but there was a bug in the pinning system, and it had an immediate and noticeable impact on StackerDB (especially since a signer or miner would be running a couple dozen replicas). I'll have a patch out soon, once I'm done testing it.
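
To make the pruning and pinning interaction described above concrete, here is a minimal toy sketch in Python. It is not the stacks-core implementation: `PeerTable`, `pin`, `prune`, and the peer names are hypothetical, and the only figure taken from the comment is the default limit of 10 outbound peers. It shows only the intended behavior (pinned connections survive a prune, unpinned new ones do not), so a connection that fails to get pinned is exactly the one that is lost.

```python
from dataclasses import dataclass, field

# Assumption: mirrors the "10 outbound peers" default mentioned in the comment.
MAX_OUTBOUND = 10


@dataclass
class PeerTable:
    """Toy model of outbound connections with pin-aware pruning (hypothetical)."""
    max_outbound: int = MAX_OUTBOUND
    conns: list = field(default_factory=list)  # connection ids, oldest first
    pinned: set = field(default_factory=set)   # ids that must not be pruned

    def connect(self, conn_id):
        self.conns.append(conn_id)

    def pin(self, conn_id):
        self.pinned.add(conn_id)

    def unpin(self, conn_id):
        self.pinned.discard(conn_id)

    def prune(self):
        """Drop the newest unpinned connections until max_outbound remain.

        Returns the pruned ids. Pinned connections are never dropped, so the
        table can stay over the limit if too many connections are pinned.
        """
        pruned = []
        excess = len(self.conns) - self.max_outbound
        # Walk newest to oldest, since the pruner removes new connections.
        for conn_id in reversed(list(self.conns)):
            if excess <= 0:
                break
            if conn_id in self.pinned:
                continue
            self.conns.remove(conn_id)
            pruned.append(conn_id)
            excess -= 1
        return pruned


def demo():
    table = PeerTable()
    for i in range(10):
        table.connect(f"peer-{i}")
    # Two replica connections are opened, but only one ends up pinned.
    table.connect("replica-a")
    table.connect("replica-b")
    table.pin("replica-a")
    pruned = table.prune()
    return pruned, table.conns
```

In this model, calling `demo()` prunes the unpinned `replica-b` while the pinned `replica-a` survives. With a couple dozen replicas competing for roughly 10 slots, any connection that is not pinned when it should be would be dropped in the same way, which is consistent with writes not reaching other replicas.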

diwakergupta commented 2 months ago

Based on the draft PR, would a workaround be to increase soft_max_neighbors_per_org? Happy to test that out if that helps.

wileyj commented 1 month ago

closing since #5197 is merged
