paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.84k stars 668 forks source link

Move `sc-network-gossip` into Grandpa #28

Open bkchr opened 1 year ago

bkchr commented 1 year ago

Grandpa is currently the only user of sc-network-gossip. Polkadot started using their own gossip implementation from the beginning. We also know that sc-network-gossip has certain bugs (like sending duplicate messages etc). We should move sc-network-gossip to Grandpa to remove the assumptions that the current implementation is reusable in different projects. While moving it to Grandpa we should fix the issues or maybe rewrite it completely. In the future we can maybe bring back some generic gossip implementation.

bkchr commented 1 year ago

Beefy is also using it right now, but this shouldn't be such a big problem.

kayabaNerve commented 1 year ago

I also use it as a consumer of Substrate. What would be the suggested migration path?

While I'm frustrated to now hear it's buggy, and am disappointed that wasn't documented, I'd rather it be fixed than moved internally. I'd especially rather it be fixed in-place if part of the discussion on moving it internally is fixing it/rewriting it.

My other question is are these bugs currently documented? I have not found stability issues once getting it working in the first place, though that was a pain... I also question how severe the bugs can be when it's still used currently in Grandpa.

bkchr commented 1 year ago

What would be the suggested migration path?

You could fork the code.

My other question is are these bugs currently documented?

I don't know for sure. I also don't know exactly what are the issues, there are just assumptions around stuff not working properly, but AFAIK no one really looked into it.

kayabaNerve commented 1 year ago

I guess. While I fully ack Parity's ability to deprecate it, I'd still like to speak here as a counterpoint to deprecating it.

... that's... frustrating. I do understand how it can be difficult to identify exact issues in complex systems though, especially ones not so obvious. I didn't notice any in the issue tracker, and at a high level, haven't noticed any issues in usage. While I have noticed some peer disconnections over time, presumably due to downscoring, I wrote that off as minor latency issues causing a cumulative score failure. Not anything actually problematic, though that does potentially point to some underlying problem (just one hard to debug, as I acknowledge such issues likely are).

I won't bother to continue harping. I want to provide a counter, as a consumer, not be annoying to the people who build what I use. If there are further discussions on sc-network-gossip, I'd love to chime in, and if anyone knows of issues/complaints, even informally, I'd love the heads up.

burdges commented 1 year ago

There are several different security goals for a gossip protocol, along with different techniques by which one might achieve them, ala our politeness logic strewn throughout polkadot, or the erlay set-reconciliation protocol in bitcoin's memepool.

We do think substrate should provide some gossip layer that clearly says what it achieves, vs what it does not achieve. It's hard to do this in 2023 though, given our own gossip mess, even in polkadot.

In other words, you won't necessarily give up (much) upstream maintenance support by forking this code because anything significantly altering this code might change its goals somewhat too.