paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.66k stars 594 forks source link

Implement Bitswap protocol using notifications protocol #524

Open altonen opened 1 year ago

altonen commented 1 year ago

Bitswap protocol cannot be served over libp2p's request response protocol because it is not compatible with how go-ipfs and js-ipfs work. They seem to expect the response to be received in a new substream.

After paritytech/polkadot-sdk#556 has been implemented, rewrite the Bitswap protocol using notifications protocol and use the Bitswap query result as the handshake of the stream as is done in ipfs-embed's implementation

Related to paritytech/polkadot-sdk#542

altonen commented 1 year ago

Some relevant discussion in the libp2p repository: https://github.com/libp2p/rust-libp2p/issues/2632

Before starting the work on this, we probably need to reevaluate if this is something we absolutely want to support.

sinkingsugar commented 1 year ago

Well, The Fragnova Network is using substrate also because of the --ipfs-server feature, essential for picking up blocks data fast using ipfs as cache layer and bitswap as binary transfer (rather than hexadecimals all over).

I don't think that this is a bitswap problem on its own, sounds more like a generic libp2p incompatibility here. bitswap is just a protocol on top of it. The discussion seems heavily against the protocol which is what it is but it's still working and used.

burdges commented 1 year ago

If I understand, bitswap is basically bittorrent over libp2p TCP connections. It avoids some legacy bitorrent cruft, but also stupidly misses major bittorrent advancements like uTP adopting UDP.

At the same time uTP exists for LEDBAT aka lower bittorrent data priority, not necessarily our goal. We might however want lower priority for transporting blocks and parablock chunks however since we've some quite sensitive messages which should take higher priority.

We could run a race between some open source based upon libtorrent that usess uTP and a stripped down client based upon libp2p's bitswap. It'll be funny when libtorrent smokes libp2p's bitswap despite being handicapped by LEDBAT.

I've no idea if we want this short- to mid-term but long-term one really crazy idea goes..

We should eventually figure out if outgoing UDP packets fair better under DoS attacks. If so then we could maybe send approval announcements over UDP connections, which we'd open between all validators using the existing TCP key exchange, except then UDP messages sticks out. I'd become more interesting if we exchange blocks and parablock chunks using uTP, and approval announcements then disguise themselves as requests or whatever.

At the extreme, we've fairly serious criticism of libp2p by @tomaka and others, but if one did a clean rust bittorrent crate including uTP, which achieved libtorrent like performance (good luck lol), then one could consider designing a p2p layer for substrate and polkadot that more closely matched and exploited the primitives in bittorrent, and ripping out libp2p. At least we should fund someone doing a rust bittorrent crates which pays close attention to the libtorrent code.

I've not answered your question here, but maybe this provides some perspective, and not a derail. Anyways, if libp2p abstraction break bitswap then maybe those abstractions suck and one needs more holes punched through them.

tomaka commented 1 year ago

Anyways, if libp2p abstraction break bitswap then maybe those abstractions suck and one needs more holes punched through them.

The abstractions provided by Substrate (notifications and request-responses) have been designed the way they are because they can be made DoS-resilient (they are not really right now in Substrate (unless it has changed recently but I don't think so), but it's an implementation issue, and the protocol isn't flawed).

The bitswap protocol is not DoS-resilient, and thus can't be implemented using the abstractions that Substrate has at the moment. At least that's how it was the last time I checked out the situation. I'm not really familiar with what people are talking about in the libp2p issue that has been linked.

But I'd argue that the solution here isn't to allow implementing non-DoS-resilient protocols.

sinkingsugar commented 1 year ago

Anyways, if libp2p abstraction break bitswap then maybe those abstractions suck and one needs more holes punched through them.

The abstractions provided by Substrate (notifications and request-responses) have been designed the way they are because they can be made DoS-resilient (they are not really right now in Substrate (unless it has changed recently but I don't think so), but it's an implementation issue, and the protocol isn't flawed).

The bitswap protocol is not DoS-resilient, and thus can't be implemented using the abstractions that Substrate has at the moment. At least that's how it was the last time I checked out the situation. I'm not really familiar with what people are talking about in the libp2p issue that has been linked.

But I'd argue that the solution here isn't to allow implementing non-DoS-resilient protocols.

No problem with that and good points. But that's why it is behind the --ipfs-server flag though. Also DoS resistance should be probably deployed at another level (Substrate doesn't need to do everything), especially RPC nodes should be behind LoadBalancers if deployed in production and to the mass I'd argue.

tomaka commented 1 year ago

But that's why it is behind the --ipfs-server flag though.

No problem with that, but then this flag should be renamed something like "--ipfs-server-note-that-a-malicious-peer-could-crash-your-node", as you can't ask people to be magically aware that using a certain flag is dangerous.

especially RPC nodes should be behind LoadBalancers if deployed in production and to the mass I'd argue.

There's no load balancer capable of handling libp2p/Substrate traffic. Also, you can't ask a load balancer to be DoS resilient while handling a protocol that can't be made DoS resilient. They're not magic.

sinkingsugar commented 1 year ago

But that's why it is behind the --ipfs-server flag though.

No problem with that, but then this flag should be renamed something like "--ipfs-server-note-that-a-malicious-peer-could-crash-your-node", as you can't ask people to be magically aware that using a certain flag is dangerous.

especially RPC nodes should be behind LoadBalancers if deployed in production and to the mass I'd argue.

There's no load balancer capable of handling libp2p/Substrate traffic. Also, you can't ask a load balancer to be DoS resilient while handling a protocol that can't be made DoS resilient. They're not magic.

Oh yeah but gossipsub has its ways to resist. I was just referring about http and ws endpoints not libp2p endpoints. You are right about bitswap, but well, then identity, multistream and whatnot are also fairly open to abuse if done right...

Btw I've seen setups with gossip port going thru nginx (stream) and what not, not sure how well.

burdges commented 10 months ago

I donno how we'd use bitswap, but if we want to see how fast availability can go, then we could try libtorrent on a testnet sometime. It'd be a lot of work to do this securely, but maybe interesting regardless.

Anyways I asked this: https://security.stackexchange.com/questions/271968/hardened-bittorrent-dos-vulnerabilities

altonen commented 10 months ago

I agree that it would be interesting to try libtorrent and see what kind of improvements we'd get out of it. Hopefully some time next year.

tomaka commented 10 months ago

Anyways I asked this: https://security.stackexchange.com/questions/271968/hardened-bittorrent-dos-vulnerabilities

I don't understand the purpose of your question. To me, the entire point of using bitswap or bittorent is compatibility with existing IPFS/Bittorent clients. If compatibility with existing software isn't required, then we're much better off using our own protocol rather than modify something existing.

burdges commented 10 months ago

we're much better off using our own protocol rather than modify something existing.

Arvid Nordberg's libtorrent is famous for being extremely fast. libp2p has a fairly poor reputation, no?

I didn't say we should switch, or even that we do this securely, or that doing this should be high priority, but doing this insecurely for a benchmark is a reasonable experiment that sets our performance goals.

My question was about how hard it is to do this securely, not immediately relevant. but interesting.

tomaka commented 10 months ago

libp2p has a fairly poor reputation, no?

That's a complicated question. We've had performance issues in the Substrate networking but that were due to Substrate-specific issues. We've had performance issues due to TCP slow start, which indeed could be solved by uTP, but it's not fair to attribute this to libp2p itself. There's an unnecessary round trip in the libp2p protocol when using TCP+Noise, but it's not there when using WebRTC and QUIC.

In terms of software, I feel like introducing a completely new library in the tech stack adds a lot of complexity (which Substrate is already packed with), and while it works as a short term hack in the long term it would be a better idea to instead improve libp2p.

altonen commented 10 months ago

We've had quite a bit of problems lately with libp2p. Both in terms of CPU usage which hinder the ability to scale the number of validators and parachains and also with these libp2p upgrades. Firstly they take a lot of effort to make, review and test and I feel like it's for little benefit because libp2p more or less works but if we don't keep up with their release schedule, the next upgrade we have to do is very painful.

In fact, latest release broke both Kademlia and syncing for us and we had to revert and probably have to get back upgrading sooner or later again. This is a lot of work that I would rather allocate somewhere else, such as even testing how well libtorrent would work, even if we find it unsuitable for whatever reason.

I agree with the point that adding a new library will add complexity but if in testing we find it so good that it would be stupid not to use it, I think we're able to manage the complexity of the additional library with abstractions. Not easy, but doable.

tomaka commented 10 months ago

This is a lot of work that I would rather allocate somewhere else, such as even testing how well libtorrent would work

I agree with all your criticism of libp2p, but what I'm saying is that testing libtorrent is not an alternative to the work on libp2p, it comes in addition to the work you need to do on libp2p anyway. Unless you're thinking about replacing the entire networking stack with libtorrent, which is not realistic.

altonen commented 10 months ago

I'm not suggesting replacing the entire networking stack with libtorrent, that's probably not even possible in practice.

What I am suggesting is that we could introduce libtorrent as an additional transport to the other transport that we have right now and exposing that transport as a subset of protocols to the upper-layer protocols of Polkadot. This additional transport and whatever protocols it supports could still be advertised over libp2p-identify, allowing, for example, two validators to discover that they use libtorrent and switch to use that transport instead of using notifications/request-response. This is very high-level idea and I'm sure there are a ton of implementation details that prove out to be difficult implement but I don't see any fundamental problem with the approach.

burdges commented 10 months ago

@altonen We never need protocol negotiation between validators afaik. We make them upgrade their host or kick them from the validator set.

Imho there is considerable work in using libtorrent securely in availability, at which I hint in my SE question. You need (1) some handle over its network layer encryption, (2) a representation of the torrent compatible with the candidate receipt, including (3) the chunks being files with appropriate prioritization for either your own chunk in availability or the systematic chunks in approvals, and (4) a local replacement for the tracker which builds the tracker data from availability bitfields plus backing & approvals statements. Also (4) breaks real bittorrent compatibility, so you've not added functionality, just replaces the availability transport.

We'd only need some hacky (3) if we only ran this for performance tests though. We'd learn something important: If libtorrent offers no improvement, then we should stop blaming networking and look elsewhere for performance gains. If libtorrent provides some improvement, like maybe less CPU even if LEDBAT adds latency, then we've still learned something about our networking code, and possible solutions.

tomaka commented 10 months ago

I agree that rust-libp2p has many drawbacks (some of which being that nobody at Parity seems to be familiar with its internals anymore (and I'm not either anymore fwiw)), but libp2p the protocol is IMO completely fine.

If we're going the direction of getting rid of rust-libp2p, another possibility is to use the libp2p implementation of smoldot (which I wrote from scratch). It would require a ton of internal Substrate refactoring, but so would libtorrent. It is in my opinion much much better than rust-libp2p in many ways.

Of course I'm not completely 100% happy about letting Substrate use the same code as smoldot, because one of the main objectives of smoldot is to be a different implementation than Substrate. If Substrate and smoldot share too much code, it goes against this objective (right now the only code they share is wasmi, wasmtime, serde, cryptographic primitives, and small insubstantial crates).

altonen commented 10 months ago

@burdges Yes good point

I think the problem is that to even get a hacky version of (3) working would require cooperation from two different teams and in the best case where it actually improves things enough to start actually considering it, the implementation has to be thrown away and implemented properly if it was actually a hackish implementation. I think the networking team can provide an API for libtorrent some time in the near(ish) future. After that it's up to the people working availability distribution/recovery to try out the protocol and see how it performs.

@tomaka For the past three months I've been working on a libp2p replacement and it's at a point where I can release the first real version of it. The API is very different from rust-libp2p and as such would be relatively easy to get it working in Substrate since the APIs of the library were inspired by the long-term refactoring goals we have for sc-network and any upper-layer protocol using sc-network. We still want to have libp2p as well so the actual networking stack of Substrate will be offered in a manner similar to parity-db/rocksdb. This means there will be a small set of interfaces the network backend must implement and if smoldot's network stack can implement these interfaces too, it can be ported to Substrate as well.

zdave-parity commented 7 months ago

FWIW I fixed the bitswap implementation on a branch here: https://github.com/zdave-parity/substrate/tree/polkadot-bulletin-chain

I rewrote it as a pretty simple custom NetworkBehaviour. Re the DoS concerns it's not clear to me exactly what those are, but the implementation at least has a reasonable bound on the memory consumed per connection so should not be abusable that way.