paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK
https://polkadot.network/
1.78k stars 638 forks source link

libp2p request-response compatibility #542

Open arkpar opened 3 years ago

arkpar commented 3 years ago

Request-response protocol seems to be incompatible with js-ipfs and go-ipfs implementation on the mux or multistream-select level. I've added a bitswap request-response protocol in substrate. Connected substrate node to a js-ipfs node and ran jsipfs cat Substrate receives the request and my code sends out response with send_response and receives ResponseSent event. JS daemon however never receives the bitswap message. I can see in the logs that the message is received by mplex layer but not reported to the bitswap protocol. Furthermore there are multistream-select error in the log, related to missing newline at the end of the message.

Here's the code (a-ipfs-client branch): https://github.com/paritytech/substrate/blob/a-ipfs-client/client/network/src/bitswap_handler.rs

js daemon log shows that the message has been received, e.g

2021-01-17T15:17:49.026Z libp2p:mplex incoming message {
  id: 2,
  type: 'MESSAGE_RECEIVER',
  data: <Buffer 1a 15 0a 06 01 55 a0 e4 02 20 12 0b 28 04 03 00 0b 10 e4 55 03 75 01>
}

But then there are errors produced at the multistream-select level, suggesting there's an issue with multistream implementation or decoding at the underlying protocol. js-ipfs:

2021-01-17T17:24:13.294Z libp2p:upgrader:error Error: missing newline
    at Object.exports.read (/usr/lib/node_modules/ipfs/node_modules/multistream-select/src/multistream.js:37:19)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async module.exports (/usr/lib/node_modules/ipfs/node_modules/multistream-select/src/handle.js:14:23)
    at async Mplex.onStream (/usr/lib/node_modules/ipfs/node_modules/libp2p/src/upgrader.js:235:42)

go-ipfs:

2021-01-17T16:32:24.995+0100    ^[[35mDEBUG^[[0m        basichost       protocol mux failed: message did not have trailing newline (took 32.524µs)

To reproduce:

  1. Run substrate -d /tmp/sc --storage-chain and let it sync a few blocks. IPFS cid for each transaction will be printed. Note a single cid.
  2. Run substrate -d /tmp/sc --storage-chain --reserved-only --reserved-nodes=/ip4/127.0.0.1/tcp/4001/p2p/12D3KooWDcdGyERXDc2f43Q2ATo3o2DBNygBQtZRjHbpfjqnQKX1 With js-ipfs node address as reserved node.
  3. Start js-ipfs or go-ipfs daemon and run jsipfs cat <cid> or ipfs cat <cid>
  4. substrate log should show bitswap request and response containing some payload.
  5. cat command is stuck

@mxinden @tomaka

mxinden commented 3 years ago

Substrate receives the request and my code sends out response with send_response and receives ResponseSent event.

This would imply that the multistream select process finished successfully, thus I am surprised that the JS side fails to negotiate.

Just to make sure this is not related to V1Lazy, would you mind applying the following diff to your branch?

diff --git a/client/network/src/service.rs b/client/network/src/service.rs
index 86c027e9e..62ea6e6f6 100644
--- a/client/network/src/service.rs
+++ b/client/network/src/service.rs
@@ -319,7 +319,7 @@ impl<B: BlockT + 'static, H: ExHashT> NetworkWorker<B, H> {
                                        .with_max_established_per_peer(Some(crate::MAX_CONNECTIONS_PER_PEER as u32))
                                        .with_max_established_incoming(Some(crate::MAX_CONNECTIONS_ESTABLISHED_INCOMING))
                                )
-                               .substream_upgrade_protocol_override(upgrade::Version::V1Lazy)
+                               // .substream_upgrade_protocol_override(upgrade::Version::V1Lazy)
                                .notify_handler_buffer_size(NonZeroUsize::new(32).expect("32 != 0; qed"))
                                .connection_event_buffer_size(1024);
                        if let Some(spawner) = params.executor {
diff --git a/client/network/src/transport.rs b/client/network/src/transport.rs
index 4d9d4fbde..2996b328c 100644
--- a/client/network/src/transport.rs
+++ b/client/network/src/transport.rs
@@ -101,7 +101,7 @@ pub fn build_transport(
                core::upgrade::SelectUpgrade::new(yamux_config, mplex_config)
        };

-       let transport = transport.upgrade(upgrade::Version::V1Lazy)
+       let transport = transport.upgrade(upgrade::Version::V1)
                .authenticate(authentication_config)
                .multiplex(multiplexing_config)
                .timeout(Duration::from_secs(20))
arkpar commented 3 years ago

With the change above multistream-select errors are gone, but the bitswap message still does not come through on the js side. Now there's just this in the log: js:

2021-01-17T18:06:50.850Z libp2p:mplex incoming message { id: 1, type: 'MESSAGE_RECEIVER', data: <Buffer 17> }
2021-01-17T18:06:50.850Z libp2p:mplex incoming message {
  id: 1,
  type: 'MESSAGE_RECEIVER',
  data: <Buffer 1a 15 0a 06 01 55 a0 e4 02 20 12 0b 28 04 03 00 0b 10 e4 55 03 75 01>
}
2021-01-17T18:06:50.850Z libp2p:mplex incoming message { id: 1, type: 'CLOSE_RECEIVER', data: <Buffer > }
2021-01-17T18:06:50.850Z libp2p:mplex:stream initiator stream 1 source end undefined
2021-01-17T18:06:50.850Z libp2p:mplex initiator stream 1 1 ended

go:

protocol mux failed: stream reset
arkpar commented 3 years ago

Looks like during multistream negotiations substrate sends /multistream/1.0.0 followed by the bitswap message. While other js nodes send /multistream/1.0.0 followed by /ipfs/bitswap/1.2.0, followed by the bitswap message.

arkpar commented 3 years ago

@mxinden Any ideas?

tomaka commented 3 years ago

Looks like during multistream negotiations substrate sends /multistream/1.0.0 followed by the bitswap message.

I would be extremely surprised if that was the case, as nothing would work if we didn't send the protocol name.

arkpar commented 3 years ago

Looks like during multistream negotiations substrate sends /multistream/1.0.0 followed by the bitswap message.

I would be extremely surprised if that was the case, as nothing would work if we didn't send the protocol name.

Right, that was from an unrelated event. Instead it looks like there's no negotiation performed at all when sending a response.

What's supposed to happens when remote closes the stream right after sending a request? I'd expect a new substream to be opened for the response and protocol upgrade to be performed. Instead could it be the response is sent over a closed stream or over a new stream without the protocol update?

tomaka commented 3 years ago

Closing a substream is unidirectional. When you close a substream, you only close the writing side (similar to closing a TCP socket for example).

The normal flow of operation is: the remote sends the multistream-select handshake, protocol name, its request, closes its side (or not, it's not mandatory), then the remote sends back the multistream-select handle, protocol name, and response, then closes its side.

It's not impossible that there's a bug in libp2p-mplex, as it got refactored/almost-rewritten a few months ago.

arkpar commented 3 years ago

Ok, then request-response can't be used for implementing bitswap because it requires opening a new stream for each response. Or at least that's what go and js implementations expect. That's a subtle requirement that's not really documented anywhere.