Some questions about support for WebRTC Direct (paullouisageneau/libdatachannel #1166)

xicilion commented 6 months ago

In the WebRTC specification, setLocalDescription accepts the full sessionDescription: https://developer.mozilla.org/en-US/docs/Web/API/RTCPeerConnection/setLocalDescription#syntax

In general, applications do not need to modify the SDP. However, libp2p WebRTC Direct is designed to establish connections independently of signaling by setting specific ice-ufrag and ice-pwd values: https://github.com/libp2p/specs/blob/master/webrtc/webrtc-direct.md#browser-to-public-server

The setLocalDescription in libdatachannel only accepts an rtc::Description::Type. Is there any way to implement a protocol like libp2p WebRTC Direct with libdatachannel?

paullouisageneau commented 6 months ago

This has already been discussed in https://github.com/paullouisageneau/libdatachannel/issues/970#issuecomment-1712471186 about disabling fingerprint validation, which would also be required for this.

SDP munging between createOffer/createAnswer and setLocalDescription is forbidden in the WebRTC specification, even if Safari, Firefox, and Chrome don't enforce it:

If type is "offer", and sdp is not the empty string and not equal to connection.[[LastCreatedOffer]], then return a promise rejected with a newly created InvalidModificationError and abort these steps.

If type is "answer" or "pranswer", and sdp is not the empty string and not equal to connection.[[LastCreatedAnswer]], then return a promise rejected with a newly created InvalidModificationError and abort these steps.
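As a rough illustration only, the two spec steps quoted above can be sketched as a plain function. This is not browser or libdatachannel code: `checkLocalDescription` and the `connection` object standing in for the internal slots are hypothetical names.

```javascript
// Sketch of the two quoted spec steps. `connection` is a plain object standing in
// for the internal slots [[LastCreatedOffer]] and [[LastCreatedAnswer]].
function checkLocalDescription(connection, description) {
    const { type, sdp } = description;
    if (type === 'offer' && sdp !== '' && sdp !== connection.lastCreatedOffer) {
        return 'InvalidModificationError';
    }
    const isAnswer = type === 'answer' || type === 'pranswer';
    if (isAnswer && sdp !== '' && sdp !== connection.lastCreatedAnswer) {
        return 'InvalidModificationError';
    }
    return null; // the description is accepted unchanged
}

// A munged offer (ufrag replaced after createOffer) is rejected by a strict implementation.
const connection = { lastCreatedOffer: 'v=0\r\na=ice-ufrag:abcd\r\n', lastCreatedAnswer: '' };
const munged = connection.lastCreatedOffer.replace('abcd', 'libp2p+webrtc+v1/xyz');
console.log(checkLocalDescription(connection, { type: 'offer', sdp: munged }));
```

An implementation that enforces these steps would reject exactly the kind of ufrag rewrite that WebRTC Direct relies on in browsers.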

Therefore, libdatachannel deliberately has no mechanism to set a modified local description. This keeps the implementation much simpler, as it doesn't need to reparse the SDP and synchronize its internal state.

Changing the whole architecture to allow an SDP munging hack is out of the question. Moreover, it would only allow you to implement "peer A" of the WebRTC Direct spec: to implement "peer B", you would also need to modify the whole ICE ufrag and password generation and validation process, and you would need a specific ICE hook to create peer connections triggered by incoming STUN probes.

I suggest that WebRTC Direct could be implemented as a specific operating mode of libdatachannel instead, which would not emit a local description nor accept a remote one, but would perform the required ufrag and password operations under the hood and offer an API to listen for incoming peer connections.

xicilion commented 6 months ago

Since we're thinking about supporting WebRTC Direct, wouldn't it be a good idea to allow specifying ice_ufrag and ice_pwd when creating the PeerConnection? This would probably require juice_create to support it, though.

With the addition of an option to disable fingerprint validation and a STUN event, it might be possible to build a somewhat more elegant implementation of WebRTC Direct.

xicilion commented 6 months ago

I made some changes to pass in ice_ufrag and ice_pwd, and initiated a connection to a go-libp2p server. According to the log, the handshake succeeds, but onOpen is never triggered. I traced the code and found that DataChannel::incoming never receives the MESSAGE_ACK message, so I trigger the open event when the function receives Message::String or Message::Binary instead.

    case Message::String:
    case Message::Binary:
        if (!mIsOpen.exchange(true)) {
            triggerOpen();
        }

        mRecvQueue.push(message);
        triggerAvailable(mRecvQueue.size());
        break;

I'm not sure if this is a problem with the go-libp2p implementation. Is this an appropriate change?

xicilion commented 6 months ago

I made some modifications to support setting up cert and key, and it's working fine so far. https://github.com/paullouisageneau/libdatachannel/issues/972
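A fixed certificate also means a stable value for the a=fingerprint line that the remote side puts in its SDP. As a minimal sketch, assuming Node.js and its `crypto` module (the helper names `formatFingerprint` and `sha256Fingerprint` are made up for illustration), the colon-separated form can be derived from the DER bytes of a certificate like this:

```javascript
const crypto = require('node:crypto');

// Format raw digest bytes as upper-case hex pairs joined by colons,
// which is the shape used on an a=fingerprint:SHA-256 line.
function formatFingerprint(digest) {
    return Array.from(digest, (byte) => byte.toString(16).padStart(2, '0').toUpperCase()).join(':');
}

// Hypothetical helper: SHA-256 fingerprint of a DER-encoded certificate.
function sha256Fingerprint(der) {
    return formatFingerprint(crypto.createHash('sha256').update(der).digest());
}

console.log(formatFingerprint(Buffer.from([0x07, 0xe5, 0x6f])));
```

The DER bytes are simply the base64-decoded body of the PEM certificate.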

paullouisageneau commented 6 months ago

I made some changes to pass in ice_ufrag and ice_pwd, and initiated a connection to a go-libp2p server. According to the log, the handshake succeeds, but onOpen is never triggered. I traced the code and found that DataChannel::incoming never receives the MESSAGE_ACK message, so I trigger the open event when the function receives Message::String or Message::Binary instead.

    case Message::String:
    case Message::Binary:
        if (!mIsOpen.exchange(true)) {
            triggerOpen();
        }

        mRecvQueue.push(message);
        triggerAvailable(mRecvQueue.size());
        break;

I'm not sure if this is a problem with the go-libp2p implementation. Is this an appropriate change?

No, this is not normal. If there is no ACK message, it's either a confusion between datachannels opened in-band and negotiated datachannels or a bug in the remote implementation.
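For context, a channel opened in-band starts with a DATA_CHANNEL_OPEN message that the remote side answers with DATA_CHANNEL_ACK, while a negotiated channel exchanges neither. A minimal sketch of telling the two control messages apart, using the message type values and field layout from RFC 8832 (the function name `describeDcepMessage` is hypothetical, and this is not libdatachannel code):

```javascript
// Message type values from RFC 8832 (Data Channel Establishment Protocol).
const DATA_CHANNEL_ACK = 0x02;
const DATA_CHANNEL_OPEN = 0x03;

function describeDcepMessage(payload) {
    if (payload.length === 0) return { type: 'invalid' };
    switch (payload[0]) {
        case DATA_CHANNEL_ACK:
            return { type: 'ack' };
        case DATA_CHANNEL_OPEN: {
            // Fixed part: type(1) channelType(1) priority(2) reliability(4) labelLen(2) protocolLen(2)
            if (payload.length < 12) return { type: 'invalid' };
            const labelLength = payload.readUInt16BE(8);
            const label = payload.subarray(12, 12 + labelLength).toString('utf8');
            return { type: 'open', label };
        }
        default:
            return { type: 'unknown' };
    }
}

// Build an OPEN message with label "handshake" and classify it.
const dcepLabel = Buffer.from('handshake');
const dcepHeader = Buffer.alloc(12);
dcepHeader[0] = DATA_CHANNEL_OPEN;
dcepHeader.writeUInt16BE(dcepLabel.length, 8);
console.log(describeDcepMessage(Buffer.concat([dcepHeader, dcepLabel])));
```

If one side treats the channel as negotiated while the other opened it in-band, the opener waits for an ACK that never arrives, which would match the symptom described.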

xicilion commented 6 months ago

Yes, it's confusing. When I connect to a libdatachannel peer node, I don't have this problem.

xicilion commented 6 months ago

Here is a simplified piece of client-side code for WebRTC Direct.

const rtc = require('rtc');

const HANDSHAKE_TIMEOUT_MS = 10000;

async function main() {
    const ufrag = 'libp2p+webrtc+v1/' + 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
    const ip = '192.168.65.5';
    const port = 60916;
    var answerSdp = {
        "type": "answer",
        "sdp": `v=0\r\no=- 0 0 IN IP4 ${ip}\r\ns=-\r\nc=IN IP4 ${ip}\r\nt=0 0\r\na=ice-lite\r\nm=application ${port} UDP/DTLS/SCTP webrtc-datachannel\r\na=mid:0\r\na=setup:active\r\na=ice-ufrag:${ufrag}\r\na=ice-pwd:${ufrag}\r\na=fingerprint:SHA-256 07:E5:6F:2A:1A:0C:2C:32:0E:C1:C3:9C:34:5A:78:4E:A5:8B:32:05:D1:57:D6:F4:E7:02:41:12:E6:01:C6:8F\r\na=sctp-port:5000\r\na=max-message-size:16384\r\na=candidate:1467250027 1 UDP 1467250027 ${ip} ${port} typ host\r\n`
    };

    console.log(answerSdp);

    const peerConnection = new rtc.RTCPeerConnection({
        iceUfrag: ufrag,
        icePwd: ufrag,
        port: 12345,
        maxMessageSize: 16384
    });

    const dataChannelOpenPromise = new Promise((resolve, reject) => {
        console.log("peerConnection.createDataChannel");
        const handshakeDataChannel = peerConnection.createDataChannel('');
        const handshakeTimeout = setTimeout(() => {
            const error = `Data channel was never opened: state: ${handshakeDataChannel.readyState}`;
            reject(new Error(error));
        }, HANDSHAKE_TIMEOUT_MS);

        handshakeDataChannel.onopen = (_) => {
            console.log('Data channel opened');
            clearTimeout(handshakeTimeout);
            resolve(handshakeDataChannel);
        };

        handshakeDataChannel.onerror = (event) => {
            console.error('Data channel error:', event);
            clearTimeout(handshakeTimeout);
            const error = `Error opening a data channel for handshaking.`;
            reject(new Error(error));
        };

        handshakeDataChannel.onmessage = (event) => {
            console.log('Data channel message:', event.data);
        };
    });

    await peerConnection.setRemoteDescription(answerSdp);
    console.log("setRemoteDescription ok");
    const handshakeDataChannel = await dataChannelOpenPromise;
    console.log("connect ok");
}

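// Report a failed or timed-out handshake instead of leaving the rejection unhandled.
main().catch(console.error);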
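The long SDP string is easier to read and reuse when assembled from individual lines. A possible sketch follows; `buildSdp` is a hypothetical helper, and the line order and fixed values are copied from the listing above rather than taken from any library:

```javascript
// Assemble the inferred SDP from its lines; every line ends with CRLF.
function buildSdp({ ip, port, ufrag, pwd, setup, fingerprint }) {
    const lines = [
        'v=0',
        `o=- 0 0 IN IP4 ${ip}`,
        's=-',
        `c=IN IP4 ${ip}`,
        't=0 0',
        'a=ice-lite',
        `m=application ${port} UDP/DTLS/SCTP webrtc-datachannel`,
        'a=mid:0',
        `a=setup:${setup}`,
        `a=ice-ufrag:${ufrag}`,
        `a=ice-pwd:${pwd}`,
        `a=fingerprint:SHA-256 ${fingerprint}`,
        'a=sctp-port:5000',
        'a=max-message-size:16384',
        `a=candidate:1467250027 1 UDP 1467250027 ${ip} ${port} typ host`,
    ];
    return lines.map((line) => line + '\r\n').join('');
}

const exampleUfrag = 'libp2p+webrtc+v1/' + 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
const exampleSdp = buildSdp({
    ip: '192.168.65.5',
    port: 60916,
    ufrag: exampleUfrag,
    pwd: exampleUfrag, // the listing above uses the same value for ufrag and password
    setup: 'active',
    fingerprint: '07:E5:6F:2A:1A:0C:2C:32:0E:C1:C3:9C:34:5A:78:4E:A5:8B:32:05:D1:57:D6:F4:E7:02:41:12:E6:01:C6:8F',
});
console.log(exampleSdp);
```

The same helper could produce the offer used on the listening side by passing `setup: 'passive'` and a different port.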

xicilion commented 6 months ago

Today, based on the implementation of the skipCheckFingerprint option, it is possible to implement a simple Peer B. All that is left is the stun reflection mechanism, which will allow a more complete Peer B to be implemented.

The code for Peer B looks like this.

var rtc = require('rtc');

const key_pem =
    `-----BEGIN PRIVATE KEY-----
MIGHAgEAMBMGByqGSM49AgEGCCqGSM49AwEHBG0wawIBAQQg3bbuT2SjSlMZH/J1
vHwmF0Blb/DBc/v7f1Za9GPUXHmhRANCAATDpmYxZozjVw6xlERNjJJGgfY3bEmj
xAKFRq3nbxbDHvMEs34u9HntMZWJ0hp3GUC+Ax7JHTv3cYqSaAg2SpR4
-----END PRIVATE KEY-----`

const cert_pem =
    `-----BEGIN CERTIFICATE-----
MIIBgjCCASigAwIBAgIJAPMXEoZXOaDEMAoGCCqGSM49BAMCMEoxDzANBgNVBAMM
BmNhLmNvbTELMAkGA1UEBhMCVVMxCzAJBgNVBAcMAkNBMRAwDgYDVQQKDAdleGFt
cGxlMQswCQYDVQQIDAJDQTAeFw0yNDA1MDUxNjAzMjFaFw0yNDA4MTMxNjAzMjFa
MDExCzAJBgNVBAYTAkNOMRAwDgYDVQQKDAdiYW96LmNuMRAwDgYDVQQDDAdiYW96
Lm1lMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEw6ZmMWaM41cOsZRETYySRoH2
N2xJo8QChUat528Wwx7zBLN+LvR57TGVidIadxlAvgMeyR0793GKkmgINkqUeKMQ
MA4wDAYDVR0TAQH/BAIwADAKBggqhkjOPQQDAgNIADBFAiAPNldqGJHryfjPFyX3
zfHHWlO7xSDTzdyoxzroFdwy+gIhAKmZizEVvDlBiIe+3ptCArU3dbp+bzLynTcr
Ma9ayzQy
-----END CERTIFICATE-----`

const ufrag = 'libp2p+webrtc+v1/' + 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA';
const ip = '192.168.65.5';
const port = 12345;
var offerSdp = {
    "type": "offer",
    "sdp": `v=0\r\no=- 0 0 IN IP4 ${ip}\r\ns=-\r\nc=IN IP4 ${ip}\r\nt=0 0\r\na=ice-lite\r\nm=application ${port} UDP/DTLS/SCTP webrtc-datachannel\r\na=mid:0\r\na=setup:passive\r\na=ice-ufrag:${ufrag}\r\na=ice-pwd:${ufrag}\r\na=fingerprint:SHA-256 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00\r\na=sctp-port:5000\r\na=max-message-size:16384\r\na=candidate:1467250027 1 UDP 1467250027 ${ip} ${port} typ host\r\n`
};

let peer1 = new rtc.RTCPeerConnection({
    enableIceUdpMux: true,
    skipCheckFingerprint: true,
    iceUfrag: ufrag,
    icePwd: ufrag,
    certPem: cert_pem,
    keyPem: key_pem,
    port: 60916,
    maxMessageSize: 16384
});

peer1.setRemoteDescription(offerSdp);

console.readLine();

paullouisageneau commented 6 months ago

        const handshakeDataChannel = peerConnection.createDataChannel('');

You shouldn't set an empty label, it is technically allowed but might cause issues with some implementations.

Today, based on the implementation of the skipCheckFingerprint option, it is possible to implement a simple Peer B. All that is left is the stun reflection mechanism, which will allow a more complete Peer B to be implemented.

Great, nice work!

xicilion commented 6 months ago

var rtc = require('rtc');

const key_pem =
    `-----BEGIN PRIVATE KEY-----
MIGHAgEAMBMGByqGSM49AgEGCCqGSM49AwEHBG0wawIBAQQg3bbuT2SjSlMZH/J1
vHwmF0Blb/DBc/v7f1Za9GPUXHmhRANCAATDpmYxZozjVw6xlERNjJJGgfY3bEmj
xAKFRq3nbxbDHvMEs34u9HntMZWJ0hp3GUC+Ax7JHTv3cYqSaAg2SpR4
-----END PRIVATE KEY-----`

const cert_pem =
    `-----BEGIN CERTIFICATE-----
MIIBgjCCASigAwIBAgIJAPMXEoZXOaDEMAoGCCqGSM49BAMCMEoxDzANBgNVBAMM
BmNhLmNvbTELMAkGA1UEBhMCVVMxCzAJBgNVBAcMAkNBMRAwDgYDVQQKDAdleGFt
cGxlMQswCQYDVQQIDAJDQTAeFw0yNDA1MDUxNjAzMjFaFw0yNDA4MTMxNjAzMjFa
MDExCzAJBgNVBAYTAkNOMRAwDgYDVQQKDAdiYW96LmNuMRAwDgYDVQQDDAdiYW96
Lm1lMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEw6ZmMWaM41cOsZRETYySRoH2
N2xJo8QChUat528Wwx7zBLN+LvR57TGVidIadxlAvgMeyR0793GKkmgINkqUeKMQ
MA4wDAYDVR0TAQH/BAIwADAKBggqhkjOPQQDAgNIADBFAiAPNldqGJHryfjPFyX3
zfHHWlO7xSDTzdyoxzroFdwy+gIhAKmZizEVvDlBiIe+3ptCArU3dbp+bzLynTcr
Ma9ayzQy
-----END CERTIFICATE-----`

let accepter = new rtc.RTCPeerConnection({
    enableIceUdpMux: true,
    stunBinding: true,
    port: 60916
});

accepter.onstunbinding = function (binding) {
    console.log('onstunbinding', binding);

    const ufrag = binding.ufrag;
    const [ip, port] = binding.address.split(':');

    let peer1 = new rtc.RTCPeerConnection({
        enableIceUdpMux: true,
        skipCheckFingerprint: true,
        iceUfrag: ufrag,
        icePwd: ufrag,
        certPem: cert_pem,
        keyPem: key_pem,
        maxMessageSize: 16384
    });

    peer1.setRemoteDescription({
        "type": "offer",
        "sdp": `v=0\r\no=- 0 0 IN IP4 ${ip}\r\ns=-\r\nc=IN IP4 ${ip}\r\nt=0 0\r\na=ice-lite\r\nm=application ${port} UDP/DTLS/SCTP webrtc-datachannel\r\na=mid:0\r\na=setup:passive\r\na=ice-ufrag:${ufrag}\r\na=ice-pwd:${ufrag}\r\na=fingerprint:SHA-256 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00\r\na=sctp-port:5000\r\na=max-message-size:16384\r\na=candidate:1467250027 1 UDP 1467250027 ${ip} ${port} typ host\r\n`
    });
}

accepter.setRemoteDescription({
    "type": "offer",
    "sdp": `v=0\r\no=- 0 0 IN IP4 00.0.0\r\ns=-\r\nc=IN IP4 00.0.0\r\nt=0 0\r\na=ice-lite\r\nm=application 1 UDP/DTLS/SCTP webrtc-datachannel\r\na=mid:0\r\na=setup:passive\r\na=ice-ufrag:1\r\na=ice-pwd:1\r\na=fingerprint:SHA-256 00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00\r\na=sctp-port:5000\r\na=max-message-size:16384\r\na=candidate:1467250027 1 UDP 1467250027 00.0.0 1 typ host\r\n`
});

console.readLine();

After adding the stunbinding event, the full version of Peer B can establish a connection. There is still an issue where the handshake of a second connection fails if it follows the first one too closely; I'm still working on it.

It's a bit of a shame that the implementation of the stunbinding event doesn't look very good.

xicilion commented 6 months ago

I found it. It was an issue with my test code, where I was testing with Peer A bound to a port, causing Peer B to treat the same address and port as the same connection.

I thought there would be connection confusion here when UDP is multiplexed, but then I figured out that this only happens when UDP is multiplexed on both ends. So it's not a problem.
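To illustrate my understanding of the confusion, here is a toy model of a mux that tells connections apart by remote address and port only. `AddressDemux` is a made-up class for illustration, not how libjuice is actually structured:

```javascript
// Toy model: one listening socket, connections keyed by "ip:port" of the remote side.
class AddressDemux {
    constructor() {
        this.connections = new Map();
    }

    // Return the connection that a packet from this remote address belongs to,
    // creating a new entry the first time the address is seen.
    route(remoteIp, remotePort, ufrag) {
        const key = `${remoteIp}:${remotePort}`;
        if (!this.connections.has(key)) {
            this.connections.set(key, { ufrag, packets: 0 });
        }
        const connection = this.connections.get(key);
        connection.packets += 1;
        return connection;
    }
}

const demux = new AddressDemux();
// Peer A bound to a fixed port dials twice: both dials land on the same entry.
const firstDial = demux.route('203.0.113.7', 12345, 'ufrag-1');
const secondDial = demux.route('203.0.113.7', 12345, 'ufrag-2');
// Peer A on a fresh ephemeral port gets its own entry.
const thirdDial = demux.route('203.0.113.7', 40001, 'ufrag-3');
console.log(firstDial === secondDial, firstDial === thirdDial);
```

When Peer A does not multiplex, each dial normally comes from a different local port, so the addresses differ and the entries stay separate.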

xicilion commented 6 months ago

So far, I've made the following changes:

libjuice
- agent_create supports passing in ice_ufrag and ice_pwd in the config.

libdatachannel
- Configuration supports passing in iceUfrag and icePwd.
- Configuration supports passing in certPem and keyPem.
- Configuration supports skipCheckFingerprint.
- Disable comparing local and remote iceUfrag and icePwd in validateRemoteDescription.

I will submit a PR for each of these features if you find them acceptable.

Because there are many implementation details I don't understand, my implementation of STUN reflection in libjuice and libdatachannel is too complicated. I'm sure you will have a better design, so I don't think I'll submit it.

achingbrain commented 5 months ago

This is very exciting stuff, thanks so much for picking it up @xicilion

I've just got js-libp2p successfully dialling go-libp2p over webrtc-direct from node.js using a patched version of node-datachannel (https://github.com/murat-dogan/node-datachannel/pull/256) that uses libdatachannel with https://github.com/paullouisageneau/libdatachannel/pull/1201 applied and the latest version of libjuice (so it has https://github.com/paullouisageneau/libjuice/pull/243 which doesn't appear to have made it into a release yet).

Haven't quite got the listener end done yet. I get a DTLS error (logs below), but this is some nice progress!

...
2024-06-06 14:12:22.184 DEBUG [112439100] [rtc::impl::DtlsTransport::start@836] Starting DTLS transport
2024-06-06 14:12:22.184 INFO  [112439100] [rtc::impl::IceTransport::LogCallback@385] juice: Changing state to completed
2024-06-06 14:12:22.184 INFO  [112439100] [rtc::impl::PeerConnection::changeIceState@1254] Changed ICE state to completed
2024-06-06 14:12:22.184 DEBUG [112439094] [PeerConnectionWrapper::onIceStateChange@783] onIceStateChange cb received from rtc
2024-06-06 14:12:22.184 DEBUG [112439004] [PeerConnectionWrapper::onIceStateChange@786] mOnIceStateChangeCallback call(1)
2024-06-06 14:12:22.184 DEBUG [112439004] [PeerConnectionWrapper::onIceStateChange@795] mOnIceStateChangeCallback call(2)
2024-06-06 14:12:22.235 INFO  [112439100] [rtc::impl::IceTransport::LogCallback@385] juice: Changing state to connected
2024-06-06 14:12:22.235 INFO  [112439100] [rtc::impl::PeerConnection::changeIceState@1254] Changed ICE state to connected
2024-06-06 14:12:22.236 DEBUG [112439100] [rtc::impl::DtlsTransport::DtlsTransport@733] Initializing DTLS transport (OpenSSL)
2024-06-06 14:12:22.236 DEBUG [112439093] [PeerConnectionWrapper::onIceStateChange@783] onIceStateChange cb received from rtc
2024-06-06 14:12:22.236 DEBUG [112439004] [PeerConnectionWrapper::onIceStateChange@786] mOnIceStateChangeCallback call(1)
2024-06-06 14:12:22.236 DEBUG [112439004] [PeerConnectionWrapper::onIceStateChange@795] mOnIceStateChangeCallback call(2)
2024-06-06 14:12:22.236 DEBUG [112439100] [rtc::impl::DtlsTransport::start@836] Starting DTLS transport
2024-06-06 14:12:22.236 INFO  [112439100] [rtc::impl::IceTransport::LogCallback@385] juice: Changing state to completed
2024-06-06 14:12:22.236 INFO  [112439100] [rtc::impl::PeerConnection::changeIceState@1254] Changed ICE state to completed
2024-06-06 14:12:22.236 DEBUG [112439095] [PeerConnectionWrapper::onIceStateChange@783] onIceStateChange cb received from rtc
2024-06-06 14:12:22.236 DEBUG [112439004] [PeerConnectionWrapper::onIceStateChange@786] mOnIceStateChangeCallback call(1)
2024-06-06 14:12:22.236 DEBUG [112439004] [PeerConnectionWrapper::onIceStateChange@795] mOnIceStateChangeCallback call(2)
2024-06-06 14:12:22.236 ERROR [112439092] [rtc::impl::DtlsTransport::InfoCallback@1047] DTLS alert: unexpected_message
2024-06-06 14:12:22.236 ERROR [112439087] [rtc::impl::DtlsTransport::InfoCallback@1047] DTLS alert: unexpected_message
2024-06-06 14:12:22.238 ERROR [112439092] [rtc::impl::DtlsTransport::doRecv@987] DTLS recv: Handshake failed: error:0A0000F4:SSL routines::unexpected message
2024-06-06 14:12:22.238 ERROR [112439087] [rtc::impl::DtlsTransport::doRecv@987] DTLS recv: Handshake failed: error:0A0003F2:SSL routines::ssl/tls alert unexpected message
2024-06-06 14:12:22.238 ERROR [112439092] [rtc::impl::DtlsTransport::doRecv@995] DTLS handshake failed
2024-06-06 14:12:22.238 INFO  [112439092] [rtc::impl::PeerConnection::changeState@1236] Changed state to failed
2024-06-06 14:12:22.238 DEBUG [112439092] [PeerConnectionWrapper::onStateChange@737] onStateChange cb received from rtc
2024-06-06 14:12:22.238 ERROR [112439087] [rtc::impl::DtlsTransport::doRecv@995] DTLS handshake failed
2024-06-06 14:12:22.238 INFO  [112439087] [rtc::impl::PeerConnection::changeState@1236] Changed state to failed

achingbrain commented 5 months ago

I get a DTLS error

I think I've figured this out: I had the wrong DTLS roles set in the inferred SDP offers/answers. It seems to work now.
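For anyone hitting the same error, a quick sanity check on the a=setup values of the inferred offer and answer can catch this early. The sketch below follows the usual offer/answer rules for the setup attribute (the active side starts the DTLS handshake as client); `isValidSetupPair` and `dtlsRole` are hypothetical helper names:

```javascript
// Which answer values are compatible with each offer value.
const ALLOWED_ANSWERS = new Map([
    ['actpass', ['active', 'passive']],
    ['active', ['passive']],
    ['passive', ['active']],
]);

function isValidSetupPair(offerSetup, answerSetup) {
    const allowed = ALLOWED_ANSWERS.get(offerSetup) || [];
    return allowed.includes(answerSetup);
}

// The active endpoint sends the ClientHello; the passive endpoint waits for it.
function dtlsRole(setup) {
    if (setup === 'active') return 'client';
    if (setup === 'passive') return 'server';
    return null; // 'actpass' is only resolved once the answer is known
}

console.log(isValidSetupPair('passive', 'passive'), dtlsRole('active'));
```

If both descriptions resolve to the same role, both ends either wait or both send a ClientHello, and the handshake cannot complete.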

xicilion commented 5 months ago

so it has paullouisageneau/libjuice#243 which doesn't appear to have made it into a release yet.

Yep, I was waiting for libjuice#243 to be released to implement this feature, glad you released it.

xicilion commented 5 months ago

Haven't quite got the listener end done yet, I get a DTLS error (logs below) but this is some nice progress!

I have a PR at libjuice, but I don't think the implementation is very nice. https://github.com/paullouisageneau/libjuice/pull/248

achingbrain commented 5 months ago

glad you released it

Ah, no - nothing's been released yet, I'm running locally off patched versions. Hopefully @paullouisageneau can take a look soon.

I have a PR at libjuice but don't think the implementation is very nice.

Yes, I saw that - I ended up just opening a UDP port using node.js and decoding the incoming messages using the stun module to extract the ufrag, though I'd much rather not have to pull an additional dependency in so something like this would be better.
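For reference, pulling the username out of a binding request without a dependency is fairly small. This is only a sketch based on the STUN wire format (20-byte header with magic cookie 0x2112A442, then type/length/value attributes padded to 4 bytes, USERNAME being attribute 0x0006); it skips integrity checks, and the function names are made up:

```javascript
const STUN_MAGIC_COOKIE = 0x2112a442;
const STUN_BINDING_REQUEST = 0x0001;
const STUN_ATTR_USERNAME = 0x0006;

// Return the USERNAME attribute of a STUN binding request, or null if absent or malformed.
function extractStunUsername(packet) {
    if (packet.length < 20) return null;
    if (packet.readUInt16BE(0) !== STUN_BINDING_REQUEST) return null;
    if (packet.readUInt32BE(4) !== STUN_MAGIC_COOKIE) return null;
    const end = Math.min(packet.length, 20 + packet.readUInt16BE(2));
    let offset = 20;
    while (offset + 4 <= end) {
        const type = packet.readUInt16BE(offset);
        const length = packet.readUInt16BE(offset + 2);
        const valueStart = offset + 4;
        if (valueStart + length > end) return null;
        if (type === STUN_ATTR_USERNAME) {
            return packet.subarray(valueStart, valueStart + length).toString('utf8');
        }
        offset = valueStart + length + ((4 - (length % 4)) % 4); // skip padding
    }
    return null;
}

// Test helper: a binding request carrying only a USERNAME attribute (zero transaction ID).
function buildBindingRequest(username) {
    const value = Buffer.from(username, 'utf8');
    const padded = Buffer.alloc(Math.ceil(value.length / 4) * 4);
    value.copy(padded);
    const attribute = Buffer.alloc(4);
    attribute.writeUInt16BE(STUN_ATTR_USERNAME, 0);
    attribute.writeUInt16BE(value.length, 2);
    const header = Buffer.alloc(20);
    header.writeUInt16BE(STUN_BINDING_REQUEST, 0);
    header.writeUInt16BE(attribute.length + padded.length, 2);
    header.writeUInt32BE(STUN_MAGIC_COOKIE, 4);
    return Buffer.concat([header, attribute, padded]);
}

console.log(extractStunUsername(buildBindingRequest('ufragB:ufragA')));
```

In ICE the username has the form `<ufrag>:<ufrag>`, so the ufrag can be taken by splitting on the colon; which half belongs to which side should be checked against the spec rather than taken from this sketch.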

xicilion commented 5 months ago

I ended up just opening a UDP port using node.js and decoding the incoming messages using the stun module to extract the ufrag

How can libdatachannel establish a connection with Peer A again on the same port in this way?

I read the code: https://github.com/libp2p/js-libp2p/pull/2583

at this line: https://github.com/libp2p/js-libp2p/pull/2583/files#diff-d97a3ca500f9b61eaf94a65583b51eb08d155fd9aae4dd1bae7ad98ed0160778R174

It looks like the listener is creating a new RTC connection on a different port. I suspect that may not work behind NAT.

Have you tested it in a NAT environment? @achingbrain

xicilion commented 5 months ago

Perfect, now the official version can be used directly to implement WebRTC Direct.

xicilion commented 5 months ago

It looks like the listener is creating a new RTC connection on a different port. I suspect that may not work behind NAT.

I did some tests and confirmed that this approach won't work.

When Peer A is behind NAT, the connection fails because the NAT sees incoming packets from a port that Peer A has never contacted from the inside, and drops them.

In other words, if Peer B wants to support access from Peer A behind NAT, the only way is to listen for connections on the same port using UdpMux.

So, we may have to consider the feasibility of https://github.com/paullouisageneau/libjuice/pull/248.
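The failure can be reproduced on paper with a toy model of a NAT that filters by remote address and port. This assumes port-restricted filtering, which not every NAT applies, and `PortRestrictedNat` is a made-up class rather than a measurement of real equipment:

```javascript
// Toy NAT: inbound packets are accepted only from an "ip:port" that the
// inside host has already sent a packet to.
class PortRestrictedNat {
    constructor() {
        this.contacted = new Set();
    }

    sendOutbound(remoteIp, remotePort) {
        this.contacted.add(`${remoteIp}:${remotePort}`);
    }

    acceptsInbound(remoteIp, remotePort) {
        return this.contacted.has(`${remoteIp}:${remotePort}`);
    }
}

const nat = new PortRestrictedNat();
// Peer A (inside) sends its STUN probe to Peer B's listening port.
nat.sendOutbound('198.51.100.20', 60916);
// A reply from the same port passes; a new connection on another port is dropped.
console.log(nat.acceptsInbound('198.51.100.20', 60916), nat.acceptsInbound('198.51.100.20', 50000));
```

This is why the reply has to come from the port that received the probe, which is what listening with UdpMux on a single port provides.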

achingbrain commented 5 months ago

I figured something like that might be necessary. I haven't got as far as testing behind NATs yet, thanks for looking into it.
