C# Desktop sample does not work with version 2.0 (NuGet)

dgtvan commented 3 years ago

Describe the bug

I followed the tutorial (https://microsoft.github.io/MixedReality-WebRTC/manual/cs/helloworld-cs-core3.html) to build a sample Desktop app on Windows 10.

I tried different versions of the library, and some of them do not work. All versions were downloaded via NuGet.

All releases of version 2.0 (from 2.0.0-preview-1 to the latest, 2.0.2): the peers cannot connect to each other, and the ICE state does not change.
Version 1.0 (releases 1.0.3 and 0.2.0-preview-20190925.5 were tested; other releases were not): the peers connect to each other and video data is sent successfully.


Detail

Version 2.0: here are the SDP messages on both peers (screenshot).

Both peers get stuck here forever and do nothing else, so I ended up closing the command-line windows.

The sample source code has nothing special, since I followed the tutorial, but I uploaded it here in case you need to review it: DesktopApp_2.0.2.zip

Version 1.0: here are the SDP messages on both peers (screenshot). Everything is fine.

Environment

Both peers are on the same PC (local).
The MixedReality.WebRTC versions are mentioned above.
Platform: C# Windows Desktop Application.
- Architecture: The problem occurs on both x86 and x64, Windows 10.
- Unity version: none; I do not use Unity.
- Target device: Windows Desktop.

I am new to this library, so please help me clarify the problem. Thank you.

djee-ms commented 3 years ago

I think there's a subtlety here that we forgot to document. In the first case (2.x), you start the server side (callee) with video capture and the client side (caller) without anything. When the client/caller sends the offer, it has nothing to offer: you can see that the SDP message is empty, extremely short, with no mid line. The server/callee receives that offer with no media to receive and, by design, always replies without adding any more transceivers. (This is another subtlety. I fought with it hard and the standard is not clear to me, but as far as I can tell the callee cannot dynamically add transceivers to its answer that were not present in the caller's offer.) Therefore the server/callee answer is also an empty SDP message, with nothing to send or receive. This all "works as expected", given the limitations of the implementation.
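To make the "empty offer" point concrete, here is a minimal C# model of the rule described above. It does not use MixedReality-WebRTC at all: `OfferAnswerModel`, `Kind` and the method names are hypothetical, and the assumption that transceivers created by the callee are never paired with an offered media line simply follows this comment, not a specification.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified model of how media lines (m= sections) flow through an
// offer/answer exchange. This is NOT the MixedReality-WebRTC API.
public static class OfferAnswerModel
{
    public enum Kind { Audio, Video }

    // The offer contains exactly one media line per transceiver on the caller,
    // tagged with a mid ("0", "1", ...). No transceivers means no media lines.
    public static List<(string Mid, Kind Kind)> CreateOffer(IEnumerable<Kind> callerTransceivers) =>
        callerTransceivers.Select((kind, index) => (Mid: index.ToString(), Kind: kind)).ToList();

    // The answer can only contain the media lines that were offered. Transceivers the
    // callee created on its own are assumed never to be paired with an offered line,
    // so they are counted as unused instead of being added to the answer.
    public static List<(string Mid, Kind Kind)> CreateAnswer(
        IReadOnlyList<(string Mid, Kind Kind)> offer,
        IReadOnlyCollection<Kind> calleeOwnTransceivers,
        out int unusedCalleeTransceivers)
    {
        unusedCalleeTransceivers = calleeOwnTransceivers.Count;
        return offer.ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Caller started without -v (no transceivers); callee started with -v (one video transceiver).
        var offer = OfferAnswerModel.CreateOffer(Array.Empty<OfferAnswerModel.Kind>());
        var answer = OfferAnswerModel.CreateAnswer(
            offer, new[] { OfferAnswerModel.Kind.Video }, out int unused);

        Console.WriteLine($"offer: {offer.Count} lines, answer: {answer.Count} lines, unused callee transceivers: {unused}");
    }
}
```

With the caller started without -v, both the offer and the answer end up with zero media lines, even though the callee holds a video transceiver.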

Can you try again either with -v on both sides, or by starting the -v instance last (so it becomes the client/caller)?

The last solution, more robust but more complex, is to modify the code so that if the callee needs to add tracks/transceivers because the caller didn't, it does so after the initial negotiation, in a new negotiation that it initiates itself (so the first negotiation reflects the caller's intent, and the second one is an upgrade to match the callee's intent).
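One way to trigger that second negotiation is for the callee, once it has answered, to check which media it wants to send but has no negotiated media line to send on, and to start its own offer for those. The sketch below covers only that decision; `UpgradeCheck` and its members are hypothetical and not part of MixedReality-WebRTC, and the actual transceiver creation and offer would go through the library's own calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of MixedReality-WebRTC: after answering the caller's
// offer, the callee lists the media kinds it wants to send but cannot send on any
// negotiated media line. A non-empty result means it should start a second
// negotiation as the offerer, with its own transceiver(s) added.
public static class UpgradeCheck
{
    public enum Kind { Audio, Video }
    public enum Direction { Inactive, SendOnly, RecvOnly, SendRecv }

    public static bool CanSend(Direction d) => d == Direction.SendOnly || d == Direction.SendRecv;

    // answeredLines: each negotiated media line with its final direction, seen from the callee.
    public static List<Kind> KindsNeedingFollowUpOffer(
        IReadOnlyCollection<(Kind Kind, Direction CalleeDirection)> answeredLines,
        IEnumerable<Kind> calleeWantsToSend) =>
        calleeWantsToSend
            .Distinct()
            .Where(kind => !answeredLines.Any(line => line.Kind == kind && CanSend(line.CalleeDirection)))
            .ToList();
}

public static class Program
{
    public static void Main()
    {
        // First negotiation started by a receive-only client: one video line, answered as inactive.
        var answered = new[] { (UpgradeCheck.Kind.Video, UpgradeCheck.Direction.Inactive) };
        var missing = UpgradeCheck.KindsNeedingFollowUpOffer(answered, new[] { UpgradeCheck.Kind.Video });

        // Prints "Video": the server still cannot send video, so it has to send its own offer.
        Console.WriteLine(string.Join(", ", missing));
    }
}
```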

dgtvan commented 3 years ago

Thank you for your explanation.

I understand what you described in this situation: the peer creating the offer has to add its transceivers before it sends the offer. (Transceivers can also be added later, but then a new offer must be sent. I am referring to "Add local media tracks".)

I tried again either with -v on both sides or by starting the -v instance last (so it becomes the client/caller), and it works as I expected:

The server side (callee) started with no argument.
The client side (caller) started with the -v argument. => Result: the client sends video data to the server.

However, I still want the server to send video data to the client, so I made a small change, but it does not work.

The server side (callee) with video capture, transceiver's direction is SendReceive.

The client side (caller) with video receive only, transceiver's direction is ReceiveOnly.

// -v: capture the webcam and send/receive video; -rv: receive video only.
var needVideo = Array.Exists(args, arg => arg == "-v" || arg == "-rv");
var videoDirection = Array.Exists(args, arg => arg == "-rv")
    ? Transceiver.Direction.ReceiveOnly
    : Transceiver.Direction.SendReceive;
...
if (needVideo)
{
    videoTransceiver = pc.AddTransceiver(MediaKind.Video);
    videoTransceiver.DesiredDirection = videoDirection;

    if (videoDirection == Transceiver.Direction.SendReceive)
    {
        // Only the sending side opens the webcam and attaches a local video track.
        videoTrackSource = await DeviceVideoTrackSource.CreateAsync();

        var trackSettings = new LocalVideoTrackInitConfig { trackName = "webcam_track" };
        localVideoTrack = LocalVideoTrack.CreateFromSource(videoTrackSource, trackSettings);
        videoTransceiver.LocalVideoTrack = localVideoTrack;
    }
}

Here is the SDP message content:

SDP messages with line breaks: server_sdp.log, client_sdp.log
SDP messages without line breaks (I removed the line breaks to shorten them for display here):

Server

Test.exe -v Found webcam VMware Virtual Webcam (id: \?\root#image#0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global) Found webcam eBUS DirectShow Source (id: eBUS DirectShow Source) Peer connection initialized. Opening local webcam... Create local video track... Starting signaling... Created pipe server; acting as server. Waiting for the remote peer to connect...Remote peer connected. Attempting to connect back to the remote peer...Signaler connection established. Waiting for offer from remote peer... Press a key to stop recording... [<-] sdp [<-] SDP message: offer v=0 o=- 8894610523552401277 2 IN IP4 127.0.0.1 s=- t=0 0 a=group:BUNDLE 0 a=msid-semantic: WMS m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 127 124 125 c=IN IP4 0.0.0.0 a=rtcp:9 IN IP4 0.0.0.0 a=ice-ufrag:fc2C a=ice-pwd:Nz5K2X38euTAfzEtwa/6Gf6Q a=ice-options:trickle a=fingerprint:sha-256 4A:E7:FC:60:0E:35:43:35:7A:DB:CA:27:22:9B:CA:58:DD:63:9E:A9:4F:46:EA:32:7F:C2:5B:99:C1:DB:C5:76 a=setup:actpass a=mid:0 a=extmap:2 urn:ietf:params:rtp-hdrext:toffset a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time a=extmap:4 urn:3gpp:video-orientation a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01 a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing a=extmap:10 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07 a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid a=recvonly a=rtcp-mux a=rtcp-rsize a=rtpmap:96 VP8/90000 a=rtcp-fb:96 goog-remb a=rtcp-fb:96 transport-cc a=rtcp-fb:96 ccm fir a=rtcp-fb:96 nack a=rtcp-fb:96 nack pli a=rtpmap:97 rtx/90000 a=fmtp:97 apt=96 a=rtpmap:98 VP9/90000 a=rtcp-fb:98 goog-remb a=rtcp-fb:98 transport-cc a=rtcp-fb:98 ccm fir a=rtcp-fb:98 nack a=rtcp-fb:98 nack pli a=fmtp:98 x-google-profile-id=0 a=rtpmap:99 rtx/90000 a=fmtp:99 apt=98 a=rtpmap:100 multiplex/90000 
a=rtcp-fb:100 goog-remb a=rtcp-fb:100 transport-cc a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=fmtp:100 acn=VP9;x-google-profile-id=0 a=rtpmap:101 rtx/90000 a=fmtp:101 apt=100 a=rtpmap:127 red/90000 a=rtpmap:124 rtx/90000 a=fmtp:124 apt=127 a=rtpmap:125 ulpfec/90000 [<-] PeerConnection: connected. [->] sdp answer v=0 o=- 7324555918015828337 2 IN IP4 127.0.0.1 s=- t=0 0 a=group:BUNDLE 0 a=msid-semantic: WMS m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 127 124 125 c=IN IP4 0.0.0.0 a=rtcp:9 IN IP4 0.0.0.0 a=ice-ufrag:x3s1 a=ice-pwd:9b3z0MNur0j3qk+9ltqP+ChY a=ice-options:trickle a=fingerprint:sha-256 02:2E:3F:3F:9A:84:2E:50:D8:9B:A0:92:C5:D1:84:E7:0B:CF:27:01:C2:3D:4A:1D:3B:0B:87:1D:DA:6F:59:C9 a=setup:active a=mid:0 a=extmap:2 urn:ietf:params:rtp-hdrext:toffset a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time a=extmap:4 urn:3gpp:video-orientation a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01 a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing a=extmap:10 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07 a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid a=inactive a=rtcp-mux a=rtcp-rsize a=rtpmap:96 VP8/90000 a=rtcp-fb:96 goog-remb a=rtcp-fb:96 transport-cc a=rtcp-fb:96 ccm fir a=rtcp-fb:96 nack a=rtcp-fb:96 nack pli a=rtpmap:97 rtx/90000 a=fmtp:97 apt=96 a=rtpmap:98 VP9/90000 a=rtcp-fb:98 goog-remb a=rtcp-fb:98 transport-cc a=rtcp-fb:98 ccm fir a=rtcp-fb:98 nack a=rtcp-fb:98 nack pli a=fmtp:98 x-google-profile-id=0 a=rtpmap:99 rtx/90000 a=fmtp:99 apt=98 a=rtpmap:100 multiplex/90000 a=rtcp-fb:100 goog-remb a=rtcp-fb:100 transport-cc a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=fmtp:100 acn=VP9;x-google-profile-id=0 a=rtpmap:101 rtx/90000 a=fmtp:101 apt=100 a=rtpmap:127 red/90000 a=rtpmap:124 
rtx/90000 a=fmtp:124 apt=127 a=rtpmap:125 ulpfec/90000 Finished processing messages

Client

Test.exe -rv Found webcam VMware Virtual Webcam (id: \?\root#image#0000#{65e8773d-8f56-11d0-a3b9-00a0c9223196}\global) Found webcam eBUS DirectShow Source (id: eBUS DirectShow Source) Peer connection initialized. Starting signaling... Pipe server already exists; acting as client. Attempting to connect to the remote peer...Connected to the remote peer. There are currently 1 pipe server instances open. Waiting for the remote peer to connect back...Signaler connection established. Connecting to remote peer... Press a key to stop recording... [->] sdp offer v=0 o=- 8894610523552401277 2 IN IP4 127.0.0.1 s=- t=0 0 a=group:BUNDLE 0 a=msid-semantic: WMS m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 127 124 125 c=IN IP4 0.0.0.0 a=rtcp:9 IN IP4 0.0.0.0 a=ice-ufrag:fc2C a=ice-pwd:Nz5K2X38euTAfzEtwa/6Gf6Q a=ice-options:trickle a=fingerprint:sha-256 4A:E7:FC:60:0E:35:43:35:7A:DB:CA:27:22:9B:CA:58:DD:63:9E:A9:4F:46:EA:32:7F:C2:5B:99:C1:DB:C5:76 a=setup:actpass a=mid:0 a=extmap:2 urn:ietf:params:rtp-hdrext:toffset a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time a=extmap:4 urn:3gpp:video-orientation a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01 a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing a=extmap:10 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07 a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid a=recvonly a=rtcp-mux a=rtcp-rsize a=rtpmap:96 VP8/90000 a=rtcp-fb:96 goog-remb a=rtcp-fb:96 transport-cc a=rtcp-fb:96 ccm fir a=rtcp-fb:96 nack a=rtcp-fb:96 nack pli a=rtpmap:97 rtx/90000 a=fmtp:97 apt=96 a=rtpmap:98 VP9/90000 a=rtcp-fb:98 goog-remb a=rtcp-fb:98 transport-cc a=rtcp-fb:98 ccm fir a=rtcp-fb:98 nack a=rtcp-fb:98 nack pli a=fmtp:98 x-google-profile-id=0 a=rtpmap:99 rtx/90000 a=fmtp:99 apt=98 a=rtpmap:100 multiplex/90000 a=rtcp-fb:100 
goog-remb a=rtcp-fb:100 transport-cc a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=fmtp:100 acn=VP9;x-google-profile-id=0 a=rtpmap:101 rtx/90000 a=fmtp:101 apt=100 a=rtpmap:127 red/90000 a=rtpmap:124 rtx/90000 a=fmtp:124 apt=127 a=rtpmap:125 ulpfec/90000 [<-] sdp [<-] SDP message: answer v=0 o=- 7324555918015828337 2 IN IP4 127.0.0.1 s=- t=0 0 a=group:BUNDLE 0 a=msid-semantic: WMS m=video 9 UDP/TLS/RTP/SAVPF 96 97 98 99 100 101 127 124 125 c=IN IP4 0.0.0.0 a=rtcp:9 IN IP4 0.0.0.0 a=ice-ufrag:x3s1 a=ice-pwd:9b3z0MNur0j3qk+9ltqP+ChY a=ice-options:trickle a=fingerprint:sha-256 02:2E:3F:3F:9A:84:2E:50:D8:9B:A0:92:C5:D1:84:E7:0B:CF:27:01:C2:3D:4A:1D:3B:0B:87:1D:DA:6F:59:C9 a=setup:active a=mid:0 a=extmap:2 urn:ietf:params:rtp-hdrext:toffset a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time a=extmap:4 urn:3gpp:video-orientation a=extmap:5 http://www.ietf.org/id/draft-holmer-rmcat-transport-wide-cc-extensions-01 a=extmap:6 http://www.webrtc.org/experiments/rtp-hdrext/playout-delay a=extmap:7 http://www.webrtc.org/experiments/rtp-hdrext/video-content-type a=extmap:8 http://www.webrtc.org/experiments/rtp-hdrext/video-timing a=extmap:10 http://tools.ietf.org/html/draft-ietf-avtext-framemarking-07 a=extmap:9 urn:ietf:params:rtp-hdrext:sdes:mid a=inactive a=rtcp-mux a=rtcp-rsize a=rtpmap:96 VP8/90000 a=rtcp-fb:96 goog-remb a=rtcp-fb:96 transport-cc a=rtcp-fb:96 ccm fir a=rtcp-fb:96 nack a=rtcp-fb:96 nack pli a=rtpmap:97 rtx/90000 a=fmtp:97 apt=96 a=rtpmap:98 VP9/90000 a=rtcp-fb:98 goog-remb a=rtcp-fb:98 transport-cc a=rtcp-fb:98 ccm fir a=rtcp-fb:98 nack a=rtcp-fb:98 nack pli a=fmtp:98 x-google-profile-id=0 a=rtpmap:99 rtx/90000 a=fmtp:99 apt=98 a=rtpmap:100 multiplex/90000 a=rtcp-fb:100 goog-remb a=rtcp-fb:100 transport-cc a=rtcp-fb:100 ccm fir a=rtcp-fb:100 nack a=rtcp-fb:100 nack pli a=fmtp:100 acn=VP9;x-google-profile-id=0 a=rtpmap:101 rtx/90000 a=fmtp:101 apt=100 a=rtpmap:127 red/90000 a=rtpmap:124 rtx/90000 a=fmtp:124 apt=127 
a=rtpmap:125 ulpfec/90000 PeerConnection: connected. ICE state: Checking [<-]

My thought: I think the server has to create an offer with a transceiver added and send it to the client. Basically, each side has to offer what it has:

The client offers that it can receive video.
The server offers that it can do both send and receive video.

I have not tested this yet because I am not sure whether it is the proper way. Please let me know whether my thinking is correct.

> The last solution, more robust but more complex, is to modify the code so that if the callee needs to add tracks/transceivers because the caller didn't, it does so after the initial negotiation, in a new negotiation that it initiates itself (so the first negotiation reflects the caller's intent, and the second one is an upgrade to match the callee's intent).

I am still thinking about this explanation; I have not fully understood it yet.

djee-ms commented 3 years ago

> The server side (callee) with video capture, transceiver's direction is SendReceive. The client side (caller) with video receive only, transceiver's direction is ReceiveOnly.

Setting the transceiver direction will not change anything about the problem. I know it's very confusing; it took me weeks to understand the details. What happens is this:

Caller sends an offer with media line mid:0 in recvonly mode
Callee receives and applies the offer. The internal Google implementation:
1. creates a new transceiver for media line mid:0, which is in recvonly mode by default on creation
2. looks at the modes, finds recvonly from the caller and recvonly from the callee, concludes that nobody needs to send anything, and so changes the mode to inactive
Callee creates an answer:
1. looks at the existing transceivers and adds media line mid:0
2. looks at the callee's transceivers, finds the one added by the server, but since it is not paired yet, ignores it
3. sends the answer with the media line in inactive mode
Caller receives an answer with media line mode=inactive and applies it
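The "recvonly + recvonly = inactive" step follows the usual offer/answer rule (RFC 3264, as refined by JSEP): the answerer intersects its own transceiver direction with the reverse of the offered direction. Here is a minimal, library-independent C# sketch of that rule; `SdpDirection` and `DirectionRule` are hypothetical names, unrelated to MixedReality-WebRTC's `Transceiver.Direction`.

```csharp
using System;

// Hypothetical model of SDP direction negotiation (not the MixedReality-WebRTC API).
// Each direction is treated as a pair of flags: can send, can receive.
public enum SdpDirection { Inactive, SendOnly, RecvOnly, SendRecv }

public static class DirectionRule
{
    static bool Sends(SdpDirection d) => d == SdpDirection.SendOnly || d == SdpDirection.SendRecv;
    static bool Receives(SdpDirection d) => d == SdpDirection.RecvOnly || d == SdpDirection.SendRecv;

    static SdpDirection FromFlags(bool send, bool recv) =>
        send && recv ? SdpDirection.SendRecv
        : send ? SdpDirection.SendOnly
        : recv ? SdpDirection.RecvOnly
        : SdpDirection.Inactive;

    // The offer seen from the answerer's side: the offerer's "send" is our "receive".
    public static SdpDirection Reverse(SdpDirection d) => FromFlags(send: Receives(d), recv: Sends(d));

    // The answerer may only send if the offerer receives, and only receive if the offerer sends.
    public static SdpDirection Answer(SdpDirection offered, SdpDirection answererTransceiver)
    {
        var allowed = Reverse(offered);
        return FromFlags(
            send: Sends(allowed) && Sends(answererTransceiver),
            recv: Receives(allowed) && Receives(answererTransceiver));
    }
}

public static class Program
{
    public static void Main()
    {
        // Client started with -rv offers recvonly; the callee's transceiver created while
        // applying the offer defaults to recvonly, so nobody sends: prints "Inactive".
        Console.WriteLine(DirectionRule.Answer(SdpDirection.RecvOnly, SdpDirection.RecvOnly));

        // Had the transceiver paired with mid:0 been allowed to send, the answer
        // would be sendonly: prints "SendOnly".
        Console.WriteLine(DirectionRule.Answer(SdpDirection.RecvOnly, SdpDirection.SendRecv));
    }
}
```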

So setting the direction on the server doesn't change anything since the implementation will ignore that transceiver anyway while generating the answer, and will not (as one might expect) associate the transceiver created by the caller with the transceiver created by the callee (this is never possible as far as I know).

I don't understand your SDP messages because they both have two video media lines, which is inconsistent with the code and description you made, especially on the offering side.

> I think the server has to create an offer with transceiver added and send it to the client.

Yes, you can do it like this. But because the server is listening for the client at the network/pipe level, it was easier to have the client send the offer once connected, and that is also a more realistic scenario (generally the client actively queries the server, not the opposite). But you can absolutely reverse the logic.

> Basically, each side has to offer what it has:

No, that won't work. This is not how WebRTC works, as far as I understand. Only the offering side (caller) can offer something. The answering side (callee) can only accept what's being offered; it cannot offer more. The only way this can work is to have 2 negotiations: first the client offers and the server answers; then, once that is done, the server offers again and the client answers. This takes more time (2 negotiations), and one needs to be very careful that the first negotiation is done before starting the second one. Otherwise, that works.
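The "make sure the first negotiation is done" part can be handled by a small state tracker that defers the second offer until the signaling state is back to stable. This is a hedged, library-independent sketch: `NegotiationScheduler` and `SignalingState` are hypothetical names loosely mirroring a subset of the JSEP signaling states, and a real offer/offer collision (glare) would need a rollback rather than the exception thrown here.

```csharp
using System;

// Hypothetical sketch (not the MixedReality-WebRTC API): tracks the signaling state and
// defers a requested negotiation until the current one has fully completed.
public enum SignalingState { Stable, HaveLocalOffer, HaveRemoteOffer }

public sealed class NegotiationScheduler
{
    private readonly Action _sendOffer; // stands in for "create an offer and send it"
    private bool _pending;

    public SignalingState State { get; private set; } = SignalingState.Stable;

    public NegotiationScheduler(Action sendOffer) => _sendOffer = sendOffer;

    // Called when this peer wants to (re)negotiate, e.g. after adding a video transceiver.
    public void RequestNegotiation()
    {
        if (State == SignalingState.Stable) StartOffer();
        else _pending = true; // a negotiation is in flight: wait for it to finish
    }

    public void OnRemoteOffer()
    {
        if (State != SignalingState.Stable)
            throw new InvalidOperationException("Offer received while another negotiation is in progress (glare).");
        State = SignalingState.HaveRemoteOffer;
    }

    // Answering side: our answer has been sent, so this negotiation is complete.
    public void OnLocalAnswerSent() { Expect(SignalingState.HaveRemoteOffer); Complete(); }

    // Offering side: the remote answer has been applied, so this negotiation is complete.
    public void OnRemoteAnswer() { Expect(SignalingState.HaveLocalOffer); Complete(); }

    private void StartOffer()
    {
        _pending = false;
        State = SignalingState.HaveLocalOffer;
        _sendOffer();
    }

    private void Complete()
    {
        State = SignalingState.Stable;
        if (_pending) StartOffer(); // run the deferred second negotiation now
    }

    private void Expect(SignalingState expected)
    {
        if (State != expected)
            throw new InvalidOperationException($"Expected {expected} but was {State}.");
    }
}

public static class Program
{
    public static void Main()
    {
        int offersSent = 0;
        var server = new NegotiationScheduler(() => offersSent++);

        server.OnRemoteOffer();          // 1st negotiation: the client's offer arrives
        server.RequestNegotiation();     // the server wants to send video: deferred
        Console.WriteLine(offersSent);   // 0
        server.OnLocalAnswerSent();      // 1st negotiation done: the deferred offer goes out
        Console.WriteLine(offersSent);   // 1
        server.OnRemoteAnswer();         // 2nd negotiation done
        Console.WriteLine(server.State); // Stable
    }
}
```

In the scenario of this thread, the server would call `RequestNegotiation()` when it decides to add its own video transceiver, and wire the three `On...` methods to its signaling code.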

dgtvan commented 3 years ago

Thank you very much. I made it work.

djee-ms commented 3 years ago

And thanks to you for the docs update! I'm closing this issue now that I think everything is working and documented. Please feel free to reopen if there's anything else.

microsoft / MixedReality-WebRTC