meetecho / janus-gateway

Janus WebRTC Server
https://janus.conf.meetecho.com
GNU General Public License v3.0

janus admin does not show trickled ICE candidates #257

Closed ploxiln closed 9 years ago

ploxiln commented 9 years ago

I'm not sure if this is a known / intentional feature omission.

I've captured this example with the videoroom plugin and firefox

what firefox says (notice that a peer-reflexive candidate is selected): [screenshot: 2015-06-10 at 13 34 20]

what janus admin says (trimmed a bit):

{
    "session_id": 3859271597,
    "handle_id": 581710454,
    "plugin": "janus.plugin.videoroom",
    "plugin_specific": {
        "type": "listener",
        "room": 1234,
        "feed_id": 1464847231,
        "feed_display": "plo2",
        "destroyed": 0
    },
    "flags": {
        "...": "..."
    },
    "sdps": {
        "local": "...\r\n",
        "remote": "...\r\n"
    },
    "streams": [
        {
            "id": 1,
            "ready": -1,
            "disabled": "false",
            "ssrc": {
                "audio": 2134262819,
                "video": 3265223003
            },
            "components": [
                {
                    "id": 1,
                    "state": "ready",
                    "local-candidates": [
                        "2 1 udp 2013266431 54.152.130.162 10001 typ host\r\n",
                        "9 1 udp 1677721855 54.152.130.162 10001 typ srflx raddr 10.0.1.118 rport 10001\r\n"
                    ],
                    "remote-candidates": [
                        "0 1 UDP 2130379007 192.168.41.134 62090 typ host",
                        "2 1 UDP 2128609535 172.30.1.1 59524 typ host",
                        "1 1 UDP 1694236671 108.176.27.194 7280 typ srflx raddr 192.168.41.134 rport 62090"
                    ],
                    "selected-pair": "2 <-> 1",
                    "dtls": {
                        "fingerprint": "...",
                        "remote-fingerprint": "...",
                        "dtls-role": "passive",
                        "dtls-state": "connected",
                        "valid": 1,
                        "ready": 1
                    },
                    "in_stats": {
                        "audio_bytes": 0,
                        "video_bytes": 0,
                        "data_bytes": 974,
                        "audio_nacks": 0,
                        "video_nacks": 0,
                        "audio_bytes_lastsec": 0,
                        "video_bytes_lastsec": 0
                    },
                    "out_stats": {
                        "audio_bytes": 169702,
                        "video_bytes": 565770,
                        "data_bytes": 1295,
                        "audio_nacks": 0,
                        "video_nacks": 0
                    }
                }
            ]
        }
    ]
}

lminiero commented 9 years ago

Not all trickled candidates are added: when a PeerConnection has been established, for instance, late candidates are dropped as they won't be needed. It might be a good idea to store them anyway, at least for debugging, e.g., to make sure all those that were sent got to destination.

ploxiln commented 9 years ago

I see. But also notice that Firefox thought its candidate (108.176.27.194:7280) was peer-reflexive, while Janus thought it was server-reflexive.

lminiero commented 9 years ago

Janus thinks it's server reflexive because the trickle candidate it received said so. I don't know why Firefox calls it peer-reflexive, as it's Firefox that sent the trickle candidate in the first place.
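
To make the point about the candidate type concrete, here is a minimal sketch that splits one of the candidate strings from the admin output into its fields. The `Candidate` class and `parse_candidate` helper are hypothetical names made up for this illustration and are not part of Janus or libnice; the field order is the standard SDP candidate layout, and the type is simply whatever follows `typ` in the string that was sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical container for the fields of one candidate line.
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    type: str
    raddr: Optional[str] = None
    rport: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse a candidate string as it appears in the admin output."""
    text = line.strip()  # local candidates in the dump end with \r\n
    if text.startswith("candidate:"):
        text = text[len("candidate:"):]
    fields = text.split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError(f"not an ICE candidate: {line!r}")
    # Anything after the type comes as name/value pairs (raddr, rport, ...).
    extras = dict(zip(fields[8::2], fields[9::2]))
    return Candidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        type=fields[7],
        raddr=extras.get("raddr"),
        rport=int(extras["rport"]) if "rport" in extras else None,
    )


remote = parse_candidate(
    "1 1 UDP 1694236671 108.176.27.194 7280 typ srflx raddr 192.168.41.134 rport 62090"
)
print(remote.type, remote.address, remote.port)
```

The receiver does not infer the type; it reads it from the string, which is why the admin output reports `srflx` for this address.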

lminiero commented 9 years ago

Just checked and we actually don't discard candidates after the media has been set up anymore: this is something I already fixed some time ago. The only candidates we drop are those related to streams we won't use because of bundling.

The candidates you see in Janus are apparently all the candidates Firefox sent: if you look at your screenshot, it's the same three candidates throughout, just paired with different candidates Janus sent for the ICE pairing. By the way, are you using the "public IP" setting along with the STUN server? I ask because the remote (Janus) candidates all show the same address, while the host IP should be a private address when you use STUN.

ploxiln commented 9 years ago

I'm both setting the public ip, and a stun server. So yes it's using the configured public ip for host candidates on the janus side. I set the stun server on the janus side so firefox can initialize the webrtc/ice/dtls connection... by the way, I don't need to configure it on the firefox side, only the janus side, to avoid the state where janus doesn't realize firefox selected a candidate...

Anyway, here's another example, as minimal as possible, no stun server for firefox, public ip configured for janus:

[screenshot: 2015-06-17 at 20 14 29]

                    "id": 1,
                    "state": "ready",
                    "local-candidates": [
                        "1 1 udp 2013266431 52.6.64.223 10000 typ host\r\n",
                        "6 1 udp 1677721855 52.6.64.223 10000 typ srflx raddr 10.0.0.10 rport 10000\r\n"
                    ],
                    "remote-candidates": [
                        "0 1 UDP 2130379007 192.168.41.47 54150 typ host",
                        "1 1 UDP 2128609535 172.30.1.1 55789 typ host"
                    ],
                    "selected-pair": "1 <-> 1",
                    "dtls": {
                        "fingerprint": "0F:DC:49:1E:6C:8B:92:B5:A2:B8:ED:DF:F5:27:BE:C4:04:77:C1:95:C1:33:AE:3F:AA:D2:9F:5E:8E:83:6B:14",
                        "remote-fingerprint": "87:E0:10:58:F1:32:DD:4D:2D:45:C9:ED:8A:EE:4A:F8:8A:AA:4B:8C:C9:07:E2:0C:A3:DF:FD:BA:BD:85:C9:36",
                        "dtls-role": "passive",
                        "dtls-state": "connected",
                        "valid": 1,
                        "ready": 1
                    },

Notice that the "remote-candidates" which janus lists do not include the ip address which is actually used on the firefox side. And the connection works.

I also noticed that firefox did not send the peer-reflexive candidate over the http api to janus; it sent the two host candidates / ip addresses you see in the "remote-candidates" list over http, but must have sent the peer-reflexive candidate directly via udp to the host candidate janus offered, and libnice must have picked it up from there. Or maybe it's a more introspective thing. I should probably look up how this stuff really works...

ploxiln commented 9 years ago

possibly interesting janus log detail for that connection:

Jun 18 00:08:01  [1831101908] Audio has been negotiated, Video has been negotiated, SCTP/DataChannels have NOT been negotiated
Jun 18 00:08:01  [1831101908] The browser: supports BUNDLE, supports rtcp-mux, is doing Trickle ICE
Jun 18 00:08:01  [1831101908] Fingerprint (global) : sha-256 87:E0:10:58:F1:32:DD:4D:2D:45:C9:ED:8A:EE:4A:F8:8A:AA:4B:8C:C9:07:E2:0C:A3:DF:FD:BA:BD:85:C9:36
Jun 18 00:08:01  [1831101908] Parsing audio candidates (stream=1)...
Jun 18 00:08:01  [1831101908] ICE pwd (local):     817abeb7f39d04d13e5892c0fc7e137f
Jun 18 00:08:01  [1831101908] ICE ufrag (local):   169d99c4
Jun 18 00:08:01  [1831101908] Audio mid: audio
Jun 18 00:08:01  [1831101908] DTLS setup (local):  active
Jun 18 00:08:01  [1831101908] Parsing video candidates (stream=2)...
Jun 18 00:08:01  [1831101908] ICE pwd (local):     817abeb7f39d04d13e5892c0fc7e137f
Jun 18 00:08:01  [1831101908] ICE ufrag (local):   169d99c4
Jun 18 00:08:01  [1831101908] Video mid: video
Jun 18 00:08:01  [1831101908] DTLS setup (local):  active
Jun 18 00:08:01  [1831101908]   -- ICE Trickling is supported by the browser, waiting for remote candidates...
Jun 18 00:08:01    >> Anonymized sdp (759 --> 352 bytes)
Jun 18 00:08:01  Handling message: {"request": "start"}
Jun 18 00:08:01  This is involving a negotiation (answer) as well
Jun 18 00:08:01  [1831101908] Adding event to queue of messages...
Jun 18 00:08:01    >> Pushing event: 0 (Success) (took 83 us)
Jun 18 00:08:01  Got a HTTP POST request on /janus/2557526676/1831101908...
Jun 18 00:08:01  [1831101908] Trickle candidate (): candidate:0 1 UDP 2130379007 192.168.41.47 54150 typ host
Jun 18 00:08:01  [1831101908]  Adding remote candidate component:1 stream:1 type:host 192.168.41.47:54150
Jun 18 00:08:01  [1831101908] ICE just started for this component, setting candidates we have up to now
Jun 18 00:08:01  [1831101908] ## Setting remote candidates: stream 1, component 1 (1 in the list)
Jun 18 00:08:01  [1831101908] >> Remote Stream #1, Component #1: Address 192.168.41.47:54150
Jun 18 00:08:01  [1831101908]  Setting remote credentials...
Jun 18 00:08:01  [1831101908] Component state changed for component 1 in stream 1: 2 (connecting)
Jun 18 00:08:01  [1831101908] Remote candidates set!
Jun 18 00:08:01  Got a HTTP POST request on /janus/2557526676/1831101908...
Jun 18 00:08:01  [1831101908] Trickle candidate (): candidate:1 1 UDP 2128609535 172.30.1.1 55789 typ host
Jun 18 00:08:01  [1831101908]  Adding remote candidate component:1 stream:1 type:host 172.30.1.1:55789
Jun 18 00:08:01  [WARN] [1831101908] Still waiting for the DTLS stack for component 1 in stream 1...
Jun 18 00:08:01  [1831101908] Component state changed for component 1 in stream 1: 3 (connected)
Jun 18 00:08:01  [1831101908] New selected pair for component 1 in stream 1: 1 <-> 1
Jun 18 00:08:01  [1831101908]   Component is ready enough, starting DTLS handshake...
Jun 18 00:08:01  [1831101908]   Setting accept state (DTLS server)
Jun 18 00:08:01  [1831101908] Creating retransmission timer with ID 8
Jun 18 00:08:01  [WARN] [1831101908]     Missing valid SRTP session (packet arrived too early?), skipping...
Jun 18 00:08:01  Got a HTTP POST request on /janus/2557526676/1831101908...
Jun 18 00:08:01  No more remote candidates for handle 1831101908!
Jun 18 00:08:01  Got a HTTP GET request on /janus/2557526676...
Jun 18 00:08:01  Session 2557526676 found... returning up to 1 messages
Jun 18 00:08:02  Got a HTTP GET request on /janus/2557526676...
Jun 18 00:08:02  Session 2557526676 found... returning up to 1 messages
Jun 18 00:08:02  [1831101908] Looks like DTLS!
Jun 18 00:08:02  [1831101908] Looks like DTLS!
Jun 18 00:08:02  [1831101908] DTLS established, yay!
Jun 18 00:08:02  [1831101908] Computing sha-256 fingerprint of remote certificate...
Jun 18 00:08:02  [1831101908] Remote fingerprint (sha-256) of the client is 87:E0:10:58:F1:32:DD:4D:2D:45:C9:ED:8A:EE:4A:F8:8A:AA:4B:8C:C9:07:E2:0C:A3:DF:FD:BA:BD:85:C9:36
Jun 18 00:08:02  [1831101908]  Fingerprint is a match!
Jun 18 00:08:02  [1831101908] Created inbound SRTP session for component 1 in stream 1
Jun 18 00:08:02  [1831101908] Created outbound SRTP session for component 1 in stream 1
Jun 18 00:08:02  [1831101908] The DTLS handshake for the component 1 in stream 1 has been completed
Jun 18 00:08:02  [1831101908] The DTLS handshake has been completed
Jun 18 00:08:02  WebRTC media is now available 
lminiero commented 9 years ago

If you didn't configure a STUN server in Firefox, Firefox only sent the private IP host candidates in trickles, which is why Janus only knows about those.
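
One way to check this against the log pasted earlier is to pull out only the `Trickle candidate` lines. The sketch below is a rough illustration, not a Janus tool: the `trickled_candidates` helper and its regular expression are assumptions based on the log format shown in this thread, and the sample lines are copied from that log.

```python
import re

# Matches lines such as:
#   [handle] Trickle candidate (): candidate:0 1 UDP ... typ host
TRICKLE_RE = re.compile(r"Trickle candidate \(.*?\): candidate:(?P<cand>.+)$")


def trickled_candidates(log_lines):
    """Return the candidate strings that arrived through trickle."""
    found = []
    for line in log_lines:
        match = TRICKLE_RE.search(line)
        if match:
            found.append(match.group("cand").strip())
    return found


log = [
    "Jun 18 00:08:01  [1831101908] Trickle candidate (): candidate:0 1 UDP 2130379007 192.168.41.47 54150 typ host",
    "Jun 18 00:08:01  [1831101908]  Adding remote candidate component:1 stream:1 type:host 192.168.41.47:54150",
    "Jun 18 00:08:01  [1831101908] Trickle candidate (): candidate:1 1 UDP 2128609535 172.30.1.1 55789 typ host",
    "Jun 18 00:08:01  No more remote candidates for handle 1831101908!",
]

candidates = trickled_candidates(log)
# The address is the fifth whitespace-separated field of a candidate string.
addresses = [cand.split()[4] for cand in candidates]
print(addresses)
```

Only the two private host addresses come out of the signaling path, which matches the "remote-candidates" list in the admin output.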

It's libnice that does the magic. It receives the connectivity check Firefox sent from one of the private addresses and sees it coming from a public one (the one STUN would have resolved for Firefox), so it adds it as a new peer-reflexive candidate. It then sends a connectivity check of its own to the public address, which the NAT forwards to the original private address of Firefox. Firefox receives it on one of the two private addresses it reserved and realizes, looking at the STUN request, that this was sent to a public address, and so understands it has a peer-reflexive address available too (it's as if it had used STUN, even though it didn't).
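
The first half of that exchange can be sketched as a toy model. This is not libnice code and does not use its API: `ToyAgent`, `RemoteCandidate` and `on_binding_request` are hypothetical names, and the public address and port are placeholders from a documentation range rather than values from this issue. The only thing it shows is the rule "a check from an unknown source address becomes a new peer-reflexive remote candidate".

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RemoteCandidate:
    address: str
    port: int
    type: str  # "host", "srflx", "prflx" or "relay"


@dataclass
class ToyAgent:
    """Toy stand-in for an ICE agent; purely illustrative."""
    remote: list = field(default_factory=list)

    def add_trickled(self, address, port, type_):
        # Candidates learned through signaling (trickle).
        self.remote.append(RemoteCandidate(address, port, type_))

    def on_binding_request(self, src_address, src_port):
        """Handle an incoming connectivity check from the given source."""
        for cand in self.remote:
            if (cand.address, cand.port) == (src_address, src_port):
                return cand  # already known, nothing new to learn
        # Unknown source: record it as a peer-reflexive candidate.
        learned = RemoteCandidate(src_address, src_port, "prflx")
        self.remote.append(learned)
        return learned


agent = ToyAgent()
agent.add_trickled("192.168.41.47", 54150, "host")
agent.add_trickled("172.30.1.1", 55789, "host")

# The NAT rewrites the source of the check; 203.0.113.7:40000 is a placeholder.
learned = agent.on_binding_request("203.0.113.7", 40000)
print(learned.type, len(agent.remote))
```

In this model the learned candidate exists only inside the agent; nothing on the signaling side ever sees it.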

The fact you don't see the peer-reflexive candidate in the Janus admin API is because we only add what we receive via trickle there: we don't actually inspect the internal state of libnice at that moment, which may have more. It's easier that way, and also lighter: you populate a string array when receiving trickles and send them back in API requests just as they are, without having to format them. Inspecting libnice on each call might be heavier, and you'd have to format them as candidates each time as well, although it would probably be more correct. I'll have a look at http://nice.freedesktop.org/libnice/NiceAgent.html#nice-agent-get-remote-candidates (which we already use somewhere in Janus I think) for that.
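
The trade-off between the two approaches can be sketched roughly as follows. This is an illustration under assumptions, not how Janus is implemented: `AgentCandidate` and `format_candidate` are hypothetical, the structured list stands in for whatever the ICE library would return, and the peer-reflexive entry uses placeholder values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentCandidate:
    # Hypothetical structured candidate, as an ICE library might hold it.
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    type: str
    base_address: Optional[str] = None
    base_port: Optional[int] = None


def format_candidate(c: AgentCandidate) -> str:
    """Render a structured candidate in the SDP-style string form."""
    line = (
        f"{c.foundation} {c.component} {c.transport} {c.priority} "
        f"{c.address} {c.port} typ {c.type}"
    )
    if c.base_address is not None and c.base_port is not None:
        line += f" raddr {c.base_address} rport {c.base_port}"
    return line


# Approach 1: keep the strings exactly as they arrived via trickle.
trickled = ["0 1 UDP 2130379007 192.168.41.47 54150 typ host"]

# Approach 2: rebuild the list from the agent's state on every request.
# The second entry (placeholder values) was never trickled.
agent_state = [
    AgentCandidate("0", 1, "UDP", 2130379007, "192.168.41.47", 54150, "host"),
    AgentCandidate("3", 1, "UDP", 1845494015, "203.0.113.7", 40000, "prflx"),
]

print(trickled)
print([format_candidate(c) for c in agent_state])
```

The first list costs nothing to return but can miss candidates the agent learned on its own; the second is complete but has to be formatted on each call.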

lminiero commented 9 years ago

As to the log you pasted, I think it means that the 108.* peer-reflexive address you see in Firefox is actually associated with the 172.* private address Firefox advertised. That is, the original STUN request came from the 172 address and showed up as 108 as far as Janus is concerned; libnice allocates the new peer-reflexive candidate for that, maps it to one of the candidates it knows (as the STUN connectivity check contains the original 172 address in it), and answers to that. Since that pair turns out to be the winner, that's the selected candidate for Janus.

ploxiln commented 9 years ago

Yeah. By the end of the investigation I figured it might actually be rather inconvenient for janus to show the peer-reflexive candidate. If libnice is mapping it to one of the candidates janus knows about, janus' behavior does seem justified. Thanks for looking :)

ploxiln commented 9 years ago

Also, I guess I should look at logs more before making a claim, but I really don't think the peer-reflexive candidate actually used the interface with the address 172... That's a virtual interface only connected to a local VM, whereas the 192... address was on the real interface, which is connected to the office LAN and through a NAT gateway to the internet. This test was against janus running in EC2, and 108... is the office public IP.

lminiero commented 9 years ago

It may be the 172 interface is still allowed to go on the internet (I assume the VMs are allowed to) and maybe bridged, unless it's only there to allow VMs to get to the host and from there to the internet through another of the interfaces.

lminiero commented 9 years ago

I assume we can close this? Feel free to reopen if there was any issue left pending.