jech / galene

The Galène videoconference server
https://galene.org
MIT License

Sometimes not working on iOS #132

Closed: isaackwan closed this issue 1 year ago

isaackwan commented 2 years ago

Hello! Thank you so much for Galène, it's an amazing piece of software. The UI and SFU are already great, and the built-in TURN server makes deployment so much easier. I can already see a future where companies and nerds self-host a video server on a budget-friendly Raspberry Pi! That goes a long way toward a self-hosted future where everyone owns their data.

I recently tried it out with friends and... apologies for the rather boring paragraphs, but I have a few questions. I want to start by apologizing that I don't have many logs to share. I know that makes this a VERY bad issue report, but it's hard to debug iOS Safari remotely when I only have a Linux desktop. Making sure iOS Safari works is a goal for me.

  1. H264 doesn't work as well on iOS Safari. I have my codecs set to ["h264", "opus"], and the two devices won't talk to each other. If I remember correctly, iOS -> Edge works, but Edge -> iOS Safari does not. The error is "failed to set local answer sdp failed to set local video description recv parameters m-section with mid=1".

Device A: Windows MS Edge Chromium, have H/W acceleration for H264 & VP8 as per chrome://gpu
Device B: iOS 15.3.1 Safari

  2. iOS Safari is flaky. I now have my codecs set to ["h264", "vp8", "opus"]. It doesn't work when Devices A and C join first and Device B joins afterwards. By "doesn't work", I mean that Devices A and C can see the stream from Device B, but Device B can only see itself. (This is about video; I forget how audio behaved.) If only Devices A and B are present, it works.

From Device A's chrome://webrtc-internals, Device A is sending VP8, while Device C is sending H264.

Device A: Windows MS Edge Chromium, have H/W acceleration for H264 & VP8 as per chrome://gpu
Device B: iOS 15.3.1 Safari
Device C: OS X 12 Safari

  3. iOS Safari - lost navigation? I am not sure whether it is a navigation problem or something else, but my friend (Device D, iOS Safari, version unknown) got lost. She was presented with this screen: [screenshot: signal-2022-02-25-203306]

There were already 2 devices in the group. However she couldn't hear/see the other 2 devices. At the same time, the other 2 devices didn't see her, not even in the chat sidebar.

Is she in a good state? Does she just need to close the sidebar on the left?

Thanks,

isaackwan commented 2 years ago

I am able to reproduce in a different scenario, and hopefully these logs are more useful:

Device A: Windows Edge Chrome, h/w encoding for VP8 & H264
Device B: Chromebook, h/w encoding for VP8 & H264
Device C: iOS 15.3.1 Safari

Attached webrtc-internals captures:
Web capture_3-3-2022_104729_webrtc-internals
Web capture_3-3-2022_104719_webrtc-internals
Web capture_3-3-2022_10479_webrtc-internals

https://gist.github.com/isaackwan/e6a9fd0f7bb49ccb2555643ade6814b4 <- stats.json

I forgot to mention that I am on the latest master. This is the commit I am on:

commit e19716489caf1485f862412d3623e906c0c84404 (HEAD -> master, origin/master, origin/HEAD)
Author: Juliusz Chroboczek <jch@irif.fr>
Date:   Mon Feb 21 23:47:39 2022 +0100

    Update CHANGES.

One last thing: on my problematic iOS Safari, I tried to call /renegotiate. However, I am not able to stay on the chat screen. As you can see from the video, I get redirected back to the video screen within seconds of going to the chat screen. It works when I am chatting 1-1.

https://user-images.githubusercontent.com/1005813/156490268-07688a19-d1ee-40b3-9921-0f25aac5a4e2.MOV

EDIT: sorry, two more issues with the TURN server.

1 - Relay test error on startup

I am now on v0.4.4, and I see this on latest (see git log above) as well. During startup:

root@hk1:~/Programs/galene-v0.4.4# ./galene
2022/03/04 09:15:52 Starting built-in TURN server on :1194
2022/03/04 09:15:52 Relay test successful in 20.764638ms, RTT = 199.86µs
turn ERROR: 2022/03/04 09:15:52 error when handling datagram: failed to handle Refresh-request from $MYIP:51308: write tcp4 $MYIP:1194->$MYIP:51308: write: broken pipe

I am fairly certain that the TURN server works: if I go to the chat and run /relay-test, it succeeds.

root@hk1:~/Programs/galene-v0.4.4# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         
root@hk1:~/Programs/galene-v0.4.4# iptables -t nat -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         
root@hk1:~/Programs/galene-v0.4.4# iptables -t mangle -L
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination         

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination         

2 - I forced a TURN connection by connecting my phone to a restricted NAT VPN. Assume the phone is already talking to another device. If a third device joins, my phone gets kicked out and displays a toast message "Disconnected". When I log in again, it works again.

Sorry for the lack of logs on client side.

Here's what's on the server log at that time:

turn ERROR: 2022/03/03 11:28:11 error when handling datagram: failed to handle Allocate-request from $VPNIP:48998: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:28:26 error when handling datagram: failed to handle Allocate-request from $VPNIP:47360: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:28:27 error when handling datagram: failed to handle Allocate-request from $VPNIP:47360: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:29:19 error when handling datagram: failed to handle Allocate-request from $VPNIP:41812: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:29:20 error when handling datagram: failed to handle Allocate-request from $VPNIP:41207: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:29:20 error when handling datagram: failed to handle Allocate-request from $VPNIP:37560: relay already allocated for 5-TUPLE

Unrelated, but I actually see a lot of log noise even when it works:

turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:38741: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:40167: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:41609: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:38741: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:40167: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:41609: relay already allocated for 5-TUPLE
2022/03/03 11:17:11 client: unknown id
2022/03/03 11:17:32 Replace: file does not exist
2022/03/03 11:17:32 Replace: file does not exist
2022/03/03 11:19:12 client: unknown id
2022/03/03 11:19:12 client: unknown id
2022/03/03 11:19:12 Replace: file does not exist
2022/03/03 11:20:54 ICE: unknown id in ICE
2022/03/03 11:20:54 ICE: unknown id in ICE
2022/03/03 11:20:54 ICE: unknown id in ICE
turn ERROR: 2022/03/03 11:20:56 error when handling datagram: failed to handle Allocate-request from $HOMEIP:33138: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:20:56 error when handling datagram: failed to handle Allocate-request from $HOMEIP:39240: relay already allocated for 5-TUPLE
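
In the excerpt above, several Allocate requests come from the same address and port within the same second (for example $HOMEIP:38741 appears twice). A TURN server keys allocations on the 5-tuple, so a second Allocate request on a 5-tuple that already has an allocation is rejected. A minimal Go sketch for tallying these errors per source address, assuming the server output has been saved to a string; the function name `countBySource` and the regular expression are illustrative and not part of Galène:

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// turnErr matches the "failed to handle <request> from <addr>: <reason>"
// part of the TURN error lines quoted in this thread.
var turnErr = regexp.MustCompile(`failed to handle (\S+) from (\S+): (.*)$`)

// countBySource tallies TURN handling errors per source address.
// Lines that are not TURN handling errors are ignored.
func countBySource(log string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		if m := turnErr.FindStringSubmatch(sc.Text()); m != nil {
			counts[m[2]]++
		}
	}
	return counts
}

func main() {
	// Three lines copied from the log above, plus one unrelated line.
	log := `turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:38741: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:40167: relay already allocated for 5-TUPLE
turn ERROR: 2022/03/03 11:16:39 error when handling datagram: failed to handle Allocate-request from $HOMEIP:38741: relay already allocated for 5-TUPLE
2022/03/03 11:17:11 client: unknown id`
	for addr, n := range countBySource(log) {
		fmt.Printf("%s: %d\n", addr, n)
	}
}
```

A count above one for a single address would suggest repeated requests from the same client socket rather than distinct clients, though that alone does not show whether the errors are harmless.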

jech commented 2 years ago

> H264 doesn't work as well on iOS Safari.

This probably implies that iOS doesn't implement the same H.264 profiles as the other browsers. Could you please apply the patch below, and send me a copy of the offer sent by Safari?

> iOS Safari is flaky

Same issue as above -- bad H.264 profile.

> However she couldn't hear/see the other 2 devices. At the same time, the other 2 devices didn't see her, not even in the chat sidebar.

That's very strange. Is this reproducible? Is there any chance you could get her to show us the JavaScript console output?

Patch for printing the offer:

diff --git a/rtpconn/webclient.go b/rtpconn/webclient.go
index 443089c..d51486f 100644
--- a/rtpconn/webclient.go
+++ b/rtpconn/webclient.go
@@ -571,6 +571,7 @@ func sendICE(c *webClient, id string, candidate *webrtc.ICECandidate) error {
 }

 func gotOffer(c *webClient, id, label string, sdp string, replace string) error {
+   println(sdp)
    up, _, err := addUpConn(c, id, label, sdp)
    if err != nil {
        return err
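
Once the offer has been printed, the part relevant to this issue is the `profile-level-id` parameter on the `a=fmtp:` lines. A minimal Go sketch for pulling those values out, assuming the SDP has been copied from the server output into a string; the function name `profileLevelIDs` and the sample offer fragment are hypothetical and not taken from Galène or from the logs in this thread:

```go
package main

import (
	"fmt"
	"strings"
)

// profileLevelIDs returns the profile-level-id value of every a=fmtp line
// in sdp that carries one, lowercased, in order of appearance.
func profileLevelIDs(sdp string) []string {
	var ids []string
	for _, line := range strings.Split(sdp, "\n") {
		line = strings.TrimSpace(line) // also drops the CR of CRLF endings
		if !strings.HasPrefix(line, "a=fmtp:") {
			continue
		}
		// Format: a=fmtp:<payload type> <key>=<value>;<key>=<value>...
		_, params, ok := strings.Cut(line, " ")
		if !ok {
			continue
		}
		for _, kv := range strings.Split(params, ";") {
			key, value, _ := strings.Cut(strings.TrimSpace(kv), "=")
			if key == "profile-level-id" {
				ids = append(ids, strings.ToLower(value))
			}
		}
	}
	return ids
}

func main() {
	// Hypothetical offer fragment for illustration only.
	offer := "m=video 9 UDP/TLS/RTP/SAVPF 96 98\r\n" +
		"a=rtpmap:96 H264/90000\r\n" +
		"a=fmtp:96 level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f\r\n" +
		"a=rtpmap:98 VP8/90000\r\n"
	fmt.Println(profileLevelIDs(offer))
}
```

Comparing the list from Safari's offer with the list from a Chromium offer shows which H.264 profiles each browser advertises.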

isaackwan commented 2 years ago

I wish I could show you the web console :-( However, it's really hard to debug iOS Safari remotely. I was able to capture the server output here -> https://pastebin.com/zAUXPXS3. Sorry it's kind of messy; I hope it's easy to tell which part is from iOS and which is from Chrome.

BTW, in the log file there are these lines:

2022/03/10 10:47:41 client: unknown id
2022/03/10 10:47:42 Replace: file does not exist
2022/03/10 10:47:42 Replace: file does not exist

The moment iOS Safari joins, my Android Chrome gets kicked out. The toast on the UI says "Disconnected", and I have to rejoin to get it to work.

This is what we got from my Android Chrome console:

galene.js:336 Socket close 1011 
gotClose @ galene.js:336
socket.onclose @ protocol.js:293

Interestingly, release v0.4.4 doesn't seem to have this kick-out problem.

Let me restate my setup for this log file:

Device A: Windows MS Edge Chromium, have H/W acceleration for H264 & VP8 as per chrome://gpu
Device B: Brave 1.35.101 (Chromium 98) on Android, have H/W acceleration for H264 & VP8 as per chrome://gpu
Device C: iOS 15.3.1 Safari

The breakdown is:

jech commented 2 years ago

Thanks for the log. It looks like one of the devices (the third one in the log you posted) doesn't support Baseline profile, but only Constrained Baseline profile.
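
For reference, the two profiles differ only in the middle byte of the `profile-level-id` parameter. The value is three hex bytes: `profile_idc`, the constraint-set flags, and `level_idc`. A minimal Go sketch that decodes the two values appearing in this thread, covering only the Baseline family; the type and function names are illustrative and not part of Galène or Pion:

```go
package main

import (
	"fmt"
	"strconv"
)

// h264Profile holds the three bytes of an H.264 profile-level-id.
type h264Profile struct {
	ProfileIDC byte // 0x42 (66) is Baseline
	ProfileIOP byte // constraint_set flags
	LevelIDC   byte // 0x1f (31) is Level 3.1
}

// parseProfileLevelID decodes a six-digit hex profile-level-id string.
func parseProfileLevelID(s string) (h264Profile, error) {
	if len(s) != 6 {
		return h264Profile{}, fmt.Errorf("profile-level-id %q: want 6 hex digits", s)
	}
	v, err := strconv.ParseUint(s, 16, 32)
	if err != nil {
		return h264Profile{}, err
	}
	return h264Profile{
		ProfileIDC: byte(v >> 16),
		ProfileIOP: byte(v >> 8),
		LevelIDC:   byte(v),
	}, nil
}

// Name gives a readable name for the Baseline family only; any other
// profile_idc is reported numerically.
func (p h264Profile) Name() string {
	if p.ProfileIDC != 0x42 {
		return fmt.Sprintf("profile_idc %d", p.ProfileIDC)
	}
	// constraint_set1_flag is bit 0x40 of the middle byte.
	if p.ProfileIOP&0x40 != 0 {
		return "Constrained Baseline"
	}
	return "Baseline"
}

func main() {
	for _, id := range []string{"42001f", "42e01f"} {
		p, err := parseProfileLevelID(id)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: %s, level_idc %d\n", id, p.Name(), p.LevelIDC)
	}
}
```

Here `42001f` decodes as Baseline and `42e01f` as Constrained Baseline, both at level_idc 31.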

Could you please let me know whether the following patch fixes the issue?

diff --git a/group/group.go b/group/group.go
index 37d75dc..19fe5ca 100644
--- a/group/group.go
+++ b/group/group.go
@@ -269,11 +269,6 @@ func codecsFromName(name string) ([]webrtc.RTPCodecParameters, error) {
        }
    case "h264":
        codecs = []webrtc.RTPCodecCapability{
-           {
-               "video/H264", 90000, 0,
-               "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42001f",
-               fb,
-           },
            {
                "video/H264", 90000, 0,
                "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f",

jech commented 2 years ago

Applied as fd09564. Please reopen if this doesn't fix your issue.

isaackwan commented 2 years ago

Hello, thanks for checking this! Unfortunately I am still seeing the same on latest (f66cabd6f467efe231e87ac795d94b08154af080).

SDP logs on server: https://pastebin.com/3nyqd359
Stats.json: https://pastebin.com/J3HAdvfz

I am starting to think that this might be a client (UI) problem rather than a server problem. The situation is the same even with the VP8 codec. At the same time, I am sometimes able to see one of the two other peers on device #3; it depends on the order in which the devices join. Unfortunately it's hard to debug without desktop Safari.

jech commented 1 year ago

Hopefully fixed by 1afb3c8.