signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0

Make Signal Work at the South Pole #6899

Closed · g1a55er closed this 3 weeks ago

g1a55er commented 3 weeks ago

Description

I was reading a post on the popular brr.fyi blog when I came across a familiar-looking "unnamed chat app 1" that reportedly did not work at the South Pole. I was quite disappointed to see that WhatsApp... errrr... "unnamed chat app 2" works fine in those conditions, so I decided to treat the post as a bug report.

The anonymous author was kind enough to provide the following debug log from the field:

There are a few things that come to mind when analyzing the problem of improving performance on low-throughput, high-latency connections:

1. The WebSocket establishment handshake takes many round trips (SYN, SYN-ACK, ACK, ClientHello, ServerHello, TLS negotiation, WebSocket upgrade). I would imagine "competitor app 2" is probably using QUIC or something similar by now to consolidate these down to the bare minimum. In my opinion, this is the main technical root cause of the performance differential. However, I completely understand that organizational and resource constraints probably make this a high-investment, longer-term item to fix.

2. The WebSocket abstraction provided by the TypeScript library gives us no window into whether the handshake is making progress. Ideally, I'd like to bail out once we stop advancing through the stages of the bootstrap handshake, but keep going (resetting the timer) as long as we are making good progress. Unfortunately, we have no visibility into whether we are still making progress or the attempt has already failed; we only get a single connection event once the connection is fully established.

3. So we're left with a trade-off between improving success rates on low-end connections and increasing wasted wait time in the disconnected failure state while doomed connection attempts slowly time out. Ideally, we'd have data to drive this decision: How many failures are we seeing? How long is the P99 success case? How many false positives would a given connection timeout value produce? I don't have access to any data like that for the Signal user base, so I just chose a simple <10s, 20s, 30s, 30s, ...> progression (sketched below). Hopefully this improves success rates without hurting reconnect speed for most users.
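
To make point 3 concrete, here is a minimal sketch of the kind of timeout progression I have in mind. This is not the actual code in this PR; the `ws` package, `connectWithTimeout`, and `TIMEOUT_PROGRESSION_MS` are illustrative names, and the point is just that each successive attempt gets a longer deadline for the whole bootstrap, capped at 30 seconds:

```ts
// Illustrative sketch only, not Signal-Desktop's real connection code.
import WebSocket from 'ws';

// Hypothetical progression: 10s for the first attempt, 20s for the second,
// then 30s for every attempt after that.
const TIMEOUT_PROGRESSION_MS = [10_000, 20_000, 30_000];

function timeoutForAttempt(attempt: number): number {
  const index = Math.min(attempt, TIMEOUT_PROGRESSION_MS.length - 1);
  return TIMEOUT_PROGRESSION_MS[index];
}

// Open a WebSocket, but reject if the full bootstrap (TCP + TLS + upgrade)
// has not finished within the attempt's deadline. Because the library only
// fires 'open' once everything is done, we cannot tell "still making
// progress" apart from "stalled" here -- hence the coarse per-attempt timer.
function connectWithTimeout(url: string, attempt: number): Promise<WebSocket> {
  return new Promise((resolve, reject) => {
    const deadlineMs = timeoutForAttempt(attempt);
    const socket = new WebSocket(url);

    const timer = setTimeout(() => {
      socket.terminate();
      reject(new Error(`attempt ${attempt} timed out after ${deadlineMs}ms`));
    }, deadlineMs);

    socket.once('open', () => {
      clearTimeout(timer);
      resolve(socket);
    });
    socket.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
```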

I'd like to take a deeper dive into these issues if I get some more time. 🤞

Test Strategy

[x] Ran `yarn ready` and got a pass
[x] Ran `yarn start` and nothing seemed obviously more broken than staging normally is.

g1a55er commented 3 weeks ago

I saw this needed some work when I did some testing earlier this morning using the macOS link conditioner under a custom "South Pole" profile.

[Screenshot: macOS link conditioner custom "South Pole" profile settings]

I based these settings on the conditions reported in the blog post.

With these changes, I can now connect to Signal with the link conditioner set to those settings if I am very, very patient. Diving into how the app behaves, there are a few other areas for improvement I spotted that I'm noting down for my own reference:

jamiebuilds-signal commented 3 weeks ago

Thank you for the pull request. We're actually working on a longer-term project to move a lot of this into a Rust library, and we're discussing how we could improve this there. However, just increasing the timeouts pushes the problem downstream to a lot of places in the app and causes other issues. So I think for now we'll defer doing this.