wiz0u / WTelegramClient

Telegram Client API (MTProto) library written 100% in C# and .NET
https://wiz0u.github.io/WTelegramClient/
MIT License
956 stars 156 forks

Erroneous dcSession recreation during reconnects #233

Closed schmellow closed 6 months ago

schmellow commented 7 months ago

Similar issue: https://github.com/wiz0u/WTelegramClient/issues/160. There it was ruled to be a problem on the Telegram side. After investigating this, I now believe it is caused by the interaction between Telegram server issues and the way this library handles reconnects.

Here are my logs (you can see the similarity to the ones from issue 160): wtc.txt

What caught my attention is the "Verifying encryption key safety... (this should happen only once per DC)" line. If we are working within one DC (and the logs do confirm this: we are still within DC 2), the authorization key should not be recreated. That only happens when _dcSession.AuthKeyID == 0:

if (_dcSession.AuthKeyID == 0)
    await CreateAuthorizationKey(this, _dcSession);

which itself will never happen for an established session, unless _dcSession is recreated mid-reconnection.

So, your reconnection code (Client.cs, lines 816-863) consists of 3 phases, so to speak:

1) Try to connect to the current _dcSession endpoint
2) If that fails, iterate through the alt endpoints
3) If that fails too, make a last-ditch attempt to connect to the "default" endpoint, which is 149.154.167.50

I believe phase 3 is the culprit: it recreates _dcSession if it finds that the endpoint has not been tried yet and is not present in the list of alts for the current DC.

Normally three addresses are available for DC 2:

1) 149.154.167.41
2) 149.154.167.50 (which is the same as the default)
3) [2001:67c:4e8:f002::a] (this one always fails in our network)

If we connected to 1, then 2 and 3 will be alts; if we connected to 2, then 1 and 3 will be alts.

Hypothesis as to what actually happens:

1) We are connected to 149.154.167.41. Telegram starts having issues, and we reconnect.
2) During this reconnection (Client.cs, line 906) Telegram sends us a list of DcOptions that does NOT contain 149.154.167.50 (maybe because it went down for a reboot and is temporarily not tracked in their system)
3) 149.154.167.41 fails again, now fatally (maybe because it went down for a reboot too, after 167.50?)
4) We try to reconnect to 149.154.167.41, and it fails
5) We iterate through the alts, but the list only contains the IPv6 address, and it fails too
6) We go to phase 3 and connect to 149.154.167.50. Since it is not in the list of known alts and has not been tried, it is assumed to belong to a different DC, and _dcSession gets recreated. Hence the CreateAuthorizationKey call, and hence the 401 error, because the server does not know our new key
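To make the hypothesis easier to follow, here is a minimal sketch that models the three phases with plain strings instead of real sockets. It is not the library's code: `DcSessionModel`, `ReconnectModel` and `HypothesisScenario` are hypothetical names, the key value 12345 is arbitrary, and the model assumes the established session is the only known one.

```csharp
using System;
using System.Collections.Generic;

// Simplified stand-in for a per-DC session; AuthKeyId == 0 means "no key yet".
public sealed class DcSessionModel
{
    public long AuthKeyId;
    public string EndPoint = "";
}

public static class ReconnectModel
{
    // Returns the session in use after the reconnect attempt.
    // `reachable` decides which addresses accept a connection in this simulation.
    public static DcSessionModel Reconnect(
        DcSessionModel current,
        IEnumerable<string> altEndpoints,
        string defaultEndpoint,
        Func<string, bool> reachable)
    {
        var tried = new HashSet<string> { current.EndPoint };

        // Phase 1: the endpoint of the current session.
        if (reachable(current.EndPoint)) return current;

        // Phase 2: alternatives advertised for the same DC; the session and its key are kept.
        foreach (var alt in altEndpoints)
            if (tried.Add(alt) && reachable(alt))
            {
                current.EndPoint = alt;
                return current;
            }

        // Phase 3: the default address, only if it has not been tried yet.
        if (!tried.Add(defaultEndpoint) || !reachable(defaultEndpoint))
            throw new InvalidOperationException("All endpoints failed");

        // The only known session is `current`, whose endpoint was already tried,
        // so a lookup by endpoint cannot match and a fresh session (no key) is created.
        return new DcSessionModel { EndPoint = defaultEndpoint };
    }
}

public static class HypothesisScenario
{
    public const string Primary = "149.154.167.41:443";
    public const string Default = "149.154.167.50:443";
    public const string V6 = "[2001:67c:4e8:f002::a]:443";

    // defaultAdvertised: whether the received option list still contains 149.154.167.50.
    public static DcSessionModel Run(bool defaultAdvertised)
    {
        var established = new DcSessionModel { AuthKeyId = 12345, EndPoint = Primary };
        var alts = defaultAdvertised ? new[] { Default, V6 } : new[] { V6 };
        // In this scenario only 149.154.167.50 accepts connections.
        return ReconnectModel.Reconnect(established, alts, Default, ep => ep == Default);
    }
}
```

In this model `HypothesisScenario.Run(true)` reaches 149.154.167.50 in phase 2 and keeps the established key, while `HypothesisScenario.Run(false)` only reaches it in phase 3 and returns a session whose `AuthKeyId` is 0, which is exactly the condition that triggers CreateAuthorizationKey.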

Proposed mitigation: remove phase 3 (excerpt below) or hide it behind a config flag. The client will fail, but at least it won't destroy the established auth.

if (tcpClient == null)
{
  endpoint = Compat.IPEndPoint_Parse(Config("server_address")); // re-ask callback for an address
  if (!triedEndpoints.Add(endpoint)) throw;
  _dcSession.Client = null;
  // is it address for a known DCSession?
  _dcSession = _session.DCSessions.Values.FirstOrDefault(dcs => dcs.EndPoint.Equals(endpoint));
  _dcSession ??= new() { Id = Helpers.RandomLong() };
  _dcSession.Client = this;
  Helpers.Log(2, $"Connecting to {endpoint}...");
  tcpClient = await TcpHandler(endpoint.Address.ToString(), endpoint.Port);
}

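For the config-flag variant, a rough sketch of the guard could look like the following. It is only an illustration under assumptions: `allow_default_fallback` is a hypothetical key that does not exist in WTelegramClient, `FallbackGuard` is a made-up helper, and the `config` delegate stands in for the client's Config callback. Only the `server_address` key and the "never retry a tried endpoint" rule come from the excerpt above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Net;

public static class FallbackGuard
{
    // Hypothetical opt-in key; NOT an existing WTelegramClient setting.
    public const string AllowDefaultFallbackKey = "allow_default_fallback";

    // Returns the endpoint the last-ditch phase may try, or null when the caller
    // should give up and rethrow the original connection error instead.
    public static IPEndPoint? NextFallbackEndpoint(
        Func<string, string?> config,
        ISet<IPEndPoint> triedEndpoints)
    {
        // Opt-in only: without the flag the client fails rather than risk a new session.
        if (!string.Equals(config(AllowDefaultFallbackKey), "true", StringComparison.OrdinalIgnoreCase))
            return null;

        // Re-ask the callback for an address, as the excerpt does.
        var address = config("server_address");
        if (address is null || !IPEndPoint.TryParse(address, out var endpoint))
            return null;

        // Same rule as the excerpt: an endpoint that already failed is not retried.
        return triedEndpoints.Add(endpoint) ? endpoint : null;
    }
}
```

With the flag absent, the guard returns null and the established _dcSession is never touched. With the flag set, the fallback behaves as before, so the risk of recreating the session remains for users who opt in.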
wiz0u commented 7 months ago

Thanks for your analysis... will study.

schmellow commented 7 months ago

Note: there are only 2 facts I see:

1) There is only one place where CreateAuthorizationKey is called
2) There is only one place where the prerequisites for a CreateAuthorizationKey call are fulfilled (the highlighted 3rd reconnection phase)

The rest are assumptions. I would love to get solid confirmation, but Telegram infra failures are unpredictable, and a repro would require one such failure. Maybe I should run a patched version of the library with extra logging in the suspicious places and wait for one...

schmellow commented 7 months ago

Yeah, I've deployed my app with an in-tree patched copy of WTC, and look at that: no 149.154.167.50 in the alts

      Starting Telegram Client
03.02.2024 14:46:39 info: WTelegram.Client[0]
      Loaded previous session
03.02.2024 14:46:39 info: WTelegram.Client[0]
      Connecting to main endpoint 149.154.167.41:443...
03.02.2024 14:46:39 info: WTelegram.Client[0]
      Received 2 alt options:
03.02.2024 14:46:39 info: WTelegram.Client[0]
       - 149.154.167.41
03.02.2024 14:46:39 info: WTelegram.Client[0]
       - 2001:067c:04e8:f002:0000:0000:0000:000a

The "Received X alt options" logging comes immediately after TLConfig = await this.InvokeWithLayer and the assignment of the fresh _session.DcOptions:

var receivedOptions = _session.DcOptions.Where(dco => dco.id == _dcSession.DataCenter.id && dco.flags != _dcSession.DataCenter.flags
        && (dco.flags & (DcOption.Flags.cdn | DcOption.Flags.tcpo_only | DcOption.Flags.media_only)) == 0)
    .ToList();
Helpers.Log(2, $"Received {receivedOptions.Count} alt options:");
foreach (var o in receivedOptions)
    Helpers.Log(2, $" - {o.ip_address}");

wiz0u commented 6 months ago

Try the fix in version 3.6.7-dev.5