signalapp / Signal-Desktop

A private messenger for Windows, macOS, and Linux.
https://signal.org/download
GNU Affero General Public License v3.0
14.52k stars 2.63k forks source link

Messages frequently fail to send for a long time #6373

Closed pet0ruk closed 1 year ago

pet0ruk commented 1 year ago

Bug Description

For the past few weeks I've been having an issue where messages fail to send on Signal Desktop Beta and will sometimes sit for hours "spinning" before sending. The error logs show a Failed to connect to the server but the weirdest parts are:

I've attached the log for my linked Android device below, there are errors around the same time but this device can always send messages.

Steps to Reproduce

Unclear how to reproduce reliably, but it does seem to happen towards the end of the day when Signal will have been open for 8+ hours, but unsure if that is related.

Separately I have Signal installed alongside Signal Beta and I am unable to link that due to exactly the same issue as in #6366 - again I am not sure if this is related or not.

Platform Info

Signal Version:

6.15.0-beta.1

Operating System:

Windows 10

Linked Device Version:

6.18.2

Link to Debug Log

Signal Desktop Beta - signal-desktop.zip Linked Android Device - https://debuglogs.org/android/6.18.2/dd437ee7ed396543d7434b6abf1af384c19b9e67be09a6c222c48de4f3c61e69

vRobM commented 1 year ago

Same issues here with the non-beta GA version 6.14 for win.

The message retry logic doesn't seem to kick in to complete messages when connectivity is re-established. This also blocks sending all future messages coming from this client for many hours. (back end infra problem? or websocket retry problem?)

It would be nice to have a message menu option to manually retry, yet it should be seamless and automatic.

indutny-signal commented 1 year ago

Hello, and thank you for the report. I see in your debug log that outgoing requests take more than 10 seconds to complete and we timeout on them. What's the latency like on your connection? Could you try running: ping chat.signal.org when the issue happens?

Thanks!

pet0ruk commented 1 year ago

@indutny-signal sure - the issue is actually happening right now and my connection is rock solid, and curl works fine (curl is done from WSL terminal but ping from Windows):

$ curl -v -k https://chat.signal.org/v2/keys
*   Trying 13.248.212.111:443...
* TCP_NODELAY set
* Connected to chat.signal.org (13.248.212.111) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.2 (IN), TLS handshake, Certificate (11):
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
* TLSv1.2 (IN), TLS handshake, Server finished (14):
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.2 (OUT), TLS handshake, Finished (20):
* TLSv1.2 (IN), TLS handshake, Finished (20):
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: [NONE]
*  start date: Oct 26 15:03:56 2022 GMT
*  expire date: Nov 26 20:52:41 2023 GMT
*  issuer: C=US; ST=California; L=Mountain View; O=Signal Messenger, LLC; CN=Signal Messenger
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55d11466d8c0)
> GET /v2/keys HTTP/2
> Host: chat.signal.org
> user-agent: curl/7.68.0
> accept: */*
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
< HTTP/2 401
< date: Tue, 18 Apr 2023 15:09:45 GMT
< content-type: text/plain
< content-length: 49
< www-authenticate: Basic realm="realm"
< x-signal-timestamp: 1681830585690
<
* Connection #0 to host chat.signal.org left intact
$ ping chat.signal.org -n 60

Pinging chat.signal.org [2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf] with 32 bytes of data:
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=12ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=27ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=11ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=11ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=7ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms
Reply from 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf: time=6ms

Ping statistics for 2600:9000:a507:ab6d:4ce3:2f58:25d7:9cbf:
    Packets: Sent = 60, Received = 60, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 6ms, Maximum = 27ms, Average = 6ms
pet0ruk commented 1 year ago

Just for completeness, here is a request via PowerShell to show connectivity is working from Windows while the issue is ongoing:

$ Invoke-WebRequest -SkipCertificateCheck -Uri "https://chat.signal.org/v2/keys" -SkipHttpErrorCheck

StatusCode        : 401
StatusDescription : Unauthorized
Content           : Credentials are required to access this resource.
RawContent        : HTTP/1.1 401 Unauthorized
                    Date: Tue, 18 Apr 2023 15:28:21 GMT
                    Connection: keep-alive
                    WWW-Authenticate: Basic realm="realm"
                    X-Signal-Timestamp: 1681831701623
                    Content-Type: text/plain
                    Content-Leng…
Headers           : {[Date, System.String[]], [Connection, System.String[]], [WWW-Authenticate, System.String[]],
                    [X-Signal-Timestamp, System.String[]]…}
Images            : {}
InputFields       : {}
Links             : {}
RawContentLength  : 49
RelationLink      : {}
indutny-signal commented 1 year ago

Thank you! I presume the requests that you made from the terminal completed in way less than 5 seconds, does this sound right?

pet0ruk commented 1 year ago

Thank you! I presume the requests that you made from the terminal completed in way less than 5 seconds, does this sound right?

Correct it's essentially instant.

Running this nice PS script that ChatGPT made for me:

# Define variables
$url = "https://chat.signal.org/v2/keys" # Replace this with your desired URL
$totalRequests = 10
$sumResponseTimes = 0

# Run the loop
for ($i = 1; $i -le $totalRequests; $i++) {
    $startTime = Get-Date
    $response = Invoke-WebRequest -Uri $url -SkipCertificateCheck -SkipHttpErrorCheck
    $endTime = Get-Date
    $responseTime = ($endTime - $startTime).TotalMilliseconds
    $sumResponseTimes += $responseTime

    Write-Host "Request $i completed in $responseTime ms"
}

# Calculate and show the average response time
$averageResponseTime = $sumResponseTimes / $totalRequests
Write-Host "Average response time: $($averageResponseTime) ms"

I get:

Request 1 completed in 355.4649 ms
Request 2 completed in 370.0982 ms
Request 3 completed in 352.4992 ms
Request 4 completed in 359.9847 ms
Request 5 completed in 332.6136 ms
Request 6 completed in 344.6886 ms
Request 7 completed in 353.4776 ms
Request 8 completed in 361.3634 ms
Request 9 completed in 351.8398 ms
Request 10 completed in 332.0356 ms
Average response time: 351.40656 ms
vRobM commented 1 year ago

I have a much more intermittent connection and it doesn't look like the code is handling that gracefully on desktop.. iOS is fine.

Is there a difference in how that works from mobile to desktop?

pet0ruk commented 1 year ago

@vRobM your issue may be different - my connection is not intermittent at all and the Signal client is reporting timeouts when there should be none - my connectivity to chat.signal.org works fine.

vRobM commented 1 year ago

@vRobM your issue may be different - my connection is not intermittent at all and the Signal client is reporting timeouts when there should be none - my connectivity to chat.signal.org works fine.

No it's the same.. as Signal desktop often shows the yellow connection timeout overlay when the rest of the system and apps are just fine.

The point is the code needs to be as resilient as to handle mobile intermittent connections (worst case).

Your scenario and mine combined points to other issues with timeouts or even libraries which are not related to actual connection issues. So not point in investigating there. ;-)

pet0ruk commented 1 year ago

Agreed in general that retries on loss of connectivity could be improved, however I'm not sure the core issues are the same (maybe they are). For example, I basically never see the yellow connectivity box when I'm experiencing the issue. Do you also have the other issues in the initial report where you see typing notifications and receive all other replies just fine (and are able to send yourself Note to Self etc. without issue)?

Not saying they aren't the same issue just trying to make the life of Signal Desktop devs a little easier.

vRobM commented 1 year ago

Yes, it's all the same, it's just my outgoing messages that are stuck, all else keeps up with the rest of the flow just fine.

vRobM commented 1 year ago

Just got pushed the 6.15.0 update and the yellow banner doesn't flicker in any longer.

We'll see how the message queuing works.

Thanks for the quick fix @indutny-signal & team.

pet0ruk commented 1 year ago

Well - sounds like you will still have the same issue just without the yellow banner appearing (which is exactly how this issue is for me and I was on 6.15.0 already since I was on the Beta).

I'm now on 6.16.0-beta.1 but I don't see anything in the diff that would change this behaviour from 6.15.0-beta.1

vRobM commented 1 year ago

Yes, this morning my messages failed to send again. Seems to happen regularly after 10am PST.

Note to self from the desktop client worked in the am, however, that message on other clients shows with only one empty checkmark confirmation, not two like the desktop client shows.

Whatever that timeout is, it needs to be much shorter, and not be triggered so easily.

The yellow banner started to appear again intermittently this afternoon and even the note to self message is now queued.

dave15559546 commented 1 year ago

I have this issue too fwiw - win 10 - ver 6.15.0

pet0ruk commented 1 year ago

Yes, this morning my messages failed to send again. Seems to happen regularly after 10am PST.

Only just noticed you mentioned a time, that's quite interesting because for me it's very reliable until around 15:00 UTC or so which would be 07:00 or so Pacific, sometimes it's okay until a bit later for me but I have definitely noticed the issue doesn't happen until much later in my day.

dave15559546 commented 1 year ago

what a shame, I really just use signal because, until recently, the desktop + ios app were pretty rock solid

I don't really need the privacy of signal for my chats honestly if it means it's going to be flaky

majso commented 1 year ago

I am seeing exactly the same pattern, Desktop app is reliable until around 15:00 UTC, after that I am not able send messages maybe until 20:00 UTC.

ping chat.signal.org always works

pet0ruk commented 1 year ago

@indutny-signal any more info we can provide to help debug here? It seems like more and more people are seeing this and it's happening at a similar time of day as well.

dave15559546 commented 1 year ago

I'm just going to use telegram, it's much more polished and "private" enough for me

jlocovare commented 1 year ago

I am seeing this issue as well, along with #6366. After re-installing because of this failing to send issue, I can no long link back up to my signal account.

vRobM commented 1 year ago

This is still a daily issue, along with frequent yellow connection banner blips.

Another test I did today was to post a Note to Self from the desktop client and it is stuck not updating other clients. Posting a new message there from mobile updates both clients. Then deleting that new message from mobile via the desktop client, shows it "as deleted," however that update does not complete and fails update the mobile client.

Stuck for many hours that way.

What is the progress on tracking this down? It's been 3-4 weeks now.

Please advise with some expectations

dave15559546 commented 1 year ago

fwiw, the desktop client has been working fine the past few days for me? I'd suggest reinstalling it which might have fixed the issue for me

honestly though I'd use another more polished client if I could (as I'm not worried about privacy) but even telegram's ios client can't get notifications right and the ios client for signal seems pretty bulletproof unlike the desktop client

desktop gets no love anymore... anywhere

pet0ruk commented 1 year ago

I had exactly the same issue today as well with the Note to Self functionality being unusually stuck as well for a minute or two before it went through. Still a frustrating daily (100% consistent) issue for me as well.

Tronikelis commented 1 year ago

Happening to me too, 10 mins chatting non stop everything's fine, then suddenly it gets stuck on the single dotted circle, the behavior seems completely random to me 🤔

indutny-signal commented 1 year ago

I'm still very sorry about this. Could y'all include debug logs with your reports, please? They'd really help us to understand what's causing the issue for you! Thanks!

Tronikelis commented 1 year ago

@indutny-signal not sure if this is the culprit, but see the unhandled promise rejection on line 9060 and the error on line 13319

debuglog.txt

pet0ruk commented 1 year ago

I'm still very sorry about this. Could y'all include debug logs with your reports, please? They'd really help us to understand what's causing the issue for you! Thanks!

For me I could include more but it's identical debug logs as before, saying it can't connect but ping and curl etc. all totally fine to the endpoint.

markcottrell commented 1 year ago

I have had this issue consistently for months now. I thought I was going nut as no one else I know seemed to be hitting it.

Debug log attached, the issue happened around 2023-05-10T14:42. Looks like there's a failure to connect to the server in the log, but my network connection was solid when the message was sent.

debuglog.txt

pet0ruk commented 1 year ago

I have had this issue consistently for months now. I thought I was going nut as no one else I know seemed to be hitting it.

Debug log attached, the issue happened around 2023-05-10T14:42. Looks like there's a failure to connect to the server in the log, but my network connection was solid when the message was sent.

debuglog.txt

Exactly the same for me, had it for months and assumed it was a problem on my side as I could not find anyone else reporting the problem. It literally always happens around 14:00 - 17:00 UTC and then sticks for many hours.

jlocovare commented 1 year ago

I was able to re-install the desktop client after leaving it uninstalled for 2 weeks. Immediately back to the failing to send for several minutes bug. I have other Android and Desktop clients on other machines that are reporting no issues on the same network, only my client is having issues.

Seeing lots of errors about failing to connect to server... ping, curl, all having no issues connecting to chat.signal.org

ERROR 2023-05-12T16:00:21.949Z PUT (WS) https://chat.signal.org/v1/messages/[REDACTED]aea?story=false (unauth) 0 Error
ERROR 2023-05-12T16:00:21.950Z PUT (WS) https://chat.signal.org/v1/messages/[REDACTED]43d?story=false (unauth) 0 Error
INFO  2023-05-12T16:00:21.950Z conversation.queueJob(groupv2([REDACTED]Cr8=), conversationQueue/sendReaction): took 64019ms
INFO  2023-05-12T16:00:21.950Z conversation.queueJob(groupv2([REDACTED]Cr8=), maybeSetPendingUniversalTimer): was blocked for 63818ms
INFO  2023-05-12T16:00:21.951Z conversation.queueJob(groupv2([REDACTED]Cr8=), sendTypingMessage): was blocked for 58447ms
INFO  2023-05-12T16:00:21.951Z sendTypingMessage(groupv2([REDACTED]Cr8=)): sending false
INFO  2023-05-12T16:00:21.951Z Sender certificate service found a valid yes-E164 certificate in storage; skipping fetch
INFO  2023-05-12T16:00:21.951Z Sender certificate service found a valid yes-E164 certificate in storage; skipping fetch
INFO  2023-05-12T16:00:21.951Z conversation job queue, job ID [REDACTED]f35, attempt 1: 3 send error(s): Error: reaction did not fully send
    at sendReaction ([REDACTED]\resources\app.asar\preload.bundle.js:3349:60473)
    at async Coe.run ([REDACTED]\resources\app.asar\preload.bundle.js:3349:125900)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:3345:29873
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021),SendMessageNetworkError: Failed to connect to the server, please check your network connection. (original: promiseAjax catch; code: 0)
    at new ReplayableError ([REDACTED]\resources\app.asar\preload.bundle.js:74:217)
    at new SendMessageNetworkError ([REDACTED]\resources\app.asar\preload.bundle.js:74:854)
    at [REDACTED]\resources\app.asar\preload.bundle.js:3349:166164
    at async FF.sendToIdentifier ([REDACTED]\resources\app.asar\preload.bundle.js:3349:171348)
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:74:642945)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:74:642978
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021)
Original stack:
HTTPError: promiseAjax catch; code: 0
    at makeHTTPError ([REDACTED]\resources\app.asar\preload.bundle.js:3348:925)
    at _promiseAjax ([REDACTED]\resources\app.asar\preload.bundle.js:3348:19)
    at async _retryAjax ([REDACTED]\resources\app.asar\preload.bundle.js:3348:696)
    at async J ([REDACTED]\resources\app.asar\preload.bundle.js:3348:5821)
    at async Object.ze [as sendMessagesUnauth] ([REDACTED]\resources\app.asar\preload.bundle.js:3348:15015)
    at async FF.sendToIdentifier ([REDACTED]\resources\app.asar\preload.bundle.js:3349:171348)
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:74:642945)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:74:642978
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021)
Original stack:
Error: Request timed out
    at [REDACTED]\resources\app.asar\preload.bundle.js:3346:364871
    at Timeout._onTimeout ([REDACTED]\resources\app.asar\preload.bundle.js:21:3312)
    at listOnTimeout (node:internal/timers:559:17)
    at process.processTimers (node:internal/timers:502:7)
Initial stack:
Error
    at _outerAjax ([REDACTED]\resources\app.asar\preload.bundle.js:3348:867)
    at J ([REDACTED]\resources\app.asar\preload.bundle.js:3348:5827)
    at Object.ze [as sendMessagesUnauth] ([REDACTED]\resources\app.asar\preload.bundle.js:3348:15021)
    at FF.transmitMessage ([REDACTED]\resources\app.asar\preload.bundle.js:3349:165866)
    at [REDACTED]\resources\app.asar\preload.bundle.js:3349:168322
    at async FF.sendToIdentifier ([REDACTED]\resources\app.asar\preload.bundle.js:3349:171348)
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:74:642945)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:74:642978
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021)
Caused by: Failed to create or send message,SendMessageNetworkError: Failed to connect to the server, please check your network connection. (original: promiseAjax catch; code: 0)
    at new ReplayableError ([REDACTED]\resources\app.asar\preload.bundle.js:74:217)
    at new SendMessageNetworkError ([REDACTED]\resources\app.asar\preload.bundle.js:74:854)
    at [REDACTED]\resources\app.asar\preload.bundle.js:3349:166164
    at async FF.sendToIdentifier ([REDACTED]\resources\app.asar\preload.bundle.js:3349:171348)
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:74:642945)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:74:642978
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021)
Original stack:
HTTPError: promiseAjax catch; code: 0
    at makeHTTPError ([REDACTED]\resources\app.asar\preload.bundle.js:3348:925)
    at _promiseAjax ([REDACTED]\resources\app.asar\preload.bundle.js:3348:19)
    at async _retryAjax ([REDACTED]\resources\app.asar\preload.bundle.js:3348:696)
    at async J ([REDACTED]\resources\app.asar\preload.bundle.js:3348:5821)
    at async Object.ze [as sendMessagesUnauth] ([REDACTED]\resources\app.asar\preload.bundle.js:3348:15015)
    at async FF.sendToIdentifier ([REDACTED]\resources\app.asar\preload.bundle.js:3349:171348)
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:74:642945)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:74:642978
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021)
Original stack:
Error: Request timed out
    at [REDACTED]\resources\app.asar\preload.bundle.js:3346:364871
    at Timeout._onTimeout ([REDACTED]\resources\app.asar\preload.bundle.js:21:3312)
    at listOnTimeout (node:internal/timers:559:17)
    at process.processTimers (node:internal/timers:502:7)
Initial stack:
Error
    at _outerAjax ([REDACTED]\resources\app.asar\preload.bundle.js:3348:867)
    at J ([REDACTED]\resources\app.asar\preload.bundle.js:3348:5827)
    at Object.ze [as sendMessagesUnauth] ([REDACTED]\resources\app.asar\preload.bundle.js:3348:15021)
    at FF.transmitMessage ([REDACTED]\resources\app.asar\preload.bundle.js:3349:165866)
    at [REDACTED]\resources\app.asar\preload.bundle.js:3349:168322
    at async FF.sendToIdentifier ([REDACTED]\resources\app.asar\preload.bundle.js:3349:171348)
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:74:642945)
    at async [REDACTED]\resources\app.asar\preload.bundle.js:74:642978
    at async run ([REDACTED]\resources\app.asar\preload.bundle.js:70:31021)
Caused by: Failed to create or send message

debuglog.txt

vRobM commented 1 year ago

Yes, each morning, Signal Desktop is useless other than for monitoring messages as it can receive but not send.

30m ago I quit the app while it was doing this and restarted it. It took 20 minutes for the app to come up, stuck on the "loading messages" screen.

The connection issues continue with no change, until the afternoon (PST) where they magically disappear. All the while iOS and MacOS clients are fine with no issues.

@indutny-signal Which part of your infrastructure is compromised and failing consistently?

scottnonnenberg-signal commented 1 year ago

@vRobM In the mornings, is your windows machine coming back from suspend/sleep or are you booting from nothing? Is desktop also coming back from suspend/sleep, or is it starting afresh? Do you have the 'Open at computer login' option checked in settings? Does the behavior change if you disable that?

Maybe there's something failing in Windows+Electron+your network specific to that return from suspend/sleep scenario? Or something about starting up before windows is ready for network requests?

We can't repro this situation which seems reliable for you, so the more scientific you can be about what exactly makes it happen and what doesn't, the more likely we'll be able to do something about it! Thanks!

pet0ruk commented 1 year ago

@scottnonnenberg-signal I can't speak for @vRobM but I think what is suspicious is that I have the issue at pretty much the same time as everyone else, starting around 14:00 UTC. Except I'm in the UK, so for me that's mid to late afternoon and my computer has been on for 5 or 6 hours at that point. I boot from fresh each day no sleep or suspend.

It seems weird it could be an electron issue when all other aspects of the application except specifically message sending continue to work. However I agree it's a weird issue and on the face of it, it presents as client side, but with this many people reporting the same issue, and especially if others can report the time it starts for them, it seems like it could be server side or at least some strange mix of the two.

vRobM commented 1 year ago

@scottnonnenberg-signal No, it's on 24/7.

@pet0ruk Right, it seems like there is a good chunk of time that the infra side starts dropping client connections as some security or rate limit thresholds are reached.

I would hope this is instrumented and can be tracked from the server side.

It's odd that receiving works, but perhaps those are two different pipelines.

In any case there are two sides to every connection. So far the client side has apparently found no issues and recovers over time, so there's only one other option, server infra side.

Happy to help track this down via IP or other unique token if needed, just let us know where to drop the email.

jlocovare commented 1 year ago

Again having the problem this morning (10AM EST). No problems until after 14:00 UTC then I start getting server errors and messages/reactions stop sending. Attaching logs

debuglog.txt

dave15559546 commented 1 year ago

I don't care enough myself to do this but it might be interesting for those affected by this issue to try running the desktop app in a vm on the machine having the problem to see if it's being replicated there (esp depending on the network configuration of the vm - nat vs bridged)

vRobM commented 1 year ago

@scottnonnenberg-signal this is still happening.

Have you escalated the issue to start the infra side investigation?

trevor-signal commented 1 year ago

The next beta release will include some additional information in the debug logs to help us diagnose this issue.

pet0ruk commented 1 year ago

I'm now on the latest beta and so far today I am not having the issue when normally by now I would be. I'll continue monitoring but has anything changed that could have made this work? I see Electron was updated.

vRobM commented 1 year ago

Thanks for the update Peter. GA client still does of course.

pet0ruk commented 1 year ago

Same as yesterday for me, it is working when the previous version was not.

vRobM commented 1 year ago

Can you tell what changes were made to the beta other than what Trevor mentioned for extra logging?

pet0ruk commented 1 year ago

Can you tell what changes were made to the beta other than what Trevor mentioned for extra logging?

I'm just going off these:

https://github.com/signalapp/Signal-Desktop/releases/tag/v6.19.0-beta.1 https://github.com/signalapp/Signal-Desktop/compare/6.18.x...6.19.x

vRobM commented 1 year ago

Ah, looks like these two may have something to do with it: Assert presence of sent_at timestamp for receipts https://github.com/signalapp/Signal-Desktop/commit/9d1252ae19c70d4ef28e05497aa26967022c101f & Improved WS request timeout handling https://github.com/signalapp/Signal-Desktop/commit/73f4374e1e44b58b95525c9bff1aba8f76bd961f

pet0ruk commented 1 year ago

As an update - I have literally not had the issue once since updating to the latest beta, it's like old times again!

dave15559546 commented 1 year ago

maybe so, but can you paste images from the clipboard on the beta?

pet0ruk commented 1 year ago

maybe so, but can you paste images from the clipboard on the beta?

Nope

vRobM commented 1 year ago

maybe so, but can you paste images from the clipboard on the beta?

This works for me from the GA client. Depending on the size of the image, it may take a few seconds to convert and resize for messaging.

MrDOS commented 1 year ago

Although I did need to fight past #6366 to get the beta linked, it does seem to be consistently sending messages in a timely fashion. Thank you indutny-signal and trevor-signal for your efforts here!