status-im / status-desktop

Status Desktop client made in Nim & QML
https://status.app
Mozilla Public License 2.0
304 stars 79 forks source link

Right click - quit app makes the app not responding #5938

Closed anastasiyaig closed 1 year ago

anastasiyaig commented 2 years ago

Description

Cant quit the app by clicking right click on the icon and quit - the app hangs and is not responsive

Steps:

Screenshot 2022-05-31 at 11 17 09
endulab commented 2 years ago

StatusFoundation thread freezes when joinThread.

anastasiyaig commented 1 year ago

I can reproduce it now in master, seems @noeliaSD saw it too

jrainville commented 1 year ago

I could reproduce as well. I think it might be related to the store node fetch. I was seeing the fetch loader at the top and it wasn't going away. I tried closing the app to see if a reset would make it go away, and it hung. I know the fetch uses a Nim thread and those have been the source of most of not all our hangs, so I'd look into why the fetch sometimes doesn't end.

saledjenic commented 1 year ago

@jrainville that's a tricky thing with async tasks, how we're dealing with them in the app (maybe it's the only solution in Nim, but I am not sure and haven't ever dived into that to see if there is a better approach to implement that async mechanism). But in general what I want to say... such behaviour may happen if in the moment of closing app some job/s is/are being processed in a separate thread in that case the app goes to some busy state till they are done.

To have this solved, we need to check all our async tasks and somehow discard them we're closing the app, but I am not sure if that's possible, but at least what we can try to call teardown for thread pool and make tasks lighter.

But solving this properly requires some investigation in the current code and very likely Nim research for some general solution.

osmaczko commented 1 year ago

The problem here is that the app can be blocked by essentially any async task. Threadpool we use supports teardown by notifying all threads to stop its execution but it doesn't help if the pending task in a given thread is itself blocking. Before the app is closed we wait for all the threads from threadpool to be joined.

We should structure our code in a way that allows us to cleanly stop all ongoing tasks. Unfortunately, since most of our tasks are purely status-go driven, we have no direct control over them.

I've just added a logging mechanism to determine tasks that are blocking, see: https://github.com/status-im/status-desktop/pull/8952

Example of blocking status-go endpoints which makes the app hang:

DBG 2023-01-04 14:30:22.883+01:00 [threadpool task thread] initiating task   topics="task-threadpool" tid=828937 file=threadpool.nim:92 messageType=AsyncRequestCommunityInfoTaskArg:ObjectType threadid=10 task="{\"$type\":\"AsyncRequestCommunityInfoTaskArg:ObjectType\",\"communityId\":\"0x032aa2439b14eb2caaf724223951da89de1530fbfa5d5a639b93b82cd5341713fd\",\"vptr\":93911783996512,\"slot\":\"asyncCommunityInfoLoaded\",\"tptr\":93911733012144}"
DBG 2023-01-04 14:30:42.586+01:00 [threadpool task thread] initiating task   topics="task-threadpool" tid=828938 file=threadpool.nim:92 messageType=RequestMoreMessagesTaskArg:ObjectType threadid=11 task="{\"$type\":\"RequestMoreMessagesTaskArg:ObjectType\",\"chatId\":\"mytest\",\"vptr\":93911786042256,\"slot\":\"mailserverSynced\",\"tptr\":93911733948512}"

INF 2023-01-04 14:30:55.305+01:00 shutting down...                           topics="status-app" tid=828914 file=nim_status_client.nim:145

ERR 2023-01-04 14:32:01.700+01:00 [threadpool task thread] exception         topics="task-threadpool" tid=828937 file=threadpool.nim:100 error="\nstatus-go error [methodName:wakuext_requestCommunityInfoFromMailserver, code:-32000, message:mailserver not available ]\n"
DBG 2023-01-04 14:32:01.700+01:00 [threadpool task thread] sending 'done' notice to pool topics="task-threadpool" tid=828937 file=threadpool.nim:105 threadid=10 task="{\"$type\":\"AsyncRequestCommunityInfoTaskArg:ObjectType\",\"communityId\":\"0x032aa2439b14eb2caaf724223951da89de1530fbfa5d5a639b93b82cd5341713fd\",\"vptr\":93911783996512,\"slot\":\"asyncCommunityInfoLoaded\",\"tptr\":93911733012144}"
DBG 2023-01-04 14:32:01.700+01:00 [threadpool task thread] sending 'done' notice to pool topics="task-threadpool" tid=828938 file=threadpool.nim:105 threadid=11 task="{\"$type\":\"RequestMoreMessagesTaskArg:ObjectType\",\"chatId\":\"mytest\",\"vptr\":93911786042256,\"slot\":\"mailserverSynced\",\"tptr\":93911733948512}"

To reproduce the above: 1) open the app and join any public chat 2) turn off the network connection 3) "fetch more messages" for given chat 4) exit the app

We can mitigate the issue by ensuring status-go endpoints are non-hanging and have relatively small timeouts. In this case, syncChatFromSyncedFrom and requestCommunityInfoFromMailserver should be fixed by setting a reasonable timeout, ideally less than 30sek.

jrainville commented 1 year ago

We can mitigate the issue by ensuring status-go endpoints are non-hanging and have relatively small timeouts. In this case, syncChatFromSyncedFrom and requestCommunityInfoFromMailserver should be fixed by setting a reasonable timeout, ideally less than 30sek.

I agree. As long as we send a valid exception when we timeout so the client has the opportunity to retry or show a good error message, a timeout is necessary.

I'm not sure how long those two calls are supposed to last usually, but a timeout of less than 30sec sounds reasonable. If you can, measure the time it takes in normal cases and let us know.

endulab commented 1 year ago

I was dealing with the problem when timer tasks blocked the application. That's why I introduced QTimers. AFAIK, there is no possibility of killing threads in Nim, also it is not a good solution. As you mentioned, if the task in thread is working, it is impossible to talk to it (eg. send some kind of shutdown signal). So, it is a good idea to fix the cause of the problem (the code which freezes), eg. by using timeout context or whatever it needs.

osmaczko commented 1 year ago

I agree. As long as we send a valid exception when we timeout so the client has the opportunity to retry or show a good error message, a timeout is necessary.

I agree. Auto-retry in case of timeout sounds ok. Although, I don't think showing a good error message should be part of this bugfix. At the moment, we are not showing any feedback to the user if mailserver fetch fails. We should definitely think about a better feedback mechanism for message fetching. I created this issue some time ago.

I'm not sure how long those two calls are supposed to last usually, but a timeout of less than 30sec sounds reasonable. If you can, measure the time it takes in normal cases and let us know.

INF 2023-01-04 21:36:49.371+01:00 Time taken                                 topics="mailservers-service" tid=2757719 file=service.nim:54 t="467 milliseconds, 699 microseconds, and 333 nanoseconds"
INF 2023-01-04 21:36:57.710+01:00 Time taken                                 topics="mailservers-service" tid=2757721 file=service.nim:54 t="119 milliseconds, 49 microseconds, and 597 nanoseconds"
INF 2023-01-04 21:37:05.847+01:00 Time taken                                 topics="mailservers-service" tid=2757722 file=service.nim:54 t="119 milliseconds, 772 microseconds, and 43 nanoseconds"
INF 2023-01-04 21:37:12.413+01:00 Time taken                                 topics="mailservers-service" tid=2757723 file=service.nim:54 t="132 milliseconds, 865 microseconds, and 426 nanoseconds"
INF 2023-01-04 21:37:24.774+01:00 Time taken                                 topics="mailservers-service" tid=2757725 file=service.nim:54 t="104 milliseconds, 923 microseconds, and 54 nanoseconds"
INF 2023-01-04 21:37:30.396+01:00 Time taken                                 topics="mailservers-service" tid=2757727 file=service.nim:54 t="107 milliseconds, 313 microseconds, and 950 nanoseconds"
INF 2023-01-04 21:37:37.800+01:00 Time taken                                 topics="mailservers-service" tid=2757714 file=service.nim:54 t="124 milliseconds, 504 microseconds, and 812 nanoseconds"
INF 2023-01-04 21:37:42.988+01:00 Time taken                                 topics="mailservers-service" tid=2757715 file=service.nim:54 t="110 milliseconds, 516 microseconds, and 199 nanoseconds"
INF 2023-01-04 21:37:50.245+01:00 Time taken                                 topics="mailservers-service" tid=2757713 file=service.nim:54 t="96 milliseconds, 637 microseconds, and 28 nanoseconds"
INF 2023-01-04 21:38:16.686+01:00 Time taken                                 topics="mailservers-service" tid=2757718 file=service.nim:54 t="12 seconds, 398 milliseconds, 853 microseconds, and 899 nanoseconds"
INF 2023-01-04 21:38:21.987+01:00 Time taken                                 topics="mailservers-service" tid=2757717 file=service.nim:54 t="140 milliseconds, 634 microseconds, and 21 nanoseconds"
INF 2023-01-04 21:38:31.454+01:00 Time taken                                 topics="mailservers-service" tid=2757719 file=service.nim:54 t="95 milliseconds, 779 microseconds, and 728 nanoseconds"
INF 2023-01-04 21:38:42.740+01:00 Time taken                                 topics="mailservers-service" tid=2757721 file=service.nim:54 t="98 milliseconds, 587 microseconds, and 651 nanoseconds"
INF 2023-01-04 21:38:48.752+01:00 Time taken                                 topics="mailservers-service" tid=2757722 file=service.nim:54 t="94 milliseconds, 408 microseconds, and 248 nanoseconds"
INF 2023-01-04 21:38:54.336+01:00 Time taken                                 topics="mailservers-service" tid=2757723 file=service.nim:54 t="91 milliseconds, 881 microseconds, and 233 nanoseconds"
INF 2023-01-04 21:39:00.981+01:00 Time taken                                 topics="mailservers-service" tid=2757724 file=service.nim:54 t="133 milliseconds, 514 microseconds, and 8 nanoseconds"
INF 2023-01-04 21:39:06.505+01:00 Time taken                                 topics="mailservers-service" tid=2757725 file=service.nim:54 t="105 milliseconds, 525 microseconds, and 182 nanoseconds"
INF 2023-01-04 21:39:13.289+01:00 Time taken                                 topics="mailservers-service" tid=2757727 file=service.nim:54 t="135 milliseconds, 522 microseconds, and 33 nanoseconds"
INF 2023-01-04 21:39:18.017+01:00 Time taken                                 topics="mailservers-service" tid=2757728 file=service.nim:54 t="111 milliseconds, 893 microseconds, and 82 nanoseconds"
INF 2023-01-04 21:39:26.471+01:00 Time taken                                 topics="mailservers-service" tid=2757715 file=service.nim:54 t="129 milliseconds, 427 microseconds, and 291 nanoseconds"
INF 2023-01-04 21:39:35.108+01:00 Time taken                                 topics="mailservers-service" tid=2757716 file=service.nim:54 t="134 milliseconds, 190 microseconds, and 783 nanoseconds"
INF 2023-01-04 21:39:39.380+01:00 Time taken                                 topics="mailservers-service" tid=2757718 file=service.nim:54 t="98 milliseconds, 89 microseconds, and 759 nanoseconds"
INF 2023-01-04 21:39:52.006+01:00 Time taken                                 topics="mailservers-service" tid=2757720 file=service.nim:54 t="168 milliseconds, 772 microseconds, and 266 nanoseconds"
INF 2023-01-04 21:39:58.849+01:00 Time taken                                 topics="mailservers-service" tid=2757721 file=service.nim:54 t="325 milliseconds, 392 microseconds, and 338 nanoseconds"
INF 2023-01-04 21:40:03.971+01:00 Time taken                                 topics="mailservers-service" tid=2757722 file=service.nim:54 t="94 milliseconds, 835 microseconds, and 413 nanoseconds"
INF 2023-01-04 21:40:10.346+01:00 Time taken                                 topics="mailservers-service" tid=2757723 file=service.nim:54 t="99 milliseconds, 669 microseconds, and 237 nanoseconds"

The longest it took for me in "normal cases" is ~12 seconds. Usually, it is below 1 second. Tested on public #test channel, it has a bunch of history.

jrainville commented 1 year ago

The longest it took for me in "normal cases" is ~12 seconds. Usually, it is below 1 second. Tested on public #test channel, it has a bunch of history.

Cool. Great job testing it.

What do you think made it take more than 10 seconds that one time? If it was just a fluke or a network issue, then I think setting a less than 10 second timeout is appropriate. If it's normal to take 12 seconds sometimes for big chunks, then maybe 15 seconds is ok.

osmaczko commented 1 year ago

What do you think made it take more than 10 seconds that one time? If it was just a fluke or a network issue, then I think setting a less than 10 second timeout is appropriate. If it's normal to take 12 seconds sometimes for big chunks, then maybe 15 seconds is ok.

I repeated the procedure a few times. The procedure is: 1) create new account 2) join #test channel 3) fetch all the history

Sometimes, there is one fetch that takes more than a few seconds. I don't see a strong correlation with chunk size there. I suppose it is related to network conditions.

RUN A

INF 2023-01-04 22:49:37.702+01:00 Time taken                                 topics="mailservers-service" tid=2780436 file=service.nim:54 t="933 milliseconds, 51 microseconds, and 272 nanoseconds"
INF 2023-01-04 22:50:02.257+01:00 Time taken                                 topics="mailservers-service" tid=2780439 file=service.nim:54 t="17 seconds, 83 milliseconds, 192 microseconds, and 187 nanoseconds"
INF 2023-01-04 22:50:02.505+01:00 Time taken                                 topics="mailservers-service" tid=2780440 file=service.nim:54 t="4 seconds, 308 milliseconds, 666 microseconds, and 925 nanoseconds"
INF 2023-01-04 22:50:08.947+01:00 Time taken                                 topics="mailservers-service" tid=2780441 file=service.nim:54 t="95 milliseconds, 100 microseconds, and 92 nanoseconds"
INF 2023-01-04 22:50:16.546+01:00 Time taken                                 topics="mailservers-service" tid=2780442 file=service.nim:54 t="91 milliseconds, 506 microseconds, and 447 nanoseconds"

RUN B

INF 2023-01-04 22:52:28.570+01:00 Time taken                                 topics="mailservers-service" tid=2782066 file=service.nim:54 t="2 seconds, 172 milliseconds, 94 microseconds, and 565 nanoseconds"
INF 2023-01-04 22:52:46.501+01:00 Time taken                                 topics="mailservers-service" tid=2782068 file=service.nim:54 t="9 seconds, 347 milliseconds, 38 microseconds, and 863 nanoseconds"
INF 2023-01-04 22:52:54.433+01:00 Time taken                                 topics="mailservers-service" tid=2782069 file=service.nim:54 t="454 milliseconds, 820 microseconds, and 350 nanoseconds"
INF 2023-01-04 22:53:01.282+01:00 Time taken                                 topics="mailservers-service" tid=2782070 file=service.nim:54 t="333 milliseconds, 987 microseconds, and 214 nanoseconds"

RUN C

INF 2023-01-04 22:55:36.511+01:00 Time taken                                 topics="mailservers-service" tid=2783826 file=service.nim:54 t="539 milliseconds, 772 microseconds, and 70 nanoseconds"
INF 2023-01-04 22:55:45.669+01:00 Time taken                                 topics="mailservers-service" tid=2783828 file=service.nim:54 t="143 milliseconds, 463 microseconds, and 241 nanoseconds"
INF 2023-01-04 22:55:53.036+01:00 Time taken                                 topics="mailservers-service" tid=2783829 file=service.nim:54 t="124 milliseconds, 294 microseconds, and 502 nanoseconds"
INF 2023-01-04 22:55:59.343+01:00 Time taken                                 topics="mailservers-service" tid=2783830 file=service.nim:54 t="111 milliseconds, 786 microseconds, and 305 nanoseconds"

RUN D

INF 2023-01-04 23:00:17.018+01:00 Time taken                                 topics="mailservers-service" tid=2786420 file=service.nim:54 t="633 milliseconds, 934 microseconds, and 684 nanoseconds"
INF 2023-01-04 23:00:27.270+01:00 Time taken                                 topics="mailservers-service" tid=2786422 file=service.nim:54 t="180 milliseconds, 446 microseconds, and 699 nanoseconds"
INF 2023-01-04 23:00:33.741+01:00 Time taken                                 topics="mailservers-service" tid=2786423 file=service.nim:54 t="231 milliseconds, 759 microseconds, and 550 nanoseconds"
INF 2023-01-04 23:00:40.169+01:00 Time taken                                 topics="mailservers-service" tid=2786424 file=service.nim:54 t="89 milliseconds, 253 microseconds, and 979 nanoseconds"

Since the app should be usable under bad network conditions, I would go for the highest reasonable timeout (30 seconds?) that won't block the user for long when closing the app.

jrainville commented 1 year ago

Since the app should be usable under bad network conditions, I would go for the highest reasonable timeout (30 seconds?) that won't block the user for long when closing the app.

Ok let's start with 30 seconds and see how it feels.

osmaczko commented 1 year ago

It seems wakuv2 is not respecting mailserver request timeout (currently set to 10 seconds). I created issue for that. Before I start working on it, I would love to hear @richard-ramos's opinion. Is there any technical limitation to not using timeouts for wakuv2 requests?

richard-ramos commented 1 year ago

No technical limitations. It was an oversight