jonasrotilli commented 1 year ago

Description of the bug

I often see the message "Resynchronizing UI by client's request" in the log.

The full message is: Resynchronizing UI by client's request. The network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

It doesn't appear to be a connection problem; apparently it's session-related.

I have another Vaadin 8 application running on the same Linux server. The two applications have different names, use different ports, live in different folders, and are mapped by NGINX to different subdomains.

I have already evaluated the other problems related to the topic:

12640

There was no clear solution, someone raised some possibilities:

Browser: not in my case; it wouldn't happen so often if it were.
HTTP proxy: I use NGINX for the subdomains, but I also use it for other services such as Node and static HTML and never had a problem. NGINX runs with its default configuration, without any additions; it just forwards each subdomain to port X.
Mixed-up sessions: this makes no sense, since I have very little load; it happens even with only one user logged in.

12173

In that issue the user uses long-duration push. That's not my case; I use the plain default @Push.

11645

As I understand it, the cause in that issue was a slow connection. That's not my case; everything is fast here.

10096

This is a very similar scenario. But there was no conclusion, the user closed without informing how the issue was resolved and if it was resolved.

9399

The user solved this one by changing the server, so it is difficult to assess what the problem was.

Anyway, this problem is quite recurrent and should be better explained in the documentation. I downloaded the example available from the site, nothing out of the ordinary, little or no extra configuration.

Expected behavior

The expectation is that it doesn't lock the user's screen. It's terrible to have to ask users to refresh the page, because once it's broken it doesn't recover on its own.

Minimal reproducible example

It's hard to simulate, because it doesn't always happen. The impression is that it happens after a while without changes on the page, but sometimes it happens right after logging in or during some slower operation.

Versions

Vaadin / Flow version: 23.2.0.alpha1
Java version: openjdk version "11.0.3" 2019-04-16 OpenJDK Runtime Environment (build 11.0.3+7-Ubuntu-1ubuntu218.10.1) OpenJDK 64-Bit Server VM (build 11.0.3+7-Ubuntu-1ubuntu218.10.1, mixed mode, sharing)
OS version: -Ubuntu
Browser version (if applicable): Chrome

jonasrotilli commented 1 year ago

Difficult. I increased the timeout on NGINX to 600 seconds and the problem continues.

tiagomartins91 commented 1 year ago

Same on Vaadin version 14.8.14

WARN com.vaadin.flow.server.communication.ServerRpcHandler [http-nio-8080-exec-3] Resynchronizing UI by client's request. Under normal operations this should not happen and may indicate a bug in Vaadin platform. If you see this message regularly please open a bug report at https://github.com/vaadin/flow/issues

jonasrotilli commented 1 year ago

@tiagomartins91

Apparently no one from Vaadin is watching here. Let's try to find the problem ourselves and see what we have in common.

1 - Do you have any other Vaadin application running on the same server? A: I do, but it's in another folder, on another version, with nothing shared.

2 - Does it happen in development, when running with IntelliJ or similar? A: No.

3 - Does it happen in production? A: Yes. When I stop the service and deploy a new version, it happens on the first login most of the time, which disproves the theory that it's caused by a period of inactivity.

jonasrotilli commented 1 year ago

I was hoping it was a conflict between the two applications, but it is not. I completely stopped the other application and started only the new one, on Vaadin 23.

And the same problem happened: Resynchronizing UI by client's request. The network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

This is very bad, it is a very serious problem.

knoobie commented 1 year ago

I would suggest to post your nginx and push configuration.

jonasrotilli commented 1 year ago

I would suggest to post your nginx and push configuration.

Push default:

@Theme(value = "myapp")
@PWA(name = "upCampo", shortName = "upCampo")
@NpmPackage(value = "line-awesome", version = "1.3.0")
@Push

NGINX file:

server {

    server_name  novoportal.MYWEBSITE;

    location / {
        proxy_pass  http://127.0.0.1:1628;
    }

    listen [::]:443 ssl; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/novoportal.MYWEBSITE/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/novoportal.MYWEBSITE/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

}
server {
    if ($host = novoportal.MYWEBSITE) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    listen [::]:80;

    server_name  novoportal.MYWEBSITE;
    return 404; # managed by Certbot

   proxy_read_timeout 600;
   proxy_connect_timeout 600;
   proxy_send_timeout 600;

}

About the timeouts in NGINX, I added more time, it didn't make any difference with or without this part of the code:


   proxy_read_timeout 600;
   proxy_connect_timeout 600;
   proxy_send_timeout 600;

knoobie commented 1 year ago

I don't see anything related to push in the configuration. Cuba has an example for nginx that you could try: https://doc.cuba-platform.com/manual-latest/server_push_settings.html#server_push_settings_using_proxy - the important part is the one about upgrade.

I'm more experienced with Apache httpd, and there it is a must to configure websockets correctly for them to work in corporate networks.

jonasrotilli commented 1 year ago

I don't see anything related to push in the configuration. Cuba has an example for nginx that you could try: https://doc.cuba-platform.com/manual-latest/server_push_settings.html#server_push_settings_using_proxy - the important part is the one about upgrade.

I'm more experienced with Apache httpd, and there it is a must to configure websockets correctly for them to work in corporate networks.

I added your suggestion in NGINX. Good news: so far, the problem hasn't happened yet. I'll leave it running during the day and come back here to confirm if it worked or not.

Thank you very much!

jonasrotilli commented 1 year ago

I don't see anything related to push in the configuration. Cuba has an example for nginx that you could try: https://doc.cuba-platform.com/manual-latest/server_push_settings.html#server_push_settings_using_proxy - the important part is the one about upgrade.

I'm more experienced with Apache httpd, and there it is a must to configure websockets correctly for them to work in corporate networks.

Sorted out! Big help. I followed the CUBA link and used the "location" part; it seems to be related to WebSocket support. Since adding it, I haven't had any more problems. @knoobie Thank you so much for your help, it saved me several nights' sleep!

I'm sharing the NGINX configuration I'm using here for other users. I won't close the ticket, so that someone from Vaadin can evaluate whether any improvement is needed on this topic. I believe this could be in the documentation.

Here's the complete file:

server {

    server_name  mywebsite.com;

    location / {
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-Server $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_read_timeout     3600;
        proxy_connect_timeout  240;
        proxy_set_header Host $host;
        proxy_set_header X-RealIP $remote_addr;

        proxy_pass  http://127.0.0.1:PORT_EXIT_DO_YOUR_SPRINGBOOT;

        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    listen [::]:443 ssl; # managed by Certbot (Certificate SSL)
    listen 443 ssl; # managed by Certbot (Certificate SSL)
    ssl_certificate /etc/letsencrypt/live/mywebsite.com/fullchain.pem; # managed by Certbot (Certificate SSL)
    ssl_certificate_key /etc/letsencrypt/live/mywebsite.com/privkey.pem; # managed by Certbot (Certificate SSL)
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot (Certificate SSL)
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot (Certificate SSL)

}
server {
    if ($host = mywebsite.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80;
    listen [::]:80;

    server_name  mywebsite.com;
    return 404; # managed by Certbot (Certificate SSL)

   proxy_read_timeout 600;
   proxy_connect_timeout 600;
   proxy_send_timeout 600;

}

knoobie commented 1 year ago

I believe this could be in the documentation.

How to configure a reverse proxy should be an important topic inside the docs (cc: @tarekoraby)

jonasrotilli commented 1 year ago

I have bad news. I have been monitoring since that day, and it happened again. It is less frequent than before the NGINX change, but it is still happening. Are there any other possible causes?

tiagomartins91 commented 1 year ago

Same here

jonasrotilli commented 1 year ago

That is terrible. We never had problems like this in the old Vaadin 8 application. Imagine the inconvenience this is causing the customer. This image makes me shiver every time I see it:

Please someone help us!

Please remove the documentation topic: this is not only a documentation matter, it's something very serious that needs to be investigated.

tarekoraby commented 1 year ago

I would personally try to obtain a debug log of nginx to try to understand what's happening.

jonasrotilli commented 1 year ago

I would personally try to obtain a debug log of nginx to try to understand what's happening.

I am fully available to collect the data needed to resolve this issue.

Help me, how do I do this?

tarekoraby commented 1 year ago

I'm not an nginx expert, but I'd check the docs for instructions on that: https://nginx.org/en/docs/debugging_log.html.

tiagomartins91 commented 1 year ago

Any update? It still happens sometimes, and the page needs to be refreshed for the application to work.

mshabarov commented 1 year ago

We are going to investigate it more closely in the upcoming development iteration.

kagian commented 1 year ago

I've managed to reproduce this issue almost consistently. There are a couple of interesting factors that lead into this. I have a pcap and a google chrome debug output and the code that generates the issue.

java.lang.UnsupportedOperationException: Unexpected message id from the client. Expected sync id: 9, got 10. more details logged on DEBUG level.

Regarding the exception above, I noticed in both the pcap and the Google Chrome developer tools network tab that the server only generated syncs for ids 8 and 10.

In the pcap I can see that the WebSocket port actually generates a FIN packet between the id 8 packet and the id 10 packet. Further data is still sent on the socket afterwards, which is technically fine. I am unclear why the server thinks that id 9 was generated, but in the specific instance I looked at, the FIN was generated and might be related. As a general note, other discussions of this issue mention network quality, and I agree with that. We have around 300 ms latency between server and client and fairly high packet loss (~30%). As this is TCP, however, that really shouldn't affect the order or number of packets finally received by server and client, as retransmissions should end up succeeding.

Most of my UI code already runs through ui.access(command). I was also using @Push(PushMode.AUTOMATIC). I am unclear if this is related.

After moving to @Push(PushMode.MANUAL) with ui.push after the ui.access, I have not been able to reproduce the problem.

I have a pcap for this, which I would prefer to hand over to someone at Vaadin directly, as it likely contains confidential data.

My suspicion therefore is that the automatic push sync message logic has a server-side bug.

This was tested on version 22.0.2
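
The two log messages quoted in this thread ("Unexpected message id from the client. Expected sync id: 9, got 10" and "Ignoring old duplicate message from the client. Expected: 14, got: 13") suggest a sequence check on incoming message ids. The following is a minimal Java sketch of such a check, not Vaadin's actual implementation: the class SyncIdCheck, its Outcome values, and the rule that any lower id counts as a stale duplicate are assumptions made for illustration.

```java
// Hypothetical model of a message-id sequence check (NOT Vaadin's code).
class SyncIdCheck {
    enum Outcome { ACCEPT, STALE_DUPLICATE, UNEXPECTED }

    // The id the receiver expects to see next.
    private int expectedId;

    SyncIdCheck(int firstExpectedId) {
        this.expectedId = firstExpectedId;
    }

    Outcome receive(int messageId) {
        if (messageId == expectedId) {
            // In-order message: process it and advance the counter.
            expectedId++;
            return Outcome.ACCEPT;
        }
        if (messageId < expectedId) {
            // Already seen (assumption: any lower id is a re-sent duplicate).
            return Outcome.STALE_DUPLICATE;
        }
        // A gap: at least one message in between never arrived.
        return Outcome.UNEXPECTED;
    }

    int expectedId() {
        return expectedId;
    }
}
```

In this model, a receiver that has accepted id 8 and then sees id 10 reports UNEXPECTED and keeps expecting 9, which matches the shape of the exception above; re-sending id 8 afterwards would be reported as a stale duplicate.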

knoobie commented 1 year ago

@kagian Please share it, if you wanna see steady progress in this topic.

flefebure commented 1 year ago

Our multi-year Vaadin 7->14 migration work just went into production.

During the dev/test period we occasionally had this problem, mainly when working remotely on a bad network. But now that we have more users, with various network configurations, we realize that the problem is more serious than expected. We talk a lot about NGINX in this thread, but my feeling is that this problem is not reverse-proxy related. We do have an NGINX in front of the app, but we also access the Tomcat port directly, and we see the problem at the same frequency. My short-term goal is to produce a small project that reproduces the problem for the Vaadin team. I'll probably use one of those browser extensions that simulate a bad network.

This problem is very critical and, as someone said, it wasn't observable with the old Vaadin 7/8 platform.

TatuLund commented 1 year ago

We talk a lot about NGINX in this thread, but my feeling is that this problem is not reverse-proxy related.

@flefebure, @kagian, @jonasrotilli The problem can be reverse-proxy related, but there are a number of other possible causes as well, e.g. a slow VPN, flaky Wi-Fi or cellular network, etc.

Framework corner-case bugs are also a possibility. It is not long ago that we introduced the fix below, so I recommend using the latest Vaadin 14 or 23 version and observing whether it is more stable in your environment.

https://github.com/vaadin/flow/pull/13733

kagian commented 1 year ago

Just to clarify here, the protocol in use seemed to be TCP as far as I could see. There is also no proxy or additional rewrite components in the path. So for an end to end TCP session, there really should not be the possibility of loss without recovery. And as such it should not be possible to get an out of sequence or dropped packets (unless this really is in UDP, which didn't seem like it).

It really does look like some kind of state tracking issue on the Vaadin server side. I'll check the new version; however, the move to manual push updates seems to work well and, as a result, has reduced our desire to change this again on our side.

flefebure commented 1 year ago

@tatulund we just upgraded Vaadin 14.8.4->14.8.17 and Flow 2.7.11->2.7.20. Now we wait for user feedback [cross fingers]

jonasrotilli commented 1 year ago

To add to this: the problem decreased a lot after I made the NGINX changes that I posted near the beginning of this thread, but they didn't solve it completely. It has now started to happen more often again, with no changes to the Vaadin version or to the server in question. I believe this needs further investigation.

tiagomartins91 commented 1 year ago

Any update about this?

It happens more often with the latest release. It's impossible to deliver or update a Vaadin application in production with this issue.

I have another application running Vaadin 8 in production and this doesn't happen there. It uses the same NGINX reverse proxy and the same configuration, without any problem.

echarlus commented 1 year ago

I have the same issue and my customers are complaining. This has occurred since I upgraded from Vaadin 8 to Vaadin 23 (it never happened on 8). I'm now running V23.2.2 and the problem is still there. The app is running on Tomcat 9 in Azure with an Azure AppGateway acting as a proxy. I see this in the log:

[http-nio-8080-exec-4] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

I've just increased the request timeout on the proxy to see if it changes anything. I've also seen cases where, on the client side, the loading indicator keeps resetting as if the page was being reloaded but, on the server side, I do not see any incoming request, so it's as if the client side was looping on itself.

tiagomartins91 commented 1 year ago

In my case, the page only stops the refresh loop if I force a refresh.

echarlus commented 1 year ago

In my case, the page only stops the refresh loop if I force a refresh.

Yes, this works most of the time. But asking a customer to do that is not an option. I also encountered cases where I had to clear all the stored data (session etc.) and restart navigation before I could reach the site again :(

echarlus commented 1 year ago

Any update about this?

It happens more often with the latest release. It's impossible to deliver or update a Vaadin application in production with this issue.

I have another application running Vaadin 8 in production and this doesn't happen there. It uses the same NGINX reverse proxy and the same configuration, without any problem.

Agreed. I never experienced this issue with Vaadin 8; with 23 it's happening very often and my customers are becoming angry. I hope the fix will be delivered quickly.

caalador commented 1 year ago

Is there a chance to get client logs of when this happens? I would hope for some console warns and logs on client and server ids.

echarlus commented 1 year ago

13733

It's happening in production, so client-side logging is minimal. I have the Chrome dev console open, and if I manage to catch something I'll post it here. @mshabarov is there any chance that playing with the init param maxMessageSuspendTimeout can change the behavior, or is it unrelated?

echarlus commented 1 year ago

I'm getting these "violation" logs from time to time. Since one of them mentions "readystatechange", I'm posting them here just in case:

[Violation] 'readystatechange' handler took 356ms
[Violation] Forced reflow while executing JavaScript took 152ms
generated-flow-imports.2848e01a.js:6550 [Violation] 'setTimeout' handler took 51ms
FlowBootstrap.0b77bed3.js:1 [Violation] 'setTimeout' handler took 54ms

caalador commented 1 year ago

I was hoping for client log lines such as "Received resync message", "Forced update of clientId ...", "Updating client-to-server id ..." or "Server expects next client-to-server ..." from when the issue happens.

echarlus commented 1 year ago

@caalador I've just experienced a weird behavior that I had already seen before, but this time I was able to gather some info. I'm not sure it's the same issue as the one we're dealing with here, but I'm posting the logs anyway just in case.

The client made a request to the server, the request took a long time to process, and the Azure AppGateway sitting between the client and server reset the connection (its timeout is set to 60s), so it replied with a 502 HTTP code. On the client side, I then started to see repeated attempts to query the server, and the Chrome console kept displaying the 502 error. On my server, however, I saw no incoming requests (the Tomcat logs do not show any external request during this time).

After some time, I finally saw an "online" message displayed on the client, and on the server I saw a flow of requests which all led to this message:

INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13

Here's a screenshot of the Chrome console and the corresponding parts of both catalina.out and access.log from Tomcat.

catalina.out

[http-nio-8080-exec-3] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-2] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-4] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-8] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-5] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-14] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-15] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-6] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-7] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-9] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-11] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 14, got: 13
[http-nio-8080-exec-3] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 15, got: 14
[http-nio-8080-exec-4] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 15, got: 14
[http-nio-8080-exec-5] INFO com.vaadin.flow.server.communication.ServerRpcHandler - Ignoring old duplicate message from the client. Expected: 15, got: 14
[http-nio-8080-exec-3] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout

access.log -> here you see a hole between 9:35 and 9:40, and then another until 9:53 (this corresponds to the period during which the client requests were showing 502 responses but the server did not receive anything)

10.1.0.5 - - [04/Oct/2022:09:35:36 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 64956
10.1.0.5 - - [04/Oct/2022:09:35:36 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 131
10.1.0.5 - - [04/Oct/2022:09:35:36 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 132
10.1.0.5 - - [04/Oct/2022:09:35:36 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 14712
10.1.0.5 - - [04/Oct/2022:09:35:36 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 132
10.1.0.4 - - [04/Oct/2022:09:39:48 +0200] "POST /?v-r=heartbeat&v-uiId=1 HTTP/1.1" 200 -
10.1.0.4 - - [04/Oct/2022:09:40:27 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.5 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=heartbeat&v-uiId=1 HTTP/1.1" 200 -
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.5 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=heartbeat&v-uiId=1 HTTP/1.1" 200 -
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 -
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
10.1.0.4 - - [04/Oct/2022:09:53:07 +0200] "POST /?v-r=uidl&v-uiId=1 HTTP/1.1" 200 37
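
Holes like the ones in this excerpt can also be located mechanically. The following is a minimal Java sketch, assuming the access log timestamp format seen above (for example 04/Oct/2022:09:35:36 +0200 inside square brackets); the class name AccessLogGaps and the method gapsLongerThan are hypothetical names chosen for illustration.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports silent periods between consecutive log lines.
class AccessLogGaps {
    // The first [...] group on each line is assumed to be the timestamp.
    private static final Pattern TIMESTAMP = Pattern.compile("\\[([^\\]]+)\\]");
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MMM/uuuu:HH:mm:ss Z", Locale.ENGLISH);

    // Returns every gap between consecutive timestamps that exceeds the threshold.
    static List<Duration> gapsLongerThan(List<String> lines, Duration threshold) {
        List<Duration> gaps = new ArrayList<>();
        OffsetDateTime previous = null;
        for (String line : lines) {
            Matcher matcher = TIMESTAMP.matcher(line);
            if (!matcher.find()) {
                continue; // skip lines without a bracketed timestamp
            }
            OffsetDateTime current = OffsetDateTime.parse(matcher.group(1), FORMAT);
            if (previous != null) {
                Duration gap = Duration.between(previous, current);
                if (gap.compareTo(threshold) > 0) {
                    gaps.add(gap);
                }
            }
            previous = current;
        }
        return gaps;
    }
}
```

Calling gapsLongerThan with the lines above and a threshold such as Duration.ofSeconds(60) (an arbitrary value, chosen here only because the gateway timeout mentioned earlier is 60s) would list the silent periods, which can then be compared with the time window in which the browser console showed 502 responses.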

Let me know if this helps.

echarlus commented 1 year ago

@caalador while playing with queries and trying to create issues, I managed to get the following behavior:

The page takes a long time to display (as above)
The server replies with 502
This time I got the message "connection lost trying to reconnect"
On the server, after a while, I got the exception caused by the proxy closing the connection
The client started to replay the same query, which finally succeeded
Then the page started to redraw indefinitely (but no query was sent to the server) and I got the following exception in the console:

FlowClient.947c8d40.js:1 Uncaught (in promise) Error: Client is resynchronizing
    at FlowClient.947c8d40.js:1:42744
    at Array.forEach (<anonymous>)
    at Zy (FlowClient.947c8d40.js:1:42719)
    at by (FlowClient.947c8d40.js:1:34159)
    at z.db (FlowClient.947c8d40.js:3:76660)
    at x (FlowClient.947c8d40.js:1:32329)
    at Map.forEach (<anonymous>)
    at i2 (FlowClient.947c8d40.js:1:43877)
    at W1 (FlowClient.947c8d40.js:3:50372)
    at mf (FlowClient.947c8d40.js:3:37591)
    at z.Bb (FlowClient.947c8d40.js:3:74625)
    at z.O (FlowClient.947c8d40.js:3:86799)
    at XMLHttpRequest.<anonymous> (FlowClient.947c8d40.js:1:25882)
    at od (FlowClient.947c8d40.js:1:16760)
    at Z2 (FlowClient.947c8d40.js:3:5692)
    at XMLHttpRequest.<anonymous> (FlowClient.947c8d40.js:1:28235)

Now it's looping forever. Here are the query logs that led to this state:

The catalina log is flooded with:

[http-nio-8080-exec-6] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-8] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-6] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-5] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-6] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-8] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-6] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-5] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-7] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-4] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-6] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-4] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-6] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-5] WARN com.vaadin.flow.server.communication.ServerRpcHandler - Resynchronizing UI by client's request. A network message was lost before reaching the client and the client is reloading the full UI state. This typically happens because of a bad network connection with packet loss or because of some part of the network infrastructure (load balancer, proxy) terminating a push (websocket or long-polling) connection. If you are using push with a proxy, make sure the push timeout is set to be smaller than the proxy connection timeout
[http-nio-8080-exec-2] WARN com.vaadin.flow.server.com

The only way to get out of this state is to force-reload the page. This is on Vaadin 23.2.2.

Artur- commented 1 year ago

Is there a way to reproduce this with e.g. an nginx proxy with a timeout and a simple Vaadin app that just sleeps in a click listener?

echarlus commented 1 year ago

I have a setup with a Microsoft Azure AppGateway acting as a load balancer, with tomcat-9 behind it as the app server. I can reproduce the problem by making a request time out (just sleep on the server side), which causes the app gateway to close the connection and return a 502 error. I'm sure you can easily reproduce it with any proxy, as long as the proxy's connection timeout is shorter than the servlet's response time.

echarlus commented 1 year ago

Hi @Artur-, did you manage to reproduce it? Any update?

caalador commented 1 year ago

Haven't been able to reproduce the issue. All my setups to get it to fail have failed in one way or another.

echarlus commented 1 year ago

@caalador Is there anything I can do to help?

caalador commented 1 year ago

Since I don't have any proxy setups, it took me a while to get a proxy to work. If I set the timeout short enough, I do get a 504 response before the server is done, but it doesn't send me into a request loop as such: only 2 more of the same request are sent before the server is done with the initial thread. The client resends the XHR request whenever it gets a non-200 response, so it keeps sending it to the server until the original request is done and the server responds with a 200.

I would expect that the requests which do not reach the server are simply blocked by the proxy for coming in too fast?

After that, the application stops responding to any input, which is the same thing that happens with #14470, and only a reload fixes the app.

As a first step, I would expect the biggest problem to be fixed by running the long event in a separate thread, so that the request is released immediately, and then using push (or UI polling) to update the UI once the work is finished. Waiting over 60s for the server seems excessive.
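The background-thread suggestion above can be sketched in plain Java. This is only an illustration of the pattern, not code from the Vaadin codebase: `BackgroundTaskSketch`, `onClick`, `slowWork`, and `uiUpdate` are hypothetical names. In a real Flow application with `@Push`, the body of `uiUpdate` would need to run inside `ui.access(...)` so that the change is applied under the session lock and pushed to the browser.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch: these names are illustrative and are not Vaadin API.
class BackgroundTaskSketch {

    // Called from a click listener. The slow work is submitted to a pool and
    // this method returns right away, so the request thread is released long
    // before any proxy timeout can fire.
    static CompletableFuture<Void> onClick(ExecutorService pool,
                                           Supplier<String> slowWork,
                                           Consumer<String> uiUpdate) {
        return CompletableFuture
                // Runs on a pool thread, not on the HTTP request thread.
                .supplyAsync(slowWork, pool)
                // Assumption: in a Flow app this callback would wrap its body
                // in ui.access(...) so the update reaches the client via push.
                .thenAccept(uiUpdate);
    }
}
```

The returned future is only there so that callers (or tests) can wait for completion; a click listener would normally ignore it and return immediately.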

I'll look into why the resynched page stops responding.

echarlus commented 1 year ago

@caalador thanks for the feedback. I know 60s is far too long for a response, and I'm working on running the task in the background and using push. However, I've had many users complaining about the UI not responding to any click on some occasions (and they had not been hitting the 60s timeout), so maybe fixing this would solve most of my problem. Did you manage to get the JS exception for which I posted a stack trace above? If you need access to my app to do some investigation, let me know and I can get you an account.

caalador commented 1 year ago

I didn't get that one, but it is probably due to the client having pending promises when the resync is requested. Any pending promises will be rejected on a resynchronization.

echarlus commented 1 year ago

@caalador I encountered the issue again this morning. I got a 502 on one page, then the whole site stopped responding. I tried to reload the page or go to the home page, and nothing happened. On the server side I have no logs of any request coming in. I see in the network view that the requests are being handled by sw.js and they all fail now. Could it be that sw.js is causing the issue (maybe that's what you're already looking at)?

echarlus commented 1 year ago

I can confirm that sw.js seems to be causing the whole behavior (even blocking the home page from displaying) after it receives a first 502 response. In Chrome I stopped sw.js and then restarted it, and everything went back to normal.

caalador commented 1 year ago

Found one issue that made it so that not all changes for the resync were executed if there was a missing node. For instance the notification in 14470 disappearing during the resync operation being the simplest reproducible one.

echarlus commented 1 year ago

> Found one issue that made it so that not all changes for the resync were executed if there was a missing node. For instance the notification in 14470 disappearing during the resync operation being the simplest reproducible one.

Great :) @caalador, is there a way I can get a version with the fix to test on my end?

caalador commented 1 year ago

You can grab a snapshot a while after the fix has merged using the pre-releases repository.

        <repository>
            <id>vaadin-prereleases</id>
            <url>https://maven.vaadin.com/vaadin-prereleases/</url>
        </repository>

I'll post when I can see a snapshot build for 23.2 containing the fix.

echarlus commented 1 year ago

@caalador thanks. Does the fix also take care of what I've seen on the client side, where sw.js seems to stop working and blocks further connections (see the messages above)?

vaadin / flow

Resynchronizing UI by client's request #14232
