moryakovdv opened this issue 5 months ago
Hi, thanks for creating the issue. Could you provide a full thread dump, with the trace of all application threads?
@mcollovati, Marco, thanks for the answer. Thread dump attached: aedump-1.zip
It looks like the Atmosphere UUIDBroadcasterCache gets stuck while processing a terminal operation (anyMatch()) on a parallel stream.
private boolean hasMessage(String clientId, String messageId) {
    ConcurrentLinkedQueue<CacheMessage> clientQueue = messages.get(clientId);
    return clientQueue != null && clientQueue.parallelStream().anyMatch(m -> Objects.equals(m.getId(), messageId));
}
"default task-183" - Thread t@3646
java.lang.Thread.State: RUNNABLE
at java.base@11.0.22/java.util.concurrent.ConcurrentLinkedQueue$CLQSpliterator.trySplit(ConcurrentLinkedQueue.java:880)
at java.base@11.0.22/java.util.stream.AbstractShortCircuitTask.compute(AbstractShortCircuitTask.java:114)
at java.base@11.0.22/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at java.base@11.0.22/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base@11.0.22/java.util.concurrent.ForkJoinPool$WorkQueue.helpCC(ForkJoinPool.java:1115)
at java.base@11.0.22/java.util.concurrent.ForkJoinPool.externalHelpComplete(ForkJoinPool.java:1957)
at java.base@11.0.22/java.util.concurrent.ForkJoinTask.tryExternalHelp(ForkJoinTask.java:378)
at java.base@11.0.22/java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:323)
at java.base@11.0.22/java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:412)
at java.base@11.0.22/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:736)
at java.base@11.0.22/java.util.stream.MatchOps$MatchOp.evaluateParallel(MatchOps.java:242)
at java.base@11.0.22/java.util.stream.MatchOps$MatchOp.evaluateParallel(MatchOps.java:196)
at java.base@11.0.22/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
at java.base@11.0.22/java.util.stream.ReferencePipeline.anyMatch(ReferencePipeline.java:528)
at deployment.ROOT.war//org.atmosphere.cache.UUIDBroadcasterCache.hasMessage(UUIDBroadcasterCache.java:259)
at deployment.ROOT.war//org.atmosphere.cache.UUIDBroadcasterCache.addMessageIfNotExists(UUIDBroadcasterCache.java:207)
at deployment.ROOT.war//org.atmosphere.cache.UUIDBroadcasterCache.addToCache(UUIDBroadcasterCache.java:146)
at deployment.ROOT.war//com.vaadin.flow.server.communication.LongPollingCacheFilter.filter(LongPollingCacheFilter.java:102)
The call happens during the execution of a BroadcasterFilter (LongPollingCacheFilter); every filter is executed inside a synchronized() block that locks on the filter instance.
So, all other requests are blocked by the broadcaster lock, which is not released because of the pending stream processing operation.
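Roughly, the shape of that code is the following (a simplified illustration only, not the actual Atmosphere source; applyFilter is a placeholder):

for (Object filter : broadcasterFilters) {
    synchronized (filter) {
        // LongPollingCacheFilter.filter(...) runs here; if it never returns
        // (the stuck anyMatch on the parallel stream above), the monitor on
        // the filter instance is never released and every other broadcast on
        // this Broadcaster waits at this point.
        applyFilter(filter, message);
    }
}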
That said, I can't say why the stream is stuck. It looks like the clientQueue is continuously getting new elements :thinking:
Does your application perhaps perform a very high rate of push invocations?
Thanks for the investigation. Yes, the app has a high rate of async updates. I'm confused about long polling, though: the @Push annotation is set to use WEBSOCKET. Does it mean that for some clients Atmosphere switches to long polling due to, say, network latency?
LongPollingCacheFilter is always executed, but it performs actions only if the transport is long polling. So, as you said, it seems that the transport is switched to long polling.
This issue seems similar: https://github.com/Atmosphere/atmosphere/issues/2262
Yes, I saw that topic. But I have no idea how to get rid of it. It would be hard to reduce the number of push invocations.
Does it make sense to switch to PushMode.MANUAL and delegate ui.push() to some queue?
You could maybe try to copy/paste the UUIDBroadcasterCache and rewrite the hasMessage method to perform the anyMatch on a copy of the list.
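For illustration, the rewritten method could look roughly like this (an untested sketch reusing the fields of the original UUIDBroadcasterCache; it needs java.util.ArrayList and java.util.List imports in the copied class):

private boolean hasMessage(String clientId, String messageId) {
    ConcurrentLinkedQueue<CacheMessage> clientQueue = messages.get(clientId);
    if (clientQueue == null) {
        return false;
    }
    // Snapshot the queue once, so the scan runs over a fixed copy and is not
    // affected by elements being added concurrently to the live queue.
    List<CacheMessage> snapshot = new ArrayList<>(clientQueue);
    // Plain sequential stream: no common ForkJoinPool involvement.
    return snapshot.stream().anyMatch(m -> Objects.equals(m.getId(), messageId));
}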
Well... Is it sufficient to use
@BroadcasterCacheService public class MyBroadcasterCache implements BroadcasterCache {...}
from the docs, so that my implementation is used?
Does it make sense to switch to PushMode.MANUAL and delegate ui.push() to some queue?
I don't have an answer for this, sorry. It could help, but it could also just move the problem to a different layer. Anyway, it might be worth trying. Queueing messages and performing fewer push calls could prevent the lock.
Just thought this approach could switch parallel updates to serial.
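For what it's worth, a minimal sketch of that queue-plus-manual-push idea could look like this (hypothetical class, assuming PushMode.MANUAL is configured; the class name and the 200 ms interval are made up, and scheduler shutdown on UI detach is omitted):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.vaadin.flow.component.UI;

public class BatchingPusher {

    private final UI ui;
    private final Queue<Runnable> pending = new ConcurrentLinkedQueue<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public BatchingPusher(UI ui) {
        this.ui = ui;
        // Flush at a fixed rate; tune the interval for the application's update rate.
        scheduler.scheduleAtFixedRate(this::flush, 200, 200, TimeUnit.MILLISECONDS);
    }

    // Background threads enqueue UI changes instead of pushing on every event.
    public void enqueue(Runnable uiUpdate) {
        pending.add(uiUpdate);
    }

    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        ui.access(() -> {
            Runnable update;
            while ((update = pending.poll()) != null) {
                update.run();
            }
            ui.push(); // one push for the whole batch
        });
    }
}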
Well... Is it sufficient to use
@BroadcasterCacheService public class MyBroadcasterCache implements BroadcasterCache {...}
from the docs, so that my implementation is used?
IIRC you have to set it with the servlet init parameter, otherwise Flow will force UUIDBroadcasterCache.
You are right, the following code in the PushRequestHandler class uses UUIDBroadcasterCache:
static AtmosphereFramework initAtmosphere(final ServletConfig vaadinServletConfig) {
    AtmosphereFramework atmosphere = new AtmosphereFramework(false, false) {
        @Override
        protected void analytics() {
            // Overridden to disable version number check
        }

        @Override
        public AtmosphereFramework addInitParameter(String name,
                String value) {
            if (vaadinServletConfig.getInitParameter(name) == null) {
                super.addInitParameter(name, value);
            }
            return this;
        }
    };

    atmosphere.addAtmosphereHandler("/*", new PushAtmosphereHandler());
    atmosphere.addInitParameter(ApplicationConfig.BROADCASTER_CACHE,
            UUIDBroadcasterCache.class.getName());
    ...
But I cannot find the proper way to force it to use my CustomBroadcasterCache.
I tried init-param and context-param in web.xml, the @BroadcasterCacheService annotation, and the following code to make it work:
@ManagedBean
public class AtmosphereInitializer implements ServletContextInitializer {

    @Override
    public void onStartup(ServletContext servletContext) {
        servletContext.setInitParameter("org.atmosphere.cpr.AtmosphereConfig.getInitParameter", "true");
        servletContext.setInitParameter("org.atmosphere.cpr.broadcaster.shareableThreadPool", "true");
        servletContext.setInitParameter("org.atmosphere.cpr.broadcaster.maxProcessingThreads", "8");
        servletContext.setInitParameter("org.atmosphere.cpr.broadcasterCacheClass",
                "com.kmp.market.CustomBroadcasterCache");
    }
}
My class com.kmp.market.CustomBroadcasterCache is completely ignored by PushRequestHandler.
BTW, other settings in the above method, e.g. servletContext.setInitParameter("org.atmosphere.cpr.broadcaster.maxProcessingThreads", "8"), work correctly.
Could you please show me the right way? Perhaps I have made a mess of the Vaadin + Spring Boot configuration.
Thanks in advance.
I think you need to set the parameter on the Vaadin servlet. Take a look at this comment for a similar use case: https://github.com/vaadin/flow/issues/16664#issuecomment-1552890336
Anyway, I would also investigate why the push connection is downgraded to long polling.
Finally, I got it. With the code below, CustomBroadcasterCache is linked to the framework: the init parameters end up on the Vaadin servlet config, so the overridden addInitParameter in initAtmosphere above no longer forces UUIDBroadcasterCache.
@Bean
BeanPostProcessor patchAtmosphereBroadcaster() {
    return new BeanPostProcessor() {
        @Override
        public Object postProcessBeforeInitialization(Object bean, String beanName) throws BeansException {
            if (bean instanceof ServletRegistrationBean<?>) {
                ServletRegistrationBean<?> reg = (ServletRegistrationBean<?>) bean;
                if (reg.getServlet() instanceof SpringServlet) {
                    reg.addInitParameter("org.atmosphere.cpr.AtmosphereConfig.getInitParameter", "true");
                    reg.addInitParameter("org.atmosphere.cpr.maxSchedulerThread", String.valueOf(maxSchedulerThread));
                    reg.addInitParameter("org.atmosphere.cpr.broadcaster.shareableThreadPool", "true");
                    reg.addInitParameter("org.atmosphere.cpr.broadcaster.maxProcessingThreads", String.valueOf(maxProcessingThreads));
                    reg.addInitParameter("org.atmosphere.cpr.broadcaster.maxAsyncWriteThreads", String.valueOf(maxAsyncWriteThreads));
                    reg.addInitParameter("org.atmosphere.cpr.broadcasterCacheClass", "com.kmp.market.CustomBroadcasterCache");
                }
            }
            return bean;
        }
    };
}
Now some further observations:
private boolean hasMessage(String clientId, String messageId) {
    ....
    int size = clientQueue.size();
    if (size > 0)
        System.out.println(size);
    ...
The output counter becomes mad and shows a constantly increasing number of messages in the queue, and then the client reports "Connection lost".
So we probably get long polling after the emulated NGINX restart.
And more:
Simply put a breakpoint into hasMessage
and any UI will stuck with Vaadin loader with no response, even when that UI was connected without proxy.
And more: simply put a breakpoint into hasMessage, and any UI will get stuck on the Vaadin loader with no response, even when that UI was connected without the proxy.
I think this is somehow expected. When pushing changes to the client, the VaadinSession lock is held, so if you block execution in hasMessage, any other access to the VaadinSession will wait for the lock to be released.
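For example (hypothetical snippet), while a push is parked inside hasMessage and holds the session lock, the command below is only queued and will not run until that lock is released, so the UI appears frozen behind the Vaadin loader:

// statusLabel is a hypothetical component; ui.access(...) returns immediately,
// but the command runs only once the VaadinSession lock becomes available.
ui.access(() -> statusLabel.setText("update"));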
The output counter becomes mad and shows a constantly increasing number of messages in the queue
This is probably because, while Atmosphere is trying to push the cached messages to the client, the application is constantly adding new messages, so the queue is never empty.
So we probably get long polling after the emulated NGINX restart.
You can check it in the browser network tab: if the web socket channel is closed, you might see HTTP push requests happening continuously.
Probably one of the questions here is: would it be possible to reconnect with websocket transport after a network failure, instead of falling back to long polling?
If you are confident that web socket will ALWAYS work for the application clients, you can set websocket as the fallback transport as well.
@Bean
VaadinServiceInitListener configureWebsocketFallbackForPush() {
    return serviceEvent -> serviceEvent.getSource().addUIInitListener(uiEvent -> {
        uiEvent.getUI().getPushConfiguration().setFallbackTransport(Transport.WEBSOCKET);
    });
}
If you are confident that web socket will ALWAYS work for the application clients, you can set websocket as the fallback transport as well.
Thanks for the suggestion. In what cases might falling back to websocket instead of long polling hurt the application? Connection issues? Old browsers? Some security settings?
I would say in cases where the client may not be able to establish a web socket connection at all. An old browser could be one case. Corporate firewalls and proxies may also block some requests in certain setups, and network connections may time out. Without the long polling fallback, push will basically not work, and changes will be propagated to the client only as a result of a server interaction (e.g. a click on a button).
Hello. We have done further investigation.
We got rid of the complete application hangs with some modifications to the BroadcasterCache, but we still have performance issues with BroadcasterConfig.
Please have a look at the points in the picture.
While investigating thread dumps, I found that the default BroadcasterConfig is stuck in filter(), inside a synchronized block.
Probably an old LongPollingCacheFilter object from a previous session was not removed or renewed after the client's reconnect.
I would appreciate your ideas on how to get out of this situation.
PS: we have NGINX as a front in both staging and production. The above bug occurs in both environments, no matter what timeouts (proxy_send, proxy_connect, etc.) are set on the NGINX instances.
Description of the bug
Hi. In the production environment the UI's responsiveness degrades significantly after some time. The setup is WildFly 23 + NGINX, Vaadin 23.4.0 + Spring Boot, with automatic PUSH via WEBSOCKET; the default session duration of 30 minutes is set on WildFly.
The application has rather fast (~400 ms) async UI updates (all of them called within ui.access). After the initial start everything works just fine, but after some time (hours or even days) every UI request (opening a menu, pressing a button, etc.) starts to perform badly. The Vaadin loading bar starts to blink and eventually gets stuck. Refreshing the UIs or opening the page in another browser doesn't help, so even NEW UIs behave this way.
Several thread dumps show blocking behavior on the same lock in some threads of the Atmosphere engine. See screenshots. Probably there is a deadlock somewhere. Maybe such behavior occurs after session expiration or after switching from websocket to long polling. Actually, I don't know how to investigate further. In our staging and development environments everything works as expected: timed-out sessions die and new sessions work correctly.
Any response will be appreciated!
Expected behavior
.
Minimal reproducible example
Hard to reproduce, see screenshots
Versions