jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty

100% CPU usage in Selector using Jetty on Windows #2205

Closed joemokos closed 2 years ago

joemokos commented 6 years ago

I am running embedded jetty 9.4.8.v20171121. My app runs fine for a period of time, usually about 2 hours, with very little CPU usage. Then, 2 threads, both named WebSocketContainer@1861866092-, start to consume 100% CPU when idle. A stack trace at this point yields the following:

 Thread id32:WebSocketContainer@1861866092-32     RUNNABLE     55.84375s
 sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
 sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(WindowsSelectorImpl.java:296)
 sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(WindowsSelectorImpl.java:278)
 sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:159)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
 org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:375)
 org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:304)
 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:179)
 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:140)
 org.eclipse.jetty.io.ManagedSelector$$Lambda$23/663060787.run(Unknown Source)
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
 org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
 java.lang.Thread.run(Thread.java:748)

This seems very similar to: https://github.com/eclipse/jetty.project/issues/1446

but I am running a newer version of jetty.

I can only get this problem to occur on servers running Windows 2012R2. I am running java jdk1.8.0_162 although I have also seen the problem with jdk1.8.0_131.

Does anyone have any ideas what could be causing this?
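For anyone trying to confirm which threads are the ones burning CPU before taking a stack trace, a minimal sketch using the JDK's `ThreadMXBean` is shown here. It is not part of Jetty; the class name `BusyThreadFinder` and its parameters are hypothetical, and it assumes the JVM supports per-thread CPU time measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jetty: lists threads that burned CPU during a sampling window.
final class BusyThreadFinder {
    static List<String> findBusyThreads(long windowMillis, long minCpuMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> busy = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return busy; // this JVM cannot report per-thread CPU time
        }
        // First sample: CPU nanoseconds consumed so far by each live thread.
        Map<Long, Long> before = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            before.put(id, mx.getThreadCpuTime(id));
        }
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return busy;
        }
        // Second sample: report threads whose CPU time grew by at least minCpuMillis.
        for (long id : mx.getAllThreadIds()) {
            Long start = before.get(id);
            long end = mx.getThreadCpuTime(id);
            if (start == null || start < 0 || end < 0) {
                continue; // thread started or died during the window
            }
            long usedMillis = (end - start) / 1_000_000L;
            ThreadInfo info = mx.getThreadInfo(id);
            if (usedMillis >= minCpuMillis && info != null) {
                busy.add(info.getThreadName() + ": " + usedMillis + " ms CPU");
            }
        }
        return busy;
    }
}
```

Calling `BusyThreadFinder.findBusyThreads(1000, 900)` on an otherwise idle server should list only threads that were on-CPU for nearly the whole second, which is what a spinning selector thread would look like.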

normanmaurer commented 6 years ago

For the record it seems like someone in netty land also has the "same" problem... https://github.com/netty/netty/issues/7790#issuecomment-418830017

So far no luck on my side to reproduce tho :/

philallen2 commented 5 years ago

I get 100% processor use on Ubuntu after a week or so.

It does not affect performance, so I put it down to garbage collection running when it has the chance.

To cure it I stop and then restart the websocket server; I do that a couple of times per month.

I was planning to hunt around to see if I had made a mistake somewhere, but seeing as performance is excellent even at 100% use, I really think it is garbage collection taking up all idle time for a reason still unknown to me. I will report back here if I find and fix the issue on my end.

Phil

sbordet commented 5 years ago

@philallen2 open a new issue, this is for Windows.

philallen2 commented 5 years ago

Lol, just pointing out that it may well be garbage collection. Ubuntu obviously survives better than Windows, what a surprise, so the cure is to use free Ubuntu instead. Simple.

I have no issue, just trying to be helpful

joakime commented 5 years ago

@philallen2 your report of 100% CPU usage is useful, but the specific cause is still unknown. Since you have experienced it on ubuntu it would be a new issue, unrelated to this specific one (which is related to a JVM bug in the native selector code while on Windows).

Your specific situation might be a new cause/issue, one that hasn't been previously reported. Either way, if you want help diagnosing it, getting to the root cause, a new issue is warranted.

philallen2 commented 5 years ago

Sorry, I will not disturb again.

I found this thread, and on the way to finding it, read that the 100% processor use was causing issues on Windows.

So I said what I said. I will keep my mouth shut.

Other than that, I promise that if I find out what is causing it on an Ubuntu virtual machine or three (two production VMs with 100s of users 24x7 have the issue, the test bed VM does not), I will post here in case it gives you guys a helpful clue. Never trust logs is my moral after programming since 1980.

joakime commented 5 years ago

@philallen2 don't ever feel like you should not disturb us.

Issues exist for the squeaky wheel, the disturbing, the problematic, the odd, the bizarre, etc. We want to hear about them! Otherwise we can't address them, fix them, help others.

We just ask that it be for a new issue, to keep things organized and focused.

philallen2 commented 5 years ago

Me and my big mouth 🙁

My web socket server implementation using Jetty and Eclipse is about 30000 lines of code long, yes 30k.

It allows me to get to a TCP/IP server using web sockets, from a web page; it even works in IE10.

It was originally decided not to allow web socket clients to connect to a TCP/IP server. Now it has been agreed to allow it in the future, but that will take years.

It then emulates all the stuff I use in my old Java applet way to get to a TCP/IP server, hence the long code length, and it works perfectly. You guys did a great job, thanks.

I do not know if it is a bug in my code. I cannot in all fairness raise a bug report when it might be my fault.

And thank you for your replies, you guys here are much nicer than Stack Overflow is.

😀

joakime commented 5 years ago

We are active on stackoverflow as well :-)

joemokos commented 5 years ago

jetty.txt jetty.1.txt jetty.2.txt

I realize this is somewhat of an old issue, but I was finally able to get some logs where we can see a transition from a "sane" to a spinning state and vice versa. The 3 logs are attached.

Looking at log jetty.2.txt, thread qtp1410167320-21 (21) is spinning on the select from the beginning of the log until 10:56:59.994 when there is finally a socket to be read. This causes 21 to reenter a "sane" state.

Then at 10:57:00.011 it appears thread qtp1410167320-16 starts to enter the spinning state. It spins for about 800ms until it also has a socket to be read and, it too, reenters a "sane" state.

About this same time, 10:57:00.813, thread qtp1410167320-20 (20) seems to start spinning on the select. At 10:57:01.403, thread 20 has a socket to be read and processes the request. However, it does not enter a "sane" state. Instead, at 10:57:01.419 it seems to reenter the spinning-on-select state.

The files jetty.1.txt and jetty.txt show thread 20 continuing to spin on the select until the trace is shut down.

rickar commented 5 years ago

I ran into this same issue. For context, I am using embedded Jetty as a transparent reverse proxy in a corporate Windows 10 environment (which means a more complicated networking setup than usual including data loss prevention scanners, VPN, local firewall, proxy auto config scripts, Active Directory integration, etc.).

I was getting 100% CPU spikes after exactly 2 hours each time. I don't really have the time (or likely even the permissions on my machine) to figure out the root cause, but I did come up with a patch that fixes the issue for me.

Note that this is not my area of expertise and is most likely a "dirty hack" but it is good enough in my situation which does not include production usage.

I made changes to org.eclipse.jetty.io.ManagedSelector. At the top I added:

private int selectFailures = 0;

Then at the end of SelectorProducer.select():

                    // ...
                    _keys = selector.selectedKeys();
                    _cursor = _keys.isEmpty() ? Collections.emptyIterator() : _keys.iterator();

                    if (LOG.isDebugEnabled())
                        LOG.debug("Selector {} processing {} keys, {} updates", selector, _keys.size(), updates);

                    // detect a select that had no effect (should have blocked instead)
                    if (selected == 0 && selector.isOpen() && _keys.size() == 0 && updates == 0) {
                        selectFailures++;
                        if (selectFailures >= 5 && System.getProperty("os.name").startsWith("Win")) {
                            LOG.warn("bad selector detected; reopening");
                            closeNoExceptions(_selector);
                            _selector = _selectorManager.newSelector();
                            selectFailures = 0;
                        }
                    } else {
                        selectFailures = 0;
                    }

                    return true;
                    // ...

Now I see the "bad selector detected" messages every couple of hours in the logs but everything seems to keep working.

gregw commented 5 years ago

Recreating the selector seems a bit draconian?!?! Don't you lose all the current connections? Won't that fail requests in progress?

Our current solution for this problem is just doing a selectNow() and then continuing:

    private static final boolean FORCE_SELECT_NOW;
    static
    {
        String property = System.getProperty("org.eclipse.jetty.io.forceSelectNow");
        if (property != null)
        {
            FORCE_SELECT_NOW = Boolean.parseBoolean(property);
        }
        else
        {
            property = System.getProperty("os.name");
            FORCE_SELECT_NOW = property != null && property.toLowerCase(Locale.ENGLISH).contains("windows");
        }
    }

...

                    int selected = selector.select();
                    if (selected == 0)
                    {
                        if (LOG.isDebugEnabled())
                            LOG.debug("Selector {} woken with none selected", selector);

                        if (Thread.interrupted() && !isRunning())
                            throw new ClosedSelectorException();

                        if (FORCE_SELECT_NOW)
                            selected = selector.selectNow();
                    }

Does that work for you?

Another alternative solution to try would be to sleep for 100ms and then continue. That would stop the busy loop from being too busy, and it would be interesting to see if the selector ever recovers.

Note that it would also be good to raise this issue with Microsoft, as it definitely looks like a bug in the OS somewhere!
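The sleep-and-continue alternative mentioned in this comment could be sketched roughly as follows. This is not Jetty's code: the class `BackoffSelect` and the parameters `fastReturnNanos` and `backoffMillis` are hypothetical names, and the sketch assumes that a `select()` which returns zero keys almost immediately is a sign of the spin.

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical sketch, not Jetty's code: back off when select() returns empty too quickly.
final class BackoffSelect {
    static int selectWithBackoff(Selector selector, long fastReturnNanos, long backoffMillis)
            throws IOException, InterruptedException {
        long start = System.nanoTime();
        int selected = selector.select(); // should block until a key is ready or wakeup() is called
        long elapsed = System.nanoTime() - start;
        // Zero keys after a near-instant return looks like the busy loop, so sleep before the
        // caller loops again. Caveat: a legitimate wakeup() also returns zero keys quickly,
        // in which case the sleep only adds latency.
        if (selected == 0 && elapsed < fastReturnNanos) {
            Thread.sleep(backoffMillis);
        }
        return selected;
    }
}
```

With `backoffMillis` set to 100, a selector stuck in the spinning state would loop at most about ten times per second instead of saturating a core, at the cost of up to 100ms of extra latency after an ordinary wakeup.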

rickar commented 5 years ago

It is a bit drastic, which is one of the reasons I submitted it here rather than as a pull request. ;-)

From what I've seen so far no active connections are lost, though I haven't done a test under heavy load.

I am already using a version of Jetty that includes the selector.selectNow() fix and that did not help in my case; the code that I added is past the if (selected == 0) block. The only way I could find to get my app to recover was to recreate the selector.

I don't doubt that there is something wrong with my Windows setup. But from a quick reading of the OpenJDK source, it looked to me like there is also a bug in that code because the combination of Java and native code in WindowsSelectorImpl does not seem to check the error code from the underlying Windows select() call which can return error codes such as WSAENETDOWN that would indicate that there is something very wrong. In that case, it appears that the Java side of select() would keep returning 0 forever even if the selector itself is completely unusable. (Which matches the busy-wait behavior I seem to be seeing.)

Anyway, I didn't expect the code I pasted earlier to be incorporated into Jetty; I only intended to share my findings in case it helps someone else figure out the real issue.
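On the question of whether recreating the selector loses connections: one way to avoid that, in the spirit of the rebuildSelector() approach in Netty that is mentioned later in this thread, is to move every registration to the new selector before closing the old one. The following is a minimal sketch using only `java.nio`; the class `SelectorRebuilder` and its methods are hypothetical and do not reflect how Jetty's `ManagedSelector` is actually structured.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical sketch, not Jetty or Netty code.
final class SelectorRebuilder {
    private final int threshold;
    private int emptySelects;

    SelectorRebuilder(int threshold) {
        this.threshold = threshold;
    }

    // Returns true once `threshold` consecutive selects came back with no keys and no updates.
    boolean recordSelect(int selected, boolean hadUpdates) {
        if (selected == 0 && !hadUpdates) {
            if (++emptySelects >= threshold) {
                emptySelects = 0;
                return true;
            }
            return false;
        }
        emptySelects = 0;
        return false;
    }

    // Opens a new selector, re-registers every valid channel on it, then closes the old one.
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue; // already cancelled or channel closed
            }
            SelectableChannel channel = key.channel();
            // Same interest ops and attachment, so the connection state carries over.
            channel.register(fresh, key.interestOps(), key.attachment());
            key.cancel();
        }
        old.close();
        return fresh;
    }
}
```

The selecting thread would call `recordSelect(...)` after each `select()` and, when it returns true, replace its selector with the result of `rebuild(...)`. This must happen on the selector's own thread so that no other thread is registering channels at the same time.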

joemokos commented 5 years ago

@rickar Thanks for posting your workaround.

I have also tested a version of Jetty that contains the selector.selectNow() and it did not solve the problem. I read somewhere that the problem is resolved with OpenJDK jdk11. I am in the process of testing now. Will post the results when available. It may be a while, though, since I can't recreate in my test environment. I have to wait on a customer who is experiencing the problem.

joakime commented 5 years ago

@joemokos if switching to JDK11, be aware of the following ... https://webtide.com/openjdk-11-and-tls-1-3-issues/

jjfrankovich commented 5 years ago

Does anyone know when this problem started? Is it 9.4+ specific? We are still running 9.3.25.v20180904; are we susceptible to this problem?

joakime commented 5 years ago

@jjfrankovich yes, you are susceptible to the Windows JVM bug if using Jetty 9.3.x.

The original fix was done in commit 484280bac6956f9db0d98db60739f5368dec3691 (first appeared in release jetty-9.4.9.v20180320). Some subsequent fixes have been done in commit d02762140d6e13e1a07351b2d5cb6dc101a95419 (first appeared in release jetty-9.4.15.v20190215).

There have been no backports to the Jetty 9.3.x major version branch for this hack/workaround for the Windows JVM bug.

We've had reports that using OpenJDK 11.0.2+ has solved the issue for many individuals (pay attention to your TLS/1.3 settings). We've also had reports that the bug can sometimes manifest with bad network drivers on specific installations of Windows. We've also had reports that various software firewall products can cause the bug.
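To check a given 9.x version string against the two releases named in this comment, a small comparison helper might look like the sketch below. The class `JettyWorkaroundCheck` is hypothetical. It only understands release strings of the form `major.minor.micro.vDATE`, and it assumes that 9.4.x releases after the ones named here kept the changes, which this thread does not state explicitly.

```java
// Hypothetical helper: compares a Jetty version string with the releases named above.
final class JettyWorkaroundCheck {
    // Parses the first three numeric components, e.g. "9.4.8.v20171121" -> {9, 4, 8}.
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    static boolean atLeast(String version, int major, int minor, int micro) {
        int[] v = parse(version);
        if (v[0] != major) return v[0] > major;
        if (v[1] != minor) return v[1] > minor;
        return v[2] >= micro;
    }

    static String describe(String version) {
        // Assumption: later releases retained the changes from 9.4.9 and 9.4.15.
        if (atLeast(version, 9, 4, 15)) return "original workaround and follow-up fixes";
        if (atLeast(version, 9, 4, 9)) return "original workaround only";
        return "no workaround";
    }
}
```

For the versions mentioned in this thread, `describe("9.3.25.v20180904")` and `describe("9.4.8.v20171121")` both fall in the "no workaround" bucket, while `describe("9.4.15.v20190215")` falls in the last one.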

hdfg159 commented 5 years ago

9.4.15.v20190215 version also has this problem

"qtp1108730163-19" Id=19 RUNNABLE
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(WindowsSelectorImpl.java:339)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:167)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
    -  locked sun.nio.ch.Util$2@3d0797c3
    -  locked sun.nio.ch.WindowsSelectorImpl@8d64e59
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:141)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:466)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:403)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.base@12.0.1-ojdkbuild/java.lang.Thread.run(Thread.java:835)

"qtp1108730163-38" Id=38 RUNNABLE
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket0(Native Method)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket(WindowsSelectorImpl.java:489)
    -  locked java.lang.Object@2cba84a
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:182)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
    -  locked sun.nio.ch.Util$2@68dd5352
    -  locked sun.nio.ch.WindowsSelectorImpl@33a48982
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:146)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:476)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:403)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.base@12.0.1-ojdkbuild/java.lang.Thread.run(Thread.java:835)

"qtp1108730163-24" Id=24 RUNNABLE
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket0(Native Method)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.resetWakeupSocket(WindowsSelectorImpl.java:489)
    -  locked java.lang.Object@6e5e7ea5
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:182)
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:124)
    -  locked sun.nio.ch.Util$2@34c477d2
    -  locked sun.nio.ch.WindowsSelectorImpl@42905054
    at java.base@12.0.1-ojdkbuild/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:141)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:466)
    at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:403)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
    at java.base@12.0.1-ojdkbuild/java.lang.Thread.run(Thread.java:835)

sbordet commented 5 years ago

@hdfg159 so still present in JDK 12.0.1. Which of the 3 stack traces is spinning? Are you able to take a server dump (not a thread dump, see https://www.eclipse.org/jetty/documentation/9.4.x/jetty-dump-tool.html) when the spin happens?

hdfg159 commented 5 years ago

@sbordet
Are these the files required for the "server dump"?
https://github.com/hdfg159/temp/blob/master/jetty/dumpStart.txt
https://github.com/hdfg159/temp/blob/master/jetty/dumpStop.txt

The CPU usage problem has been bothering me for a long time; I first noticed it an estimated 1-2 years ago. I don't know how to analyze it. So far it has only appeared on the Windows platform, not on Linux.

Environment:
OS Version: Windows 10 1803
Jetty Version: jetty-distribution-9.4.19.v20190610
JDK Version: https://github.com/ojdkbuild/ojdkbuild/releases

sbordet commented 5 years ago

@hdfg159 you have to set QueuedThreadPool.detailedDump=true and then take the server dump when the spin happens (you have taken these at start and stop). What's interesting for us is to look at the dump of the connections during the spin.

diffractious-zz commented 5 years ago

We also encountered this issue recently. Our frankenstein app uses Jetty 7.6.17.v20150415 (I know, I know), Netty 3.9.9, and our own homegrown NioSelector event loop. After matching the thread ID for the busy threads (obtained in Process Explorer) to the nid from the jstack thread dump, all three selectors (Jetty, Netty, and homegrown) were broken, spinning CPU-bound, right around the 2 hour mark after app startup. Also could not reproduce locally, only in our client's environment, which was Windows 10 1803. We could reproduce it on the clients' machines both with Oracle JDK 8 u202 and AdoptOpenJDK 11.0.4.

We worked around this by setting the Java system property org.jboss.netty.epollBugWorkaround=true (despite the name, still valid for non-epoll selectors), switched the Jetty NIO connector to the BIO connector (served only low-traffic endpoints), and implemented the rebuildSelector()-type functionality from Netty for our own event loop.

Later, we found out that the root cause was the iBoss Proxy Agent, and the workarounds we implemented were no longer necessary if we added 127.0.0.1 to the IPBypassList in the Windows Registry: HKLM/SOFTWARE/IBoss/IBSA/Parameters/IPBypassList.

Hope this helps someone!
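For anyone repeating the thread-matching step described above: Process Explorer shows thread IDs in decimal, while jstack prints the native ID as a hexadecimal `nid=0x...` field. A small sketch of the conversion and lookup follows. The class `NidLookup` is hypothetical, and the dump lines in the usage note use made-up IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds jstack thread headers for a decimal OS thread ID.
final class NidLookup {
    // jstack prints the native thread ID in hex, e.g. 6692 -> "nid=0x1a24".
    static String toNid(long osThreadId) {
        return "nid=0x" + Long.toHexString(osThreadId);
    }

    // Returns the thread header lines of a jstack dump whose nid equals the given ID.
    static List<String> findThreads(String jstackOutput, long osThreadId) {
        String needle = toNid(osThreadId);
        List<String> hits = new ArrayList<>();
        for (String line : jstackOutput.split("\\R")) {
            // Header lines start with the quoted thread name. Match the whole nid token,
            // so that 0x1a2 does not also match 0x1a24.
            boolean wholeToken = line.contains(needle + " ") || line.endsWith(needle);
            if (line.startsWith("\"") && wholeToken) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

For example, a busy thread shown with ID 6692 in Process Explorer corresponds to a jstack header containing `nid=0x1a24`.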

diffractious-zz commented 5 years ago

Another example came up recently for us, and should assist in reproducing this problem for anyone interested.

HttpDebugger looks to be an application that uses Windows Layered Service Provider or Windows Filtering Platform to do proxy-less network interception. From the LSP wikipedia page:

Corruption issues

A major issue with LSPs is that any bugs in the LSP can cause applications to break. For example, an LSP that returns the wrong number of bytes sent through an interface can cause applications to go into an infinite loop while waiting for the network stack to indicate that data has been sent.

Another major common issue with LSPs was that if they were to be removed or unregistered improperly or if the LSP was buggy, it would result in corruption of the Winsock catalog in the registry, and the entire TCP/IP stack would break and the computer could no longer access the network.

Perhaps unrelated, but 2 hours is the default value for HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime.

kishorejangid commented 4 years ago

@diffractious, this did help us. I was facing a CPU spike issue with Tomcat 8.5 and Tomcat 9 on servers where iBoss Proxy Agent is installed. I was not able to fix this even after 100s of thread dumps until I came across your comment. I stopped the iBoss Proxy Agent server and the issue went away.

Were you able to find the root cause of it? What is it in iBoss Proxy Agent that causes the Tomcat CPU spike?

Thanks.

jtnord commented 4 years ago

I can reproduce what appears to be this issue fairly quickly by running the unit tests for one of our products (a Jenkins plugin) using 9.4.28.v20200408.

The firewall is stock Windows Defender. The only other thing that uses the network is OpenVPN (2.5 preview with wintun, https://openvpn.net/community-downloads/#heading-11020), but this happens regardless of whether OpenVPN is running or connected.

Java version is 1.8.0_221. Is there an OpenJDK bug to track or follow?

Otherwise, what information would be helpful to try and track this down?

gregw commented 4 years ago

From our point of view, a DEBUG log or multiple (5 or more) thread dumps from a spinning server would help confirm this is the problem.

Any chance of at least running the same test with java 11 or 13 to see if the problem repeats there?
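If running jstack repeatedly against the test JVM is awkward, a series of thread dumps can also be captured from inside the process with the JDK's `ThreadMXBean`. The sketch below is one possible way to do that. The class `ThreadDumpSeries` is hypothetical, and its output format only approximates what jstack prints.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: captures several in-process thread dumps a fixed interval apart.
final class ThreadDumpSeries {
    // One dump: thread name, id, state, and the full stack of every live thread.
    static String dumpOnce() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" Id=")
              .append(info.getThreadId()).append(' ').append(info.getThreadState()).append('\n');
            // Formatted by hand because ThreadInfo.toString() truncates deep stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // Takes `count` dumps, sleeping `intervalMillis` between them.
    static List<String> capture(int count, long intervalMillis) {
        List<String> dumps = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            dumps.add(dumpOnce());
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return dumps;
    }
}
```

Calling `ThreadDumpSeries.capture(5, 1000)` while the CPU is pegged yields five dumps one second apart. A thread that shows the same selector frames in all of them is a likely candidate for the spin.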

harishmurali-hub commented 4 years ago

We have also hit this issue, though not with Jetty. It is from a WildFly server running on Windows. Looks like this is a JVM issue on Windows:

 sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
 sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(WindowsSelectorImpl.java:296)
 sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(WindowsSelectorImpl.java:278)
 sun.nio.ch.WindowsSelectorImpl.doSelect(WindowsSelectorImpl.java:159)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
 org.xnio.nio.SelectorUtils.await(SelectorUtils.java:51)
 org.xnio.nio.NioSocketConduit.awaitReadable(NioSocketConduit.java:358)
 io.undertow.protocols.ssl.SslConduit.awaitReadable(SslConduit.java:312)
 org.xnio.conduits.AbstractSourceConduit.awaitReadable(AbstractSourceConduit.java:66)
 io.undertow.conduits.ReadDataStreamSourceConduit.awaitReadable(ReadDataStreamSourceConduit.java:101)
 io.undertow.conduits.FixedLengthStreamSourceConduit.awaitReadable(FixedLengthStreamSourceConduit.java:285)
 org.xnio.conduits.ConduitStreamSourceChannel.awaitReadable(ConduitStreamSourceChannel.java:151)
 io.undertow.channels.DetachableStreamSourceChannel.awaitReadable(DetachableStreamSourceChannel.java:77)
 io.undertow.server.HttpServerExchange$ReadDispatchChannel.awaitReadable(HttpServerExchange.java:2161)
 org.xnio.channels.Channels.readBlocking(Channels.java:295)
 io.undertow.servlet.spec.ServletInputStreamImpl.readIntoBuffer(ServletInputStreamImpl.java:184)
 io.undertow.servlet.spec.ServletInputStreamImpl.read(ServletInputStreamImpl.java:160)
 io.undertow.servlet.spec.ServletInputStreamImpl.read(ServletInputStreamImpl.java:147)
 com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
 com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
 com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
 com.compellent.servlet.slvaf.serialization.protobuf.StateMachineProtoGen$drAfoDataMessage.[init](StateMachineProtoGen.java:1527)
 com.compellent.servlet.slvaf.serialization.protobuf.StateMachineProtoGen$drAfoDataMessage.[init](StateMachineProtoGen.java:1499)
 com.compellent.servlet.slvaf.serialization.protobuf.StateMachineProtoGen$drAfoDataMessage$1.parsePartialFrom(StateMachineProtoGen.java:1606)
 com.compellent.servlet.slvaf.serialization.protobuf.StateMachineProtoGen$drAfoDataMessage$1.parsePartialFrom(StateMachineProtoGen.java:1601)
 com.compellent.servlet.slvaf.serialization.protobuf.StateMachineProtoGen$drAfoDataMessage$Builder.mergeFrom(StateMachineProtoGen.java:2339)
 com.compellent.servlet.slvaf.serialization.protobuf.StateMachineProtoGen$drAfoDataMessage$Builder.mergeFrom(StateMachineProtoGen.java:2196)
 com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:116)
 com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:210)
 com.dell.service.content.ProtobufferMessageLiteBodyReader.readFrom(ProtobufferMessageLiteBodyReader.java:47)
 com.dell.service.content.ProtobufferMessageLiteBodyReader.readFrom(ProtobufferMessageLiteBodyReader.java:28)
 org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.readFrom(AbstractReaderInterceptorContext.java:66)
 org.jboss.resteasy.core.interception.ServerReaderInterceptorContext.readFrom(ServerReaderInterceptorContext.java:61)
 org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.proceed(AbstractReaderInterceptorContext.java:56)
 org.jboss.resteasy.security.doseta.DigitalVerificationInterceptor.aroundReadFrom(DigitalVerificationInterceptor.java:36)
 org.jboss.resteasy.core.interception.AbstractReaderInterceptorContext.proceed(AbstractReaderInterceptorContext.java:59)
 org.jboss.resteasy.core.MessageBodyParameterInjector.inject(MessageBodyParameterInjector.java:151)
 org.jboss.resteasy.core.MethodInjectorImpl.injectArguments(MethodInjectorImpl.java:92)
 org.jboss.resteasy.core.MethodInjectorImpl.invoke(MethodInjectorImpl.java:115)
 org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(ResourceMethodInvoker.java:295)
 org.jboss.resteasy.core.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:249)
 org.jboss.resteasy.core.ResourceLocatorInvoker.invokeOnTargetObject(ResourceLocatorInvoker.java:138)
 org.jboss.resteasy.core.ResourceLocatorInvoker.invoke(ResourceLocatorInvoker.java:101)
 org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:406)
 org.jboss.resteasy.core.SynchronousDispatcher.invoke(SynchronousDispatcher.java:213)
 org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(ServletContainerDispatcher.java:228)
 org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:56)
 org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(HttpServletDispatcher.java:51)
 javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
 io.undertow.servlet.handlers.ServletHandler.handleRequest(ServletHandler.java:85)
 io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(ServletSecurityRoleHandler.java:62)
 io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(ServletDispatchingHandler.java:36)
 org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(SecurityContextAssociationHandler.java:78)
 io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
 io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(SSLInformationAssociationHandler.java:131)
 io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(ServletAuthenticationCallHandler.java:57)
 io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
 io.undertow.security.handlers.AuthenticationConstraintHandler.handleRequest(AuthenticationConstraintHandler.java:53)
 io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(AbstractConfidentialityHandler.java:46)
 io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(ServletConfidentialityConstraintHandler.java:64)
 io.undertow.servlet.handlers.security.ServletSecurityConstraintHandler.handleRequest(ServletSecurityConstraintHandler.java:59)
 io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(AuthenticationMechanismsHandler.java:60)
 io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(CachedAuthenticatedSessionHandler.java:77)
 io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(NotificationReceiverHandler.java:50)
 io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(AbstractSecurityContextAssociationHandler.java:43)
 io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
 org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(JACCContextIdHandler.java:61)
 io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
 org.wildfly.extension.undertow.security.jaspi.JASPICSecureResponseHandler.handleRequest(JASPICSecureResponseHandler.java:48)
 org.wildfly.extension.undertow.deployment.GlobalRequestControllerHandler.handleRequest(GlobalRequestControllerHandler.java:68)
 io.undertow.server.handlers.PredicateHandler.handleRequest(PredicateHandler.java:43)
 io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:292)
 io.undertow.servlet.handlers.ServletInitialHandler.access$100(ServletInitialHandler.java:81)
 io.undertow.servlet.handlers.ServletInitialHandler$2.call(ServletInitialHandler.java:138)
 io.undertow.servlet.handlers.ServletInitialHandler$2.call(ServletInitialHandler.java:135)
 io.undertow.servlet.core.ServletRequestContextThreadSetupAction$1.call(ServletRequestContextThreadSetupAction.java:48)
 io.undertow.servlet.core.ContextClassLoaderSetupAction$1.call(ContextClassLoaderSetupAction.java:43)
 org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction.lambda$create$0(SecurityContextThreadSetupAction.java:105)
 org.wildfly.extension.undertow.security.SecurityContextThreadSetupAction$$Lambda$861/1990793599.call(Unknown Source)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$862/1110174823.call(Unknown Source)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$862/1110174823.call(Unknown Source)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$862/1110174823.call(Unknown Source)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$862/1110174823.call(Unknown Source)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction.lambda$create$0(UndertowDeploymentInfoService.java:1508)
 org.wildfly.extension.undertow.deployment.UndertowDeploymentInfoService$UndertowThreadSetupAction$$Lambda$862/1110174823.call(Unknown Source)
 io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:272)
 io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
 io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:104)
 io.undertow.server.Connectors.executeRootHandler(Connectors.java:326)
 io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:812)
 ...
 java.lang.Thread.run(Thread.java:745)
bhawani1978 commented 3 years ago

Hi @harishmurali-hub

Any update on this? I am also seeing this in Jetty, with 9.4.21 as well as 9.4.31; it seems to have started sometime last year. In our case it works fine for 1-2 weeks and then suddenly stops working, with the same stack trace.

"qtp525571-72617" Id=72617 RUNNABLE (in native)
    at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll0(Native Method)
    at sun.nio.ch.WindowsSelectorImpl$SubSelector.poll(Unknown Source)
    at sun.nio.ch.WindowsSelectorImpl$SubSelector.access$400(Unknown Source)
    at sun.nio.ch.WindowsSelectorImpl.doSelect(Unknown Source)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(Unknown Source)

joakime commented 3 years ago

@bhawani1978 that stacktrace shows a properly operating selector.

Turn on DEBUG logging for org.eclipse.jetty.io.ManagedSelector. Look for "woken with none selected" line in your logs. If you see that, then you have the selector bug on Windows. The root cause of which is something on your OS or your network drivers. It's a bug that neither Jetty nor the JVM can work around.
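
A minimal sketch for checking a captured log for that marker, assuming the DEBUG output has already been written to a file. With Jetty 9.4's built-in StdErrLog, DEBUG for this class is usually enabled with a property such as org.eclipse.jetty.io.ManagedSelector.LEVEL=DEBUG; other logging backends have their own configuration. The class name WokenLineCounter and the command-line argument are hypothetical; only the marker text comes from the comment above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Counts log lines that report a select() wakeup with nothing selected. */
final class WokenLineCounter {
    // Marker text quoted in the thread; no other part of the line format is assumed.
    static final String MARKER = "woken with none selected";

    /** Returns how many of the given log lines contain the marker. */
    static long count(List<String> lines) {
        return lines.stream().filter(line -> line.contains(MARKER)).count();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: the first argument is the path of the captured log file.
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(count(lines) + " lines contain \"" + MARKER + "\"");
    }
}
```

A handful of matches is not conclusive on its own; a large and steadily growing count while the server is idle is what points at the spinning selector.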

bhawani1978 commented 3 years ago

Hi @joakime,

Yes, I can see "woken with none selected" in the debug logs. It's a Windows Server 2016 machine. We have similar installations at many customer sites, but we see the issue only at this site.

If it is in fact an OS or network driver issue, please let me know if you have any pointers for pinpointing or debugging it, so that we can get it fixed.

-Bhawani

joakime commented 3 years ago

I have no advice on upgrading your OS or network drivers. That's well out of scope for this issue tracker. You'll have to dig into that yourself. Know that you can now test it yourself. If you see "woken with none selected" that means you have a network implementation that is prone to waking up the application even when there is no network activity to operate against. This is the root cause of your 100% spin situation, and is not Java, JVM, or Jetty specific, it's a more fundamental issue on your machine.

bhawani1978 commented 3 years ago

Thanks @joakime

diffractious-zz commented 3 years ago

The root cause of which is something on your OS or your network drivers. It's a bug that neither Jetty nor the JVM can work around.

Hi @joakime, I agree that it is an underlying problem with the OS, but there is a workaround that Jetty could do. Netty has functionality that, when it detects this spinning situation, will rebuild the selector. This is behind a system property, which Jetty could also do. See https://github.com/netty/netty/blob/3.9/src/main/java/org/jboss/netty/channel/socket/nio/AbstractNioSelector.java#L129.
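
The detection half of that idea can be sketched as a small counter. This is not Netty's or Jetty's code; the class name SpinDetector, the threshold, and the "early return" cutoff are hypothetical values chosen for illustration. It treats a select() that returns zero keys much sooner than expected as suspicious and reports spinning once enough of those happen in a row.

```java
/** Flags a selector loop that keeps returning early with zero selected keys. */
final class SpinDetector {
    private final int threshold;      // hypothetical: consecutive early, empty wakeups tolerated
    private final long minWaitNanos;  // hypothetical: a select() shorter than this counts as "early"
    private int consecutive;

    SpinDetector(int threshold, long minWaitNanos) {
        this.threshold = threshold;
        this.minWaitNanos = minWaitNanos;
    }

    /**
     * Records the outcome of one select() call.
     *
     * @param selected     number of keys select() reported
     * @param elapsedNanos how long the select() call blocked
     * @return true once the run of early, empty wakeups reaches the threshold
     */
    boolean record(int selected, long elapsedNanos) {
        if (selected == 0 && elapsedNanos < minWaitNanos) {
            consecutive++;
        } else {
            consecutive = 0; // real work or a genuine wait breaks the run
        }
        return consecutive >= threshold;
    }

    /** Clears the run, for example after the caller has reacted to a detection. */
    void reset() {
        consecutive = 0;
    }
}
```

A real selector loop would also have to ignore wakeups it requested itself (for example via Selector.wakeup() to process updates), otherwise a busy but healthy server could be misdiagnosed.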

joakime commented 3 years ago

@diffractious We've considered the Netty approach of rebuilding the selector several times, but you wind up losing all of the selector specific attachments when you do that. There's an attempt to retrieve the keys from the old selector, but that just fails with either selector errors, or an empty key list.

That impacts all of the active connections attached to that selector. Those impacted connections are essentially hosed, dead, and unable to progress.

We had rebuilding in the Jetty layer for spinning for a short while, but that impacted ALL other non-buggy environment usages far too harshly so it was removed.
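
For readers unfamiliar with what "rebuilding the selector" involves, here is a rough sketch using only java.nio. It is not the Jetty or Netty implementation, and the class name SelectorRebuild is hypothetical. It assumes the old selector still reports its keys, which is exactly the step described above as failing on affected machines.

```java
import java.io.IOException;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class SelectorRebuild {
    /** Opens a new selector, re-registers every valid key on it, then closes the old one. */
    static Selector rebuild(Selector old) throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : old.keys()) {
            if (!key.isValid()) {
                continue; // cancelled keys or closed channels cannot be migrated
            }
            SelectableChannel channel = key.channel();
            int ops = key.interestOps();
            Object attachment = key.attachment();
            key.cancel();                             // detach from the old selector
            channel.register(fresh, ops, attachment); // carry interest ops and attachment over
        }
        old.close();
        return fresh;
    }
}
```

If old.keys() comes back empty or the keys are no longer valid, nothing is migrated and the connections that were attached to the old selector are stranded, which matches the failure mode described above.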

joakime commented 3 years ago

@diffractious you can use a workaround ServerConnector to rebuild on spurious select 0.

See https://github.com/jetty-project/selector-hack

diffractious-zz commented 3 years ago

@joakime nice! Thank you!

joakime commented 3 years ago

One other update here.

We've received reports that running Jetty behind IIS and its HttpPlatformHandler is a cause of this behavior/issue as well (apparently this setup uses some shortcuts / loopback network connector which triggers this bug on Windows).

horizonzy commented 3 years ago

maybe this is the reason. https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6778476

joakime commented 3 years ago

maybe this is the reason. https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6778476

I don't think a selector.select() in a hung state would result in extra CPU usage.

horizonzy commented 3 years ago

I have no advice on upgrading your OS or network drivers. That's well out of scope for this issue tracker. You'll have to dig into that yourself. Know that you can now test it yourself. If you see "woken with none selected" that means you have a network implementation that is prone to waking up the application even when there is no network activity to operate against. This is the root cause of your 100% spin situation, and is not Java, JVM, or Jetty specific, it's a more fundamental issue on your machine.

thanks, I got it😁

horizonzy commented 3 years ago

Another example came up recently for us, and should assist in reproducing this problem for anyone interested.

HttpDebugger looks to be an application that uses Windows Layered Service Provider or Windows Filtering Platform to do proxy-less network interception. From the LSP wikipedia page:

Corruption issues

A major issue with LSPs is that any bugs in the LSP can cause applications to break. For example, an LSP that returns the wrong number of bytes sent through an interface can cause applications to go into an infinite loop while waiting for the network stack to indicate that data has been sent. Another major common issue with LSPs was that if they were to be removed or unregistered improperly or if the LSP was buggy, it would result in corruption of the Winsock catalog in the registry, and the entire TCP/IP stack would break and the computer could no longer access the network.

Perhaps unrelated, but 2 hours is the default value for HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\KeepAliveTime.

I found a quick way to reproduce it: just uninstall HttpDebugger while your NIO program is running, and the NIO bug occurs immediately. The Oracle JDK team is now tracking it: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8255627

bhawani1978 commented 3 years ago

@diffractious you can use a workaround ServerConnector to rebuild on spurious select 0.

See https://github.com/jetty-project/selector-hack

Hi @joakime, we tried the workaround and it seemed to work in our labs, but in production it started breaking.

[2021-03-04 06:05:04.585+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - Selector sun.nio.ch.WindowsSelectorImpl@5b1bf8a7 processing 0 keys, 1 updates
[2021-03-04 06:05:04.585+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - updateable 1
[2021-03-04 06:05:04.585+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - update org.eclipse.jetty.io.ChannelEndPoint$$Lambda$53/10032797@5b5b5da5
[2021-03-04 06:05:04.585+0000] [DEBUG] [qtp352359770-16419] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=3/8,p=0}@27cade01 waiting
[2021-03-04 06:05:04.586+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.io.ChannelEndPoint] <##[{user=xxxxxx}]##> - Key interests updated 0 -> 1 on SocketChannelEndPoint@2071aaaa{l=/127.0.0.1:29170,r=/127.0.0.1:50762,OPEN,fill=FI,flush=-,to=0/30000}{io=1/1,kio=1,kro=1}->HttpConnection@3d598344[p=HttpParser{s=START,0 of -1},g=HttpGenerator@514c9975{s=START}]=>HttpChannelOverHttp@7763d496{s=HttpChannelState@73b9c41d{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=363602,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 06:05:04.586+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - updates 0
[2021-03-04 06:05:04.586+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - Selector sun.nio.ch.WindowsSelectorImpl@5b1bf8a7 waiting with 2 keys
[2021-03-04 08:02:04.918+0000] [DEBUG] [qtp352359770-16729] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=5/8,p=0}@629b0623 task=null
[2021-03-04 08:02:04.948+0000] [DEBUG] [qtp352359770-16729] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=5/8,p=0}@629b0623 IDLE
[2021-03-04 08:02:04.948+0000] [DEBUG] [qtp352359770-16729] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=4/8,p=0}@629b0623 Exited
[2021-03-04 08:02:04.948+0000] [DEBUG] [qtp352359770-16729] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - ran ReservedThreadExecutor@31edaa7d{s=4/8,p=0}@629b0623 in QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=9<=150,i=2,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=4/8,p=0}]
[2021-03-04 08:02:04.948+0000] [DEBUG] [qtp352359770-16729] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - shrinking QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=9<=150,i=3,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=4/8,p=0}]
[2021-03-04 08:02:04.948+0000] [DEBUG] [qtp352359770-16729] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - Thread[qtp352359770-16729,5,main] exited for QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=8<=150,i=2,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=4/8,p=0}]
[2021-03-04 08:02:14.268+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.IdleTimeout] <##[{user=xxxxxx}]##> - SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,OSHUT,fill=FI,flush=-,to=30001/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0} idle timeout check, elapsed: 30001 ms, remaining: -1 ms
[2021-03-04 08:02:14.268+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.IdleTimeout] <##[{user=xxxxxx}]##> - SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,OSHUT,fill=FI,flush=-,to=30002/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0} idle timeout expired
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.FillInterest] <##[{user=xxxxxx}]##> - onFail FillInterest@5e710602{AC.ReadCB@6b923e31{HttpConnection@6b923e31::SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,OSHUT,fill=FI,flush=-,to=30002/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}}}
java.util.concurrent.TimeoutException: Idle timeout expired: 30001/30000 ms
    at org.eclipse.jetty.io.IdleTimeout.checkIdleTimeout(IdleTimeout.java:171)
    at org.eclipse.jetty.io.IdleTimeout.idleCheck(IdleTimeout.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.AbstractConnection] <##[{user=xxxxxx}]##> - HttpConnection@6b923e31::SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,OSHUT,fill=-,flush=-,to=30002/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0} onFillInterestedFailed {}
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.AbstractEndPoint] <##[{user=xxxxxx}]##> - close SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,OSHUT,fill=-,flush=-,to=30002/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.AbstractEndPoint] <##[{user=xxxxxx}]##> - close(null) SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,OSHUT,fill=-,flush=-,to=30002/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.ChannelEndPoint] <##[{user=xxxxxx}]##> - doClose SocketChannelEndPoint@19103f98{l=/127.0.0.1:29170,r=/127.0.0.1:63307,CLOSED,fill=-,flush=-,to=30003/30000}{io=0/1,kio=0,kro=0}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.FillInterest] <##[{user=xxxxxx}]##> - onClose FillInterest@5e710602{null}
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - Wakeup WorkaroundManagedSelector@2a7ed1f{STOPPING} id=1 keys=1 selected=0 updates=4
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - queue org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@1506f4ba startThread=0
[2021-03-04 08:02:14.269+0000] [DEBUG] [qtp352359770-16726] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - run org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@1506f4ba in QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=8<=150,i=1,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=4/8,p=0}]
[2021-03-04 08:02:14.269+0000] [DEBUG] [Connector-Scheduler-382db087-1] [org.eclipse.jetty.io.AbstractEndPoint] <##[{user=xxxxxx}]##> - Ignored idle endpoint SocketChannelEndPoint@19103f98{l=0.0.0.0/0.0.0.0:29170,r=/127.0.0.1:63307,CLOSED,fill=-,flush=-,to=30003/30000}{io=0/1,kio=-1,kro=-1}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 08:02:14.269+0000] [DEBUG] [qtp352359770-16726] [org.eclipse.jetty.io.ManagedSelector] <##[{user=xxxxxx}]##> - Destroyed SocketChannelEndPoint@19103f98{l=0.0.0.0/0.0.0.0:29170,r=/127.0.0.1:63307,CLOSED,fill=-,flush=-,to=30003/30000}{io=0/1,kio=-1,kro=-1}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 08:02:14.269+0000] [DEBUG] [qtp352359770-16726] [org.eclipse.jetty.io.AbstractConnection] <##[{user=xxxxxx}]##> - onClose HttpConnection@6b923e31::SocketChannelEndPoint@19103f98{l=0.0.0.0/0.0.0.0:29170,r=/127.0.0.1:63307,CLOSED,fill=-,flush=-,to=0/30000}{io=0/1,kio=-1,kro=-1}->HttpConnection@6b923e31[p=HttpParser{s=CLOSE,0 of -1},g=HttpGenerator@6555dd33{s=START}]=>HttpChannelOverHttp@616a5541{s=HttpChannelState@2468d708{s=IDLE rs=BLOCKING os=OPEN is=IDLE awp=false se=false i=true al=0},r=0,c=false/false,a=IDLE,uri=null,age=0}
[2021-03-04 08:02:14.269+0000] [DEBUG] [qtp352359770-16726] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - ran org.eclipse.jetty.io.ManagedSelector$DestroyEndPoint@1506f4ba in QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=8<=150,i=1,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=4/8,p=0}]
[2021-03-04 08:02:24.949+0000] [DEBUG] [qtp352359770-16535] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=4/8,p=0}@a8ab953 task=null
[2021-03-04 08:02:24.949+0000] [DEBUG] [qtp352359770-16535] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=4/8,p=0}@a8ab953 IDLE
[2021-03-04 08:02:24.949+0000] [DEBUG] [qtp352359770-16535] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=3/8,p=0}@a8ab953 Exited
[2021-03-04 08:02:24.949+0000] [DEBUG] [qtp352359770-16535] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - ran ReservedThreadExecutor@31edaa7d{s=3/8,p=0}@a8ab953 in QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=8<=150,i=2,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=3/8,p=0}]
[2021-03-04 08:02:33.957+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=3/8,p=0}@3d5bee76 task=null
[2021-03-04 08:02:33.957+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=3/8,p=0}@3d5bee76 IDLE
[2021-03-04 08:02:33.957+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.util.thread.ReservedThreadExecutor] <##[{user=xxxxxx}]##> - ReservedThreadExecutor@31edaa7d{s=2/8,p=0}@3d5bee76 Exited
[2021-03-04 08:02:33.957+0000] [DEBUG] [qtp352359770-16190] [org.eclipse.jetty.util.thread.QueuedThreadPool] <##[{user=xxxxxx}]##> - ran ReservedThreadExecutor@31edaa7d{s=2/8,p=0}@3d5bee76 in QueuedThreadPool[qtp352359770]@1500955a{STARTED,8<=8<=150,i=3,r=8,q=0}[ReservedThreadExecutor@31edaa7d{s=2/8,p=0}]
sbordet commented 3 years ago

@bhawani1978 I don't see any problem with your stack trace. The select wakes up with 1 update. There is no spinning and I doubt any high CPU usage given the logs show 10+ second pauses between log lines -- the server seems completely idle.

If you have 100% CPU, it's something else.
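
The pauses can be measured directly from the timestamps in the excerpt. A small sketch, assuming every line starts with a bracketed timestamp in the format shown above; the class name LogGap and its method names are hypothetical.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class LogGap {
    // Matches the bracketed prefix in the excerpt, e.g. 2021-03-04 08:02:04.948+0000
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSZ");

    /** Extracts the timestamp between the first '[' and the following ']' of a log line. */
    static OffsetDateTime stampOf(String line) {
        int open = line.indexOf('[');
        int close = line.indexOf(']', open);
        return OffsetDateTime.parse(line.substring(open + 1, close), STAMP);
    }

    /** Milliseconds elapsed between two log lines. */
    static long gapMillis(String earlier, String later) {
        return Duration.between(stampOf(earlier), stampOf(later)).toMillis();
    }
}
```

Applied to the excerpt, the step from 08:02:04.948 to 08:02:14.268 is 9320 ms, and the step from 06:05:04.586 to 08:02:04.918 is 7020332 ms, close to two hours. A spinning selector with DEBUG enabled would be expected to log continuously rather than leave gaps like these.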

bhawani1978 commented 3 years ago

@sbordet no high CPU but it just hangs and stops responding

sbordet commented 3 years ago

@bhawani1978 if you don't have 100% CPU please open a new issue, as this issue is not related to your problem.

bhawani1978 commented 3 years ago

@sbordet I thought this would be the right place, as this started happening after applying the workaround provided by @joakime (https://github.com/jetty-project/selector-hack). If we remove this hack, we will most probably start getting the old issue again.

joakime commented 3 years ago

The hack just works around a hardware/driver/OS issue on your machine. An issue that all Java programs that use NIO on that machine will encounter. In other words, this issue is not Jetty specific.

The new issue you reported is something else that is unrelated to this issue. It is something new. And so far, we do not have enough details to even guess at the cause. Please open a new issue. Let's troubleshoot there.

TheGoesen commented 3 years ago

Hello, I can join into the chorus of "jetty works everywhere, except for that one customer where it eats all cpu". I have two questions:

  1. Has someone successfully field-tested the selector-hack branch, or is it more theory-craft?
  2. Would a workaround be possible by providing a custom ServerConnector that uses blocking I/O and a few extra threads instead of selectors? Is such a class already available? To me personally that seems like a better workaround, especially since, according to Oracle, selectors "scale horrible" prior to Java 17 anyway. https://bugs.openjdk.java.net/browse/JDK-8266382
joakime commented 3 years ago

Hello, I can join into the chorus of "jetty works everywhere, except for that one customer where it eats all cpu". I have two questions:

Try to gather as much information as you can about their system: the network hardware and network drivers they have, any network monitoring tools active on that server, and any "security" services running on that server (such as virus scanning, firewalls, etc.).

It usually comes down to a few things:

  1. They are using network hardware that has a flaw/bug (replacing it with a newer version of the same hardware fixes this scenario)
  2. They are using a network driver that has a flaw/bug (using an up-to-date version fixes this for many users)
  3. They have some kind of security software running that is negatively impacting network behavior (if temporarily disabling the security software makes the problem go away, you can point at that software as the reason for the bad behavior)

  1. Has someone successfully field-tested the selector-hack branch, or is it more theory-craft?

There is not enough / adequate testing for that selector-hack branch. It's such an incredibly rare scenario, we've personally never experienced it, nor have any of our commercial support clients.

  2. Would a workaround be possible by providing a custom ServerConnector that uses blocking I/O and a few extra threads instead of selectors? Is such a class already available? To me personally that seems like a better workaround,

ServerConnector is not responsible for how the connection is handled.

Jetty is 100% NIO based and has no support for blocking I/O concepts. To accomplish this you would need to rewrite vast swathes of the I/O and threading layers across all of Jetty. That is not a trivial task, and not one we have any desire to take on at this point in time.

especially since according to oracle selectors "scale horrible" prior to java 17 anyway. https://bugs.openjdk.java.net/browse/JDK-8266382

Be aware that Oracle is pushing their Loom thread model agenda with that comment. Loom isn't a silver bullet anyway.

Note that we have single instances of Jetty handling over 200,000 active connections across a variety of protocols (http/1, http/2, websocket, etc) just fine, we start to hit network interface bandwidth limits way before we hit any kind of selector limits.

sbordet commented 3 years ago

Note that we have single instances of Jetty handling over 200,000 active connections across a variety of protocols (http/1, http/2, websocket, etc) just fine

On Linux.