Open ghost opened 1 year ago
You have to finish reading from the HTTP stream. The ZipInputStream does not read to EOF. Just closing the request.getInputStream is insufficient in many cases, as it can leave data on the connection that is unread (eg: breaking HTTP/1 connection persistence in the process). What if the transfer-encoding was chunked? and the terminal chunk hasn't been read from the network buffers yet? What if the transfer-encoding final chunk hasn't even been sent by the client yet? What if the stream contained superflous data, unrelated to the Zip stream after the zip stream?
All your code is doing is iterating through the Zip directory/dictionary/index for entries, and reading the entry content. How this is done is by reading the entire stream until it finds the bytes indicating the start of directory section at the end of the stream, then reading each entry of the directory to know where in the previous parts of the stream the ZipEntry contents are located (and what size they are). So you are basically reading ALL of the file contents first, then slowly iterating through the dictionary, and then going backwards in the zip stream to each entry contents, once you reach an entry in the Zip directory section that cannot be represented by java ZipEntry it stops. You haven't actually read the whole stream yet if that last entry is something structural that cannot be represented by a java ZipEntry. This technique of ZipInputStream also means that large zip files can blow out your memory quite easily. Most organizations that deal with zip files will stream it all down to a temporary file on disk, then read/parse/iterate/validate from disk (which is far more memory friendly).
How do you fix this? You have to satisfy the basics of HTTP here. Even libraries that read / parse JSON and XML have this same issue. Every modern JSON and XML library have options to continue reading to EOF. They can even be configured to behave differently depending on what is found after the content. Eg: in many XML and JSON parsers there's an option to fail the document parse if any non-whitespace is discovered after the document.
See TODO in the code snippet below.
val in = request.getInputStream
val zip = new ZipInputStream(in)
var entry = zip.getNextEntry
while (entry != null)
{
val out = new CountingOutputStream(new NullOutputStream)
zip.transferTo(out)
out.close
System.err.println(entry.getName + ":" + out.bytesWritten)
entry = zip.getNextEntry
}
zip.close
// TODO: you need to finish reading from `in` here, the HTTP stream is not yet complete, just the ZIP content is.
// You have to choose a behavior here,
// eg: if there is content, do you fail the upload? or ignore it the extra content? warn the user?
// small amounts of content ok? large amounts of extra content show a problem?
If you don't want to do this, you can send the Zip stream in a multipart/form-data
, which puts the HTTP stream behavior requirement on the Server, not you.
@joakime actually I had a chat with @sbordet about this and I think we should probably do a consumeAvailable
operation in the close
method of our ContentSourceInputStream
class.
If this was just pure a jetty-12 handler, directly using our async APIs, (or even a Servlet) then we would make an attempt to read any unconsumed input and only abort the request/response if we were unable to do so.
So if there is never any blocking between the end of the last zip entry and then end of the stream, then that "fix" should help @silviobierman. However, if there sometimes in content that could block, then that fix will not help and @silviobierman should read to EOF after the last entry.
Thus to be safe, I think both should be done.
that consumeAll
should have limits. (size, time, etc)
we used to do this in the past, and nefarious actors abused it, they would just hit resources on the server that would respond quickly (usually a rejection of some kind, auth, invalid params, etc), and then the consumeAll
would kick in on the request, meaning the nefarious user could trickle in (or flood) the useless request body consuming resources for no good reason.
then there were examples of non-nefarious actors, but just bad requests (bad to the webapp, not jetty).
Example 1: Client is sending a huge 1GB file, but the webapp says nope, and responds quickly, not even attempting to read the request body. (this could be better handled by 100-continue behaviors, but hey, not everyone uses that)
Example 2: Client is sending a huge 1GB json document, but the webapp parsing for it fails about 100kb in, and sends a failure response.
we also don't want to combine consumeAll
with GzipHandler
in the wrong way (eg: zip bomb)
@joakime it should be a consumeAvailable, not a consumeAll. I.e. it will never block. It will just keep reading until null (a stall) or EOF. If it stalls before EOF, then it will fail the stream like it does now. We have to protect against a zip bomb regardless if it is the app or a util class that is reading the data.
Note that ultimately, HTTP/1.1 semantics puts us into consumeAvailable
semantics all the time. Typically we do that after the response has been sent, so that any abort is fatal to the connection, but seldom to the response. In this case, the close is happening in the handler, so any abort will affect the response... thus we should try what we can to avoid the abort... but not block.
While we should thinking about HTTP/1 behaviors, I note that this issue is opened against HTTP/2
We do consumeAvailable()
for every HTTP version, so the HTTP version is not relevant.
Sorry I replied to @joakime on the mailing list instead of here. Let me repeat here:
You talk about the code in the alternative servlet TestServlet2 that treats the POST body as a ZIP. As posted the code however uses TestServlet1 that simply uses input.transferTo to read and discard all bytes of the POST body without making any assumptions about the content. This results in the same lockup for any large enough POST with any type of data. I get the same result when posting the content of a 80M MP4 file.
So although I understand your point about reading the complete body when parsing input in some format while reading the request that is not the cause of the issue. I merely posted the second servlet because it demonstrates that the first part of the request body is read properly, printing out the name/size of several initial entries in a posted ZIP file and then stalling.
We are tracking some reports of whole JVM lockups on JDK 21 from various persons ATM. We even experienced this on our CI (in different versions of Jetty)
Can you test on JDK 20 or JDK 17 and see if you still have lockups?
I already was quite sure I saw this behavior before I upgraded my OpenJDK 20 to 21 a few days ago. But to be sure I just uninstalled 21 and installed 17. Same results with both JDK versions: lockup with my 12.0.1 test code and no issues with our application using Jetty 11.
Additional note: this is not an intermittent issue. Each POST fails 100% of the time if the body is large enough. Start server, run curl post command line, lockup.
@silviobierman Can you take some thread dumps during the stall?
Well, this is interesting. Over the weekend I upgraded my system to Fedora 39 (beta) and got new versions of OpenJDK 21, VSCode, SBT (the Scala build tool and debugger) and curl. Now the lockups with TestServlet1 (reading plain content bytes) and TestServlet2 (reading content with ZipInputStream) using the same test method/content can not be reproduced. They would occur consistently before and now these test cases work as expected. It is not caused by the new JDK only because reverting to OpenJDK17 still does not make the issue reappear.
One lockup still remains: trying to read a POST body as a multipart form-data (both with browser and curl as client). I traced the handling code to the
request.getParts
call and that never returns. I changed the standalone test code to add a MultipartConfigElement using ServletHolder.setMultipartConfig and changed the POST handling code to call request.getParts. Should I file a separate issue for that with the current test code and close this one? Or is it better to continue here?
@silviobierman interesting?!?!?
Continue here as we never got a good diagnosis, so it could be same problem just tickled a different way.
The current test code:
import java.io._
import java.util.zip._
import jakarta.servlet._
import jakarta.servlet.http._
import org.eclipse.jetty.alpn._
import org.eclipse.jetty.alpn.server._
import org.eclipse.jetty.http2._
import org.eclipse.jetty.http2.server._
import org.eclipse.jetty.http3._
import org.eclipse.jetty.http3.server._
import org.eclipse.jetty.util.resource._
import org.eclipse.jetty.security._
import org.eclipse.jetty.server._
import org.eclipse.jetty.server.handler.{ ContextHandler, ResourceHandler, DefaultHandler, ContextHandlerCollection }
import org.eclipse.jetty.ee10.servlet._
import org.eclipse.jetty.session.{ DefaultSessionIdManager, HouseKeeper }
import org.eclipse.jetty.util.security._
import org.eclipse.jetty.util.ssl._
import org.eclipse.jetty.util.thread._
import org.eclipse.jetty.server._
import org.eclipse.jetty.websocket.api._
import org.eclipse.jetty.ee10.websocket.server._
import org.eclipse.jetty.ee10.websocket.server.config._
import org.eclipse.jetty.http.HttpFields
class NullOutputStream extends OutputStream
{
def write(byte : Int) = ()
override def write(buff : Array[Byte]) = ()
override def write(buff : Array[Byte],start : Int,count : Int) = ()
}
class CountingOutputStream(wrapped : OutputStream) extends OutputStream
{
private var written : Long = 0
def bytesWritten = written
def write(byte : Int)
{
wrapped.write(byte)
written += 1
}
override def write(buffer : Array[Byte],offset : Int,bytes : Int)
{
wrapped.write(buffer,offset,bytes)
written += bytes
}
override def write(buffer : Array[Byte])
{
wrapped.write(buffer)
written += buffer.length
}
override def flush = wrapped.flush
override def close = wrapped.close
}
class TestServlet3 extends HttpServlet
{
override def doPost(request : HttpServletRequest,response : HttpServletResponse)
{
System.err.println(request.getRequestURL().toString)
var hdrs = request.getHeaderNames
while (hdrs.hasMoreElements)
{
val n = hdrs.nextElement
val v = request.getHeader(n)
System.err.println(n + "=" + v)
}
System.err.println("TRY GET PARTS")
val parts0 = request.getParts
System.err.println("SUCCEEDED GET PARTS")
val parts = parts0.toArray.toIndexedSeq.map(_.asInstanceOf[Part])
for (part <- parts)
{
val in = part.getInputStream
val out = new CountingOutputStream(new NullOutputStream)
in.transferTo(out)
in.close
out.close
System.err.println("PART1:" + part.getName() + "/" + out.bytesWritten + "/" + part.getSize())
}
val writer = response.getWriter
writer.println("Thank you")
writer.close
}
}
object JettyTest
{
def main(args: Array[String])
{
val pool = new QueuedThreadPool
val server = new Server(pool)
// Create connector
val keyStore = "xxx.jks"
val keyPassword = "xxx"
val sslContextFactory = new SslContextFactory.Server
sslContextFactory.setEndpointIdentificationAlgorithm("HTTPS")
sslContextFactory.setKeyStorePath(keyStore)
sslContextFactory.setKeyStorePassword(keyPassword)
sslContextFactory.setKeyManagerPassword(keyPassword)
sslContextFactory.setKeyStoreType("JKS")
val httpsConfig = new HttpConfiguration
httpsConfig.setSecureScheme("https")
httpsConfig.setSecurePort(8443)
val http1 = new HttpConnectionFactory(httpsConfig)
val http2 = new HTTP2ServerConnectionFactory(httpsConfig)
val alpn = new ALPNServerConnectionFactory
alpn.setDefaultProtocol(http1.getProtocol)
val ssl = new SslConnectionFactory(sslContextFactory,alpn.getProtocol)
val http2Connector = new ServerConnector(server,ssl,alpn,http2,http1)
http2Connector.setHost("0.0.0.0")
http2Connector.setPort(8443)
http2Connector.setIdleTimeout(1800000L)
server.addConnector(http2Connector)
// Create Servlet context
val servletContext = new ServletContextHandler
servletContext.setContextPath("/")
val holder = new ServletHolder(classOf[TestServlet3])
val mc = new MultipartConfigElement(System.getProperty("java.io.tmpdir"),500000000,500000000,500000000)
holder.getRegistration.setMultipartConfig(mc)
servletContext.addServlet(holder,"/*")
// Setup server
val ctxHandler = new ContextHandlerCollection
ctxHandler.setHandlers(servletContext)
server.setHandler(ctxHandler)
server.start
server.join
}
}
And this is the dump after posting Multipart form data containing as only part a ZIP-file of ~26Mb size.
]) for Server@50b472aa[provider=null,keyStore=file:///java_root/cert/all/all.jks,trustStore=null]
2023-09-18 15:24:15.148:INFO :oejs.AbstractConnector:main: Started ServerConnector@29647f75{SSL, (ssl, alpn, h2, http/1.1)}{0.0.0.0:8443}
2023-09-18 15:24:15.158:INFO :oejs.Server:main: Started oejs.Server@704d6e83{STARTING}[12.0.1,sto=0] @494ms
https://localhost.jambo-mobile.com:8443/
content-length=26299197
content-type=multipart/form-data; boundary=------------------------f97e8f2013496bd4
user-agent=curl/8.2.1
accept=*/*
TRY GET PARTS
2023-09-18 15:25:09
Full thread dump OpenJDK 64-Bit Server VM (21+35 mixed mode, sharing):
Threads class SMR info:
_java_thread_list=0x00007fe208155ed0, length=21, elements={
0x00007fe2cc028c20, 0x00007fe2cc16c7c0, 0x00007fe2cc16de40, 0x00007fe2cc16f940,
0x00007fe2cc170f80, 0x00007fe2cc172520, 0x00007fe2cc174060, 0x00007fe2cc175720,
0x00007fe2cc18b840, 0x00007fe2cc18f980, 0x00007fe2cc6579b0, 0x00007fe2cc6589a0,
0x00007fe2cc659a60, 0x00007fe2cc65ab10, 0x00007fe2cc65bc50, 0x00007fe2cc65d0e0,
0x00007fe2cc65e130, 0x00007fe2cc65f1d0, 0x00007fe1dc06a810, 0x00007fe1d0041ef0,
0x00007fe1d00379a0
}
"main" #1 [4993] prio=5 os_prio=0 cpu=472,67ms elapsed=54,57s tid=0x00007fe2cc028c20 nid=4993 waiting on condition [0x00007fe2d15fe000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000070efb4570> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(java.base@21/LockSupport.java:371)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(java.base@21/AbstractQueuedSynchronizer.java:519)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21/ForkJoinPool.java:3780)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21/ForkJoinPool.java:3725)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21/AbstractQueuedSynchronizer.java:1707)
at org.eclipse.jetty.util.thread.AutoLock$WithCondition.await(AutoLock.java:126)
at org.eclipse.jetty.util.thread.QueuedThreadPool.join(QueuedThreadPool.java:822)
at org.eclipse.jetty.server.Server.join(Server.java:684)
at JettyTest$.main(Main.scala:138)
at JettyTest.main(Main.scala)
"Reference Handler" #9 [5001] daemon prio=10 os_prio=0 cpu=0,60ms elapsed=54,55s tid=0x00007fe2cc16c7c0 nid=5001 waiting on condition [0x00007fe2ad2a1000]
java.lang.Thread.State: RUNNABLE
at java.lang.ref.Reference.waitForReferencePendingList(java.base@21/Native Method)
at java.lang.ref.Reference.processPendingReferences(java.base@21/Reference.java:246)
at java.lang.ref.Reference$ReferenceHandler.run(java.base@21/Reference.java:208)
"Finalizer" #10 [5002] daemon prio=8 os_prio=0 cpu=0,15ms elapsed=54,55s tid=0x00007fe2cc16de40 nid=5002 in Object.wait() [0x00007fe2ad1a1000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait0(java.base@21/Native Method)
- waiting on <0x000000070ef9e9b8> (a java.lang.ref.NativeReferenceQueue$Lock)
at java.lang.Object.wait(java.base@21/Object.java:366)
at java.lang.Object.wait(java.base@21/Object.java:339)
at java.lang.ref.NativeReferenceQueue.await(java.base@21/NativeReferenceQueue.java:48)
at java.lang.ref.ReferenceQueue.remove0(java.base@21/ReferenceQueue.java:158)
at java.lang.ref.NativeReferenceQueue.remove(java.base@21/NativeReferenceQueue.java:89)
- locked <0x000000070ef9e9b8> (a java.lang.ref.NativeReferenceQueue$Lock)
at java.lang.ref.Finalizer$FinalizerThread.run(java.base@21/Finalizer.java:173)
"Signal Dispatcher" #11 [5003] daemon prio=9 os_prio=0 cpu=0,09ms elapsed=54,55s tid=0x00007fe2cc16f940 nid=5003 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Service Thread" #12 [5004] daemon prio=9 os_prio=0 cpu=0,21ms elapsed=54,55s tid=0x00007fe2cc170f80 nid=5004 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Monitor Deflation Thread" #13 [5005] daemon prio=9 os_prio=0 cpu=3,56ms elapsed=54,55s tid=0x00007fe2cc172520 nid=5005 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #14 [5006] daemon prio=9 os_prio=0 cpu=395,98ms elapsed=54,55s tid=0x00007fe2cc174060 nid=5006 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"C1 CompilerThread0" #17 [5007] daemon prio=9 os_prio=0 cpu=334,86ms elapsed=54,55s tid=0x00007fe2cc175720 nid=5007 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
No compile task
"Notification Thread" #18 [5008] daemon prio=9 os_prio=0 cpu=0,08ms elapsed=54,55s tid=0x00007fe2cc18b840 nid=5008 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Common-Cleaner" #19 [5009] daemon prio=8 os_prio=0 cpu=0,59ms elapsed=54,55s tid=0x00007fe2cc18f980 nid=5009 waiting on condition [0x00007fe2ac9d0000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000070efaffd8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(java.base@21/AbstractQueuedSynchronizer.java:1847)
at java.lang.ref.ReferenceQueue.await(java.base@21/ReferenceQueue.java:71)
at java.lang.ref.ReferenceQueue.remove0(java.base@21/ReferenceQueue.java:143)
at java.lang.ref.ReferenceQueue.remove(java.base@21/ReferenceQueue.java:218)
at jdk.internal.ref.CleanerImpl.run(java.base@21/CleanerImpl.java:140)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
at jdk.internal.misc.InnocuousThread.run(java.base@21/InnocuousThread.java:186)
"qtp1963387170-20" #20 [5013] prio=5 os_prio=0 cpu=0,46ms elapsed=54,30s tid=0x00007fe2cc6579b0 nid=5013 runnable [0x00007fe2ac4d4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@21/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@21/EPollSelectorImpl.java:121)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@21/SelectorImpl.java:130)
- locked <0x000000071c2000e8> (a sun.nio.ch.Util$2)
- locked <0x000000071c200098> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@21/SelectorImpl.java:147)
at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:181)
at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:188)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:605)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:542)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
at org.eclipse.jetty.io.ManagedSelector$$Lambda/0x00007fe2580aaef8.run(Unknown Source)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-21" #21 [5014] prio=5 os_prio=0 cpu=1,78ms elapsed=54,30s tid=0x00007fe2cc6589a0 nid=5014 runnable [0x00007fe2ac3d4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@21/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@21/EPollSelectorImpl.java:121)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@21/SelectorImpl.java:130)
- locked <0x000000071c2485a8> (a sun.nio.ch.Util$2)
- locked <0x000000071c248558> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@21/SelectorImpl.java:147)
at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:181)
at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:188)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:605)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:542)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
at org.eclipse.jetty.io.ManagedSelector$$Lambda/0x00007fe2580aaef8.run(Unknown Source)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-22" #22 [5015] prio=5 os_prio=0 cpu=0,10ms elapsed=54,30s tid=0x00007fe2cc659a60 nid=5015 runnable [0x00007fe2ac2d4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@21/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@21/EPollSelectorImpl.java:121)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@21/SelectorImpl.java:130)
- locked <0x000000071c256d20> (a sun.nio.ch.Util$2)
- locked <0x000000071c256cd0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@21/SelectorImpl.java:147)
at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:181)
at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:188)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:605)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:542)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
at org.eclipse.jetty.io.ManagedSelector$$Lambda/0x00007fe2580aaef8.run(Unknown Source)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-23" #23 [5016] prio=5 os_prio=0 cpu=0,14ms elapsed=54,30s tid=0x00007fe2cc65ab10 nid=5016 runnable [0x00007fe2ac1d4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPoll.wait(java.base@21/Native Method)
at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@21/EPollSelectorImpl.java:121)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@21/SelectorImpl.java:130)
- locked <0x000000071c22b730> (a sun.nio.ch.Util$2)
- locked <0x000000071c22b6e0> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(java.base@21/SelectorImpl.java:147)
at org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:181)
at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:188)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:605)
at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:542)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produceTask(AdaptiveExecutionStrategy.java:512)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:258)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
at org.eclipse.jetty.io.ManagedSelector$$Lambda/0x00007fe2580aaef8.run(Unknown Source)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-24-acceptor-0@7d9f22fa-ServerConnector@29647f75{SSL, (ssl, alpn, h2, http/1.1)}{0.0.0.0:8443}" #24 [5017] prio=3 os_prio=0 cpu=2,40ms elapsed=54,30s tid=0x00007fe2cc65bc50 nid=5017 runnable [0x00007fe256ffe000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.accept(java.base@21/Native Method)
at sun.nio.ch.ServerSocketChannelImpl.implAccept(java.base@21/ServerSocketChannelImpl.java:433)
at sun.nio.ch.ServerSocketChannelImpl.accept(java.base@21/ServerSocketChannelImpl.java:399)
at org.eclipse.jetty.server.ServerConnector.accept(ServerConnector.java:397)
at org.eclipse.jetty.server.AbstractConnector$Acceptor.run(AbstractConnector.java:693)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-32" #32 [5025] prio=5 os_prio=0 cpu=10,78ms elapsed=54,30s tid=0x00007fe2cc65d0e0 nid=5025 waiting on condition [0x00007fe2567f7000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000070efb0078> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@21/SynchronousQueue.java:401)
at java.util.concurrent.SynchronousQueue.poll(java.base@21/SynchronousQueue.java:903)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:324)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:400)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-33" #33 [5026] prio=5 os_prio=0 cpu=36,39ms elapsed=54,30s tid=0x00007fe2cc65e130 nid=5026 waiting on condition [0x00007fe2566f7000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000070efb0078> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@21/SynchronousQueue.java:401)
at java.util.concurrent.SynchronousQueue.poll(java.base@21/SynchronousQueue.java:903)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:324)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:400)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-34" #34 [5027] prio=5 os_prio=0 cpu=115,10ms elapsed=54,30s tid=0x00007fe2cc65f1d0 nid=5027 waiting on condition [0x00007fe2565f6000]
java.lang.Thread.State: WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000071d9d56b8> (a java.util.concurrent.CompletableFuture$Signaller)
at java.util.concurrent.locks.LockSupport.park(java.base@21/LockSupport.java:221)
at java.util.concurrent.CompletableFuture$Signaller.block(java.base@21/CompletableFuture.java:1864)
at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@21/ForkJoinPool.java:3780)
at java.util.concurrent.ForkJoinPool.managedBlock(java.base@21/ForkJoinPool.java:3725)
at java.util.concurrent.CompletableFuture.waitingGet(java.base@21/CompletableFuture.java:1898)
at java.util.concurrent.CompletableFuture.get(java.base@21/CompletableFuture.java:2072)
at org.eclipse.jetty.ee10.servlet.ServletApiRequest.getParts(ServletApiRequest.java:494)
at TestServlet3.doPost(Main.scala:76)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1608)
at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1541)
at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:885)
at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:490)
at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:463)
at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:811)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:147)
at org.eclipse.jetty.server.Handler$Wrapper.handle(Handler.java:712)
at org.eclipse.jetty.server.Server.handle(Server.java:176)
at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:648)
at org.eclipse.jetty.util.thread.Invocable$ReadyTask.run(Invocable.java:105)
at org.eclipse.jetty.http2.server.internal.HttpStreamOverHTTP2$1.run(HttpStreamOverHTTP2.java:129)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.produce(AdaptiveExecutionStrategy.java:195)
at org.eclipse.jetty.http2.HTTP2Connection.produce(HTTP2Connection.java:211)
at org.eclipse.jetty.http2.server.internal.HTTP2ServerConnection.onOpen(HTTP2ServerConnection.java:96)
at org.eclipse.jetty.io.AbstractEndPoint.upgrade(AbstractEndPoint.java:441)
at org.eclipse.jetty.server.NegotiatingServerConnection.onFillable(NegotiatingServerConnection.java:125)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.ssl.SslConnection$SslEndPoint.onFillable(SslConnection.java:554)
at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:373)
at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:146)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"Scheduler-3447021-1" #35 [5098] prio=5 os_prio=0 cpu=0,10ms elapsed=16,55s tid=0x00007fe1dc06a810 nid=5098 waiting on condition [0x00007fe2ac8a3000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000071c239f38> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@21/AbstractQueuedSynchronizer.java:1758)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@21/ScheduledThreadPoolExecutor.java:1182)
at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(java.base@21/ScheduledThreadPoolExecutor.java:899)
at java.util.concurrent.ThreadPoolExecutor.getTask(java.base@21/ThreadPoolExecutor.java:1070)
at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@21/ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@21/ThreadPoolExecutor.java:642)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-36" #36 [5101] prio=5 os_prio=0 cpu=3,60ms elapsed=16,42s tid=0x00007fe1d0041ef0 nid=5101 waiting on condition [0x00007fe2563f7000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000070efb0078> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@21/SynchronousQueue.java:401)
at java.util.concurrent.SynchronousQueue.poll(java.base@21/SynchronousQueue.java:903)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:324)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:400)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:969)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1194)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1149)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"qtp1963387170-37" #37 [5102] prio=5 os_prio=0 cpu=0,38ms elapsed=16,42s tid=0x00007fe1d00379a0 nid=5102 waiting on condition [0x00007fe2562f7000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@21/Native Method)
- parking to wait for <0x000000070ef9ea00> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@21/LockSupport.java:269)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@21/AbstractQueuedSynchronizer.java:1758)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:219)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:1124)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1141)
at java.lang.Thread.runWith(java.base@21/Thread.java:1596)
at java.lang.Thread.run(java.base@21/Thread.java:1583)
"VM Thread" os_prio=0 cpu=4,02ms elapsed=54,56s tid=0x00007fe2cc15e6d0 nid=5000 runnable
"GC Thread#0" os_prio=0 cpu=3,82ms elapsed=54,57s tid=0x00007fe2cc08f4b0 nid=4994 runnable
"GC Thread#1" os_prio=0 cpu=2,76ms elapsed=54,30s tid=0x00007fe244008000 nid=5018 runnable
"GC Thread#2" os_prio=0 cpu=3,77ms elapsed=54,30s tid=0x00007fe244008a10 nid=5019 runnable
"GC Thread#3" os_prio=0 cpu=3,74ms elapsed=54,30s tid=0x00007fe244009420 nid=5020 runnable
"GC Thread#4" os_prio=0 cpu=3,79ms elapsed=54,30s tid=0x00007fe244009e30 nid=5021 runnable
"GC Thread#5" os_prio=0 cpu=3,80ms elapsed=54,30s tid=0x00007fe24400a860 nid=5022 runnable
"GC Thread#6" os_prio=0 cpu=3,80ms elapsed=54,30s tid=0x00007fe24400b2c0 nid=5023 runnable
"GC Thread#7" os_prio=0 cpu=2,41ms elapsed=54,30s tid=0x00007fe24400bd20 nid=5024 runnable
"G1 Main Marker" os_prio=0 cpu=0,03ms elapsed=54,57s tid=0x00007fe2cc09fb00 nid=4995 runnable
"G1 Conc#0" os_prio=0 cpu=0,02ms elapsed=54,57s tid=0x00007fe2cc0a0aa0 nid=4996 runnable
"G1 Refine#0" os_prio=0 cpu=0,05ms elapsed=54,57s tid=0x00007fe2cc129020 nid=4997 runnable
"G1 Service" os_prio=0 cpu=1,14ms elapsed=54,57s tid=0x00007fe2cc129fd0 nid=4998 runnable
"VM Periodic Task Thread" os_prio=0 cpu=20,70ms elapsed=54,56s tid=0x00007fe2cc1440b0 nid=4999 waiting on condition
JNI global refs: 15, weak refs: 0
Heap
garbage-first heap total 249856K, used 19700K [0x000000070ee00000, 0x0000000800000000)
region size 2048K, 7 young (14336K), 2 survivors (4096K)
Metaspace used 14443K, committed 14656K, reserved 1114112K
class space used 1640K, committed 1728K, reserved 1048576K
@silviobierman any chance of getting a server dump as well (see https://eclipse.dev/jetty/documentation/jetty-12/operations-guide/index.html#og-troubleshooting-dump) Also, any chance of making your actual multipart file available to us via some file sharing?
@lachlan-roberts can you glance at this one to see if you can see anything?
@silviobierman also any chance of a run with either full debugging turned on, or perhaps just org.eclipse.jetty.http.MultiPartFormData.LEVEL=DEBUG
and MultiPartFormData.io.LEVEL=DEBUG
@silviobierman sorry to pepper you with questions... but.... at the point that you call getParts, do you know the type of the request? Is it a org.eclipse.jetty.ee10.servlet.ServletApiRequest
?
Now just some thinking out loud...
I'm expecting @silviobierman to say that the request is-a ServletApiRequest
, because if it is not, then we would be wrapping the request input stream with a InputStreamContentSource
, which is a blocking source. Thus I'd expect to see some thread blocked in fillBufferFromInputStream
, which we don't.
But if it were, what thread should that be... ah it would be the main thread calling org.eclipse.jetty.ee10.servlet.ServletMultiPartFormData#from(jakarta.servlet.ServletRequest, java.lang.String).
So either we are in an async setup and the data is not coming in (or not demanded) OR we were in a blocking setup and somehow we returned from from
without reading the entire multi-part
@lachlan-roberts thoughts?
@gregw I think we can see from the thread dump that the request is a ServletApiRequest
and it is waiting on the CompletableFuture
.
So I suspect we read a null content chunk and then for some reason the demand callback was never called.
@gregw
I printed the actual class name of the request object which is org.eclipse.jetty.ee10.servlet.ServletApiRequest so you where right about that.
The command line I use for the POST request is:
curl -F "file=@/home/silvio/Downloads/20181121_105432.mp4" https://localhost.jambo-mobile.com:8443
but the MP4 (78M) can be swapped with any sufficiently big file. I get the same result with largish ZIP files (26M and 307M). The localhost.jambo-mobile.com is equivalent to 127.0.0.1.
I ran the test with -Dorg.eclipse.jetty.LEVEL=DEBUG and got quite some outp[ut whith I attached here. JettyTest.log
@gregw: I ran JMC but the Mbean tree does not show anything Jetty related. I am sure I need to do something to enable JMX but could not find anything on how to do that with embedded Jetty. Any pointers?
@silviobierman See https://eclipse.dev/jetty/documentation/jetty-12/operations-guide/index.html#og-jmx if you are using a standard deployment setup... which I'm guessing you are not. Embedded code you need to do:
Server server = new Server();
// ...
MBeanContainer mbContainer = new MBeanContainer(ManagementFactory.getPlatformMBeanServer());
server.addBean(mbContainer);
@lachlan-roberts Can you look at trying to make a reproducer? I'll have a look at the debug log.
I've plotted the HTTP/2 stream and session flow control windows from the debug log. It is pretty clear either jetty or the client is "leaking" window.
@lorban @sbordet might need you to look here, but I'll dig deeper too. @lachlan-roberts also can you review the chunk handling of the multipart parser to make sure all read chunks are correctly released (but is release the operation we trigger windows resize on? or is it consuming from the buffer?).
@lachlan-roberts @sbordet @lorban I think we have a problem! The MultiPart class has been written to avoid data copying, by retaining the chunks it is passed. However, the HTTP/2 implementation has been written to only see the data as "consumed" once it is fully released.
Thus is an app is retaining content, rather than copying it, it is never seen as consumed and the receive window is never opened. I think we have not seen this in testing as we tend to write large parts to disk rather than retain them in memory.
Since we wish to encourage retention rather than copying, we need to figure out a better consumption signal back to HTTP/2. Maybe any release should be sufficient to say the data has been consumed?
@silviobierman See https://eclipse.dev/jetty/documentation/jetty-12/operations-guide/index.html#og-jmx if you are using a standard deployment setup... which I'm guessing you are not. Embedded code you need to do:
Server server = new Server(); // ... MBeanContainer mbContainer = new MBeanContainer(ManagementFactory.getPlatformMBeanServer()); server.addBean(mbContainer);
@lachlan-roberts Can you look at trying to make a reproducer? I'll have a look at the debug log.
Yes, adding that to my embedding code works. I now see the Jetty beans in the tree. What info would you like me to collect from this? Or is this no longer necessary given your last post?
@silviobierman it is no longer necessary, but FYI and whilst you have it running, try invoking Server.dumpStdErr
or Server.dump
and have a look at the output. It's handy technique for any future debugger.
@gregw Got it. I tried it and now know how to provide such info in the future. Thanks.
Fixed by #10554.
I would like to reiterate this issue. Jetty12 has been in production use with us for almost two weeks now. Apart from some minor glitches everything seemed to work fine and general performance appears better than with Jetty11.
BUT: Ever since the upgrade about 50% of a dozen automated nightly uploads to our system existing of HTTP POSTS containing plain CSV data (text/csv) fails. It took some time for the users to notice the lack of new data and for us to find out where things are going wrong. These processes have been in place for years without similar issues.
Since the uploads come from several different external systems we currently have little details about the way they are performed. Each time the first step on the server side is reading the POST payload and writing it to a file. In half the cases this process blocks indefinitely. Stopping the server process many hours later results in IO exceptions that have stack traces linking to the processing application code.
We will add logging of the HTTP headers to find out more about the issue. It could be that some feeds use chunked encoding while others don't. Some might use HTTP/1.1 while some might use HTTP/2. Etc.
But since the processes are critical to our users we have little time to investigate this. We may have to revert to a Jetty11 based version within a day or two. I will report back when I have more info.
BTW: We are using 12.0.3.
@silviobierman can you please open a new issue and add more details if you can?
Is it a multipart request? Does it have zipping? Can you show the code that reads and writes to a file? Which HTTP version? Can we get stack traces of the indefinitely blocked threads? Etc.
Please report all that in the new issue.
@silviobierman Also a server dump would be very useful, so we can see what state those connections are in. Also stack traces of exactly where those reads are blocked.
Sorry for continuing on this issue but it will take me quite some time to collect all the info needed for opening a new issue, BUT:
Looking at the now logged headers revealed that all the blocking POSTs have
Expect: 100-continue
whereas the succesful ones don't. Is there something the application code should/could do when such a request comes in instead of simply attempting to read the payload like the application code does now?
The interesting thing is that doing this sometimes does result in part of the expected content but mostly blocks with zero bytes received.
@silviobierman are you using Servlet APIs in your application, or Jetty 12 Handler APIs?
This is Servlet API, Jetty-ee10, HTTP/2.
@silviobierman the way that 100-continues is meant to work is:
expect: 100-continue
, but does not send any bytes of the bodygetOutputStream
or similar, that will cause a 100 Continue
response to be sent100 Continue
response and starts sending the request bodySo the first question is: can you tell if a 100-Continue response is being sent?
I have logged the POST request headers at the start of the handling code. The 100-Continue header is certainly present in the problem cases. All subsequent logging is done after attempting to write the bytes to a temp-file (using transferTo on the request.getInputStream wrapped in a BufferedInputStream). In the cases where the 100-Continue is present the transferTo locks up and nothing else is logged.
Sorry for misreading, forget about my last response. I can not tell if a 100-Continue response is being sent. I can only see the headers coming in.
@silviobierman we are definitely going to need more information. We do have tests with 100-Continues over h2 and they pass fine. I have tried some variations on them, including pausing during the transmission of the body, and they still pass fine.
So it is something else plus 100 Continues that is causing the failure.
Do any Expect 100 Continues requests work for you? Is it just some that fail? How large are they? What is the content type? What is doing the read and transfer to disk?
@silviobierman I have created #10956 in which you can post additional information. I have already started cleaning up the implementation and adding more tests, but I have not yet found a reproducer for your issue. Let's talk more in the other issue.
Thanks. For now I was forced to revert that specific server to a previous version that uses Jetty11 so I will have to try and reproduce the issue myself. I will report back in #10956.
When trying to setup a test environment I encountered further errors with Jetty12. Our backup process consists of pulling a large ZIP archive from the application via a GET request. A restore (which I wanted to do to create a test environment) is done by POSTing the same ZIP back into the application (either through a FORM post or a plain ZIP payload). With Jetyy12 this fails in most if not all cases with all kinds of errors from the ZIPInputStream that byte counts are off, block sizes are invalid etc. As before, reverting to Jetty11 makes the problems go away. Somehow this just did not come up since the move to Jetty12 so I was focussing on the 100-continue issue.
Since I think the problem is not purely about 100-continue I guess the issue created by Greg is not the right place to discuss this further.
I am sorry I do not have more details right now. But the general picture is that all kinds of POST requests (both FORM and non-FORM) fail with larger payloads. It does not happen in one particular place but all over our system. As said earlier, all application code has been in place for years with Jetty versions 6-11 without any such issues. We have been using http/2 since it was supported by Jetty. If I remember correctly that was with Jetty9 but I could be wrong.
I will investigate further. Previous tests revealed that reverting to http1.1 with Jetty12 worked correctly but that is not the way we want to go. Therefore we have reverted everything to Jetty11 for now.
I'm reopening this issue just to record that there is something we need to look at and it looks different to 100-continues.
@silviobierman can you confirm that it is jetty-12 h2 that is having these issues? or are you seeing them on jetty-12 HTTP/1.1 as well? When you say "large", can you give us an estimate 10MB? 1GB? Whose ZIPInputStream has the errors? Is this our GzipHandler that is doing the inflating or something else? Is the content a multipart form and then gzipped or a multipart form that has zip payload parts?
I've just discovered a bug in this codepath that may or may not explain this problem: #11022
@silviobierman if you could come up with a reproducer, this would be invaluable to quickly figure out what the problem is and fix it. Barring that, it would be helpful if you could reduce the scope of possibilities (answering @gregw's questions would help a lot) and if you could try a build with my fix for #11022.
Thanks!
The problem is with Jetty 12.0.3 ee10 HTTP/2.
The issue with the Expect:100-continue datafeeds is a plain POST with text/csv content. The other problem I described is with a JavaScript FormData originated FORM POST with a single part that contains ZIP content. No unzipping by Jetty, the application wraps with it's own ZipInputStream based on the content-type.
As is the exception thrown is: java.util.zip.ZipException: invalid stored block lengths
When disabling HTTP/2 and using HTTP 1.1: java.util.zip.ZipException: invalid entry size (expected 585999 but got 585994 bytes)
Using Jetty 11.0.15 both variants work without issue.
Bigger roughly means ranging between 30Mb and a couple of Gb. This is the FORM POST case. The external datafeed plain POSTs are all smaller then 50Mb.
I am downloading branch jetty-12.0.x now to do a build and try again. Soon to be continued.
I did the following:
$ git clone --branch jetty-12.0.x https://github.com/jetty/jetty.project $ cd jetty.project/ $ mvn -Pfast clean install
and retried with the resulting jetty-12.0.4-SNAPSHOT JARs. Unfortunately this gave the same results.
Because of my current workload I will not be able to isolate this into an isolated test case anytime soon. The application code is quite complex since the handler code that extracts the ZIP is quite complex handling multiple special cases inside the ZIP, some of which is recreating H2 databases from custom backup-formats (inside nested ZIPs) and traversing multiple other cases of nested ZIP elements. The extracting itself is large and linked to thousands of lines of application code (which, to make matters worse, is all in Scala).
But anything you want me to try/test I will try to fit in.
The issue with the Expect:100-continue datafeeds is a plain POST with text/csv content. The other problem I described is with a JavaScript FormData originated FORM POST with a single part that contains ZIP content. No unzipping by Jetty, the application wraps with it's own ZipInputStream based on the content-type.
As is the exception thrown is: java.util.zip.ZipException: invalid stored block lengths
When disabling HTTP/2 and using HTTP 1.1: java.util.zip.ZipException: invalid entry size (expected 585999 but got 585994 bytes)
Note that ZipInputStream and GzipInputStream do not read to EOF.
Once you are done with the Zip/Gzip document you'll have to continue reading from the raw stream to trigger the EOF. (especially important on HTTP/1.1 with persistent connections, doubly so if the Transfer-Encoding chunked is in play)
Jetty version(s) 12.0.1
Jetty Environment ee10 / http/2
Java version/vendor
(use: java -version)
OpenJDK Runtime Environment (Red_Hat-21.0.0.0.35-1.rolling.fc38) (build 21+35) OpenJDK 64-Bit Server VM (Red_Hat-21.0.0.0.35-1.rolling.fc38) (build 21+35, mixed mode, sharing)OS type/version Fedora 38 Workstation
Description This application sets up an embedded Jetty 12.0.1 instance with one servlet class TestServlet1. The doPost method simply tries to read-and-discard the content of the body before sending a short response. Posting a sufficiently large file to the application will result in a lockup. The curl client will block trying to send more data while the application blocks trying to read. The point where th size of the POST body is printed to stderr is then never reached.
A slight modification of the code makes it use TestServlet2 with a doPost that processes the input expecting a ZIP-formatted body by reading and discarding entries and printing the name:size of each entry to stderr. Posting a sufficiently large ZIP file to it shows that initial entries are processed corrrectly then it locks up the same way as with TestServlet1.
Small POST requests seem to work without issues. GET requests seem to work fine also and our application ported from Jetty 11 to Jetty 12 works mostly fine except for large POST requests.
How to reproduce? See the above description. I am sorry that this is Scala code instead of Java. I currently have no Java development environment setup.
Compile/runtime dependencies specified in build.sbt, some of which redundant for the test code
Scala code