[TCK Challenge] attempting to writing more than content-length bytes

jakartaee / servlet

Jakarta Servlet

https://eclipse.org/ee4j/servlet

Other

253 stars 81 forks source link

[TCK Challenge] attempting to writing more than content-length bytes #525

Open janbartel opened 1 year ago

janbartel commented 1 year ago

Challenged tests:

com/sun/ts/tests/servlet/spec/httpservletresponse/URLClient.java.flushBufferTest

TCK Version:

servlet-tck-6.0.1

Tested Implementation:

Jetty

Description:

Section 5.7 Closure of Response Object states that:

When a response is closed, the container must immediately flush all remaining content in the response
buffer to the client. The following events indicate that the servlet has satisfied the request and that the
response object is to be closed:
• The termination of the service method of the servlet.
• The amount of content specified in the setContentLength or setContentLengthLong method of the
response has been greater than zero and has been written to the response.
• The sendError method is called.
• The sendRedirect method is called.
• The complete method on AsyncContext is called.

The flushBufferTest does:

  public void flushBufferTest(HttpServletRequest request,
      HttpServletResponse response) throws ServletException, IOException {
    int size = 40;
    response.setContentLength(size);

    PrintWriter pw = response.getWriter();
    pw.println("Test PASSED");
    StringBuffer tmp = new StringBuffer(2 * size);
    int i = 0;

    while (i < 8) {
      tmp = tmp.append("111111111x");
      i = i + 1;
    }
    pw.println(tmp);
    response.addIntHeader("header1", 12345);
  }

The test expects a 200 response without the presence of header1 and probably that the output has been truncated to 40 bytes.

In this case, where the servlet has attempted to write more than the content-length number of bytes Jetty will throw an IOException and return an error 500 response.

There is nothing in the wording of the spec nor javadoc that prohibits a 500 response, or that specifically requires a 200 response with excess bytes ignored. Nor would it make sense to truncate the bytes written to content-length, as some content encodings are mutiibyte and would result in garbage.

If the aim of the test is to ensure that excess characters are not written, nor the header sent, then the test should accept an error 500 response.

gregw commented 1 year ago

@janbartel actually we don't throw an IOException but a private BadMessageException extends RuntimeException that causes the 500.

However, even if we didn't throw (and just set the writer's checkError status), it doesn't change the fact that there is no good way to implement pw.println(tmp) that would satisfy the test. If we truncate the write, then do we also set checkError to return an exception to indicate that not all the characters were written? More importantly, as @janbartel said, what if the truncation is in the middle of a multi-byte character?

I think the test should be rewritten as:


    public void flushBufferTest(HttpServletRequest request,
                                HttpServletResponse response) throws ServletException, IOException
    {
        int size = 40;
        response.setContentLength(size);
        response.setCharacterEncoding(StandardCharsets.ISO_8859_1.name());
        PrintWriter pw = response.getWriter();
        String testPassed = "Test PASSED";
        pw.println(testPassed);
        pw.println("x".repeat(size - testPassed.length()));
        response.addIntHeader("header1", 12345);
    }

perhaps the test should also attempt a write to ensure the response is closed, but due to the nature of these tests, that would require state to be remembered and a second request made to check the results.

janbartel commented 8 months ago

@markt-asf any comments on the challenge to the flushBufferTest?

markt-asf commented 8 months ago

TL;DR: the TCK test is correct and consistent with the specification but I think the specification (and hence the test) need to change to better handle the case where more than content-length data is written to the response.

My reading of the specification as currently written is that as soon as content-length bytes have been written (which includes the case if more than content-length bytes are written) then the conditions for closure have been met hence the response should be closed which will result in a 200 response to the client without header1.

Writing more than content-length bytes is an error condition and I think it should be treated as such. I'm leaning towards a solution that:

adds phrasing to the closure conditions (wording TBD) that requires exactly content-length bytes must be written;
adds wording that if content-length has been specified, writing more than content-length bytes to the response will trigger a 500 error (where to put this is TBD);
updates the TCK accordingly.

gregw commented 3 days ago

@markt-asf Are you sure there is verbage in the current spec saying that the container should truncate a too-large write?

Consider the following code:

     String pileOfPoo = "💩💩💩";
     String json = "{value=\"" + pileOfPoo + "\"}";
     response.setContentType("application/json");
     response.setContentLength(json.length());
     response.getOutputStream().print(json);

OK this is a programming error, because the string length is not the same as the byte length of a string. But such mistakes are common and often undetected by testing as multi-byte characters are not always tested. Such a bug might only be triggered when a user supplied string with multi-byte characters is received.

I seams wrong and dangerous for a container to be able to detect this obviously wrong write, yet to simply truncate the data in the middle of a multi-part UTF-8 character and send the response as if there was no problem at all. Then what should the container do with the excess bytes of the write? should it throw an exception after committing a valid response? To write a valid response and then throw gives us a Schrodinger's cat write that is both successful and a failure and only be observing the actual response sent can we know which.

I'm very reluctant to change the Jetty code to send what is known to be an invalid response message just to pass this TCK test.

The specification say:

When a response is closed, the container must immediately flush all remaining content in the response buffer to the client. The following events indicate that the servlet has satisfied the request and that the response object is to be closed:
 - The amount of content specified in the setContentLength or setContentLengthLong method of the response has been greater than zero and has been written to the response.

It does not say "the amount of content specified OR GREATER". At no time in the above example must there be a moment when exactly the "amount of content specified in the setContentLength" has been written, unless in the naive implementation of writing byte by byte. All reasonable implementations will write a single ByteBuffer or byte[] that is too large and the bytes written will go from 0 to greater than the length specified! So I do not see that the TCK test can be said to meet this criteria.

We'd like to proceed with this challenge to the TCK.

markt-asf commented 1 day ago

This all comes down to the interpretation of

The amount of content specified in thesetContentLength orsetContentLengthLong method of the response has been greater than zero and has been written to the response

The current wording is not explicit about what happens if an attempt is made to write more than the specified amount.

On the basis that the TCK test has been written based on a particular interpretation and that the TCK test has been the same since at least the Servlet 2.4 TCK which is over 20 years old I lean towards the view that the TCK is testing the intended interpretation. Hence my view that the challenge is invalid.

All that said, I agree that the current behaviour is less than ideal. Writing more than the specified amount should be treated as an error condition and I am all in favour of updating the Servlet specification and TCK to require this behaviour from the next version. We'll need to handle this happening both before and after the response has been committed but that is true for any situation leading to an internal error so I don't think there is anything new there.

I don't think the challenge is valid but if other folks want to view this as valid and go through the process to release an updated TCK I'm not going to object but neither do I have the time to assist.

stuartwdouglas commented 1 day ago

Just thinking about how this could be handled, 500 makes sense before the response is committed, however I am not sure what the best approach is if an attempt is made to write too much after the response is committed. My gut feeling is that we should just close the connection / RST_STREAM so the client side is clear that something has gone wrong and this is not a valid response.

There are also technically 3 different failure modes here:

Pre commit, which we can send a 500 for
Post commit, attempting to write a byte array with size >1, which we could just kill the connection / stream for
Post commit, writing one byte at a time. In this case we have no way of telling the client that there is a problem, as at the time the response is sent we don't know that the user will attempt to write more. All we can do in this case is a server side exception.

gregw commented 1 day ago

@stuartwdouglas I think there is a reasonable interpretation of the spec in all cases:

If uncommitted and a write/print is received with: a) too few bytes, then potentially the response is committed and the bytes flushed or they bytes are just buffered. Either way, the stream is left open for future writes b) exactly the right number of bytes, then the response is committed and completed. The stream is closed and future writes will fail with an exception. c) too many bytes, then an IllegalArgumentException (or similar) is thrown. The application can catch and handle that, or let if fall through to the container, in which case the response will be reset and a 500 response written.
If committed and a write/print is received with: d) too few bytes, then the bytes are flushed or just buffered. Either way, the stream is left open for future writes e) exactly the right number of bytes, then the response is completed. The stream is closed and future writes will fail with an exception. f) too many bytes, then an IllegalArgumentException (or similar) is thrown. The application can catch and handle that, or let if fall through to the container, in which case the container will make a best effort attempt to signal failure depending on the protocol. With h2 and h3 a stream reset with an error code can be sent. With h1, the connection can be closed without terminating the response body correctly. Either way, the client will have some indication of a problem on the server.

Of these scenarios, I think only c) and f) are controversial. I think @markt-asf is suggesting the following interpretation:

If uncommitted and a write/print is received with: c) too many bytes, then the bytes are trimmed to the exact number required, the response is committed and completed and the write returns normally (or maybe throws??).
If committed and a write/print is received with: f) too many bytes, then the bytes are trimmed to the exact number required, the response is completed and the write returns normally (or maybe throws??).

I really can't see these as valid/safe interpretations.

Of these scenarios, the current TCK appears to only be testing c), but expecting the behaviour of b). @janbartel's proposal above is to change the test to check behaviour b), which I don't think is controversial (although the proposed code has a bug because it uses println rather than print).

I think it would be a good short term compromise to change the TCK test to check scenarios a) b) d) and e), which are the non controversial key scenarios. We could defer testing c) and f) until we have agreed behaviours. We will prepare a PR on the TCK.

gregw commented 1 day ago

I have created PR #680 to change the test to just check the non controversial behaviour of a good servlet.