ZnResponse reads response differently depending on the ‘Content-Type’ when ‘Content-Length’ is greater than the body’s size

Rinzwind commented 1 year ago

The behavior of ZnResponse is different for the two ‘Content-Type’ values in the following example. There’s a mismatch between the ‘Content-Length’, which is 3, and the actual response body, which has only 2 bytes. With ‘text/plain’ as the ‘Content-Type’, the contents of the ZnResponse reflects the actual size of the body, while with ‘application/octet-stream’, the contents is the response body padded with a zero byte:

#('text/plain' 'application/octet-stream') collect: [ :contentType |
    | string response |
    string := String crlf join: {
        'HTTP/1.1 200 OK'.
        'Content-Type: ' , contentType.
        'Content-Length: 3'.
        ''.
        'ab' }.
    response := ZnResponse readFrom: string asByteArray readStream.
    response contents ]
"=> #('ab' #[97 98 0])"

It would seem better for the contents to not have the padding in the second case. Perhaps a resumable exception should be signaled in both cases.

I noticed this while investigating a bug in which there was such a ‘Content-Length’ mismatch, I don’t think this a very important issue but it seemed worth noting at least. I tested the example in Pharo 12 build 616 with the included version of Zinc. In GemStone/S, an error is signaled in the ‘application/octet-stream’ case within the context of PositionableStream>>#next:into:, as it tries to read the exact number of elements specified while the method in Pharo reads at most the number specified.

svenvc commented 1 year ago

Thanks for the feedback.

An incorrect content-length (too large or too small) will inevitably result in trouble (i.e. things will go horribly wrong).

In general, there is no way to notice this situation. Your example is a bit of an edge case, since your input stream ends. In this case, you could say 'read as much as you can', but it would still be wrong (one byte is missing). In most modern HTTP conversations, there are multiple request/response messages on the same stream. A correct content-length is then absolutely required, as it is used to know when to stop reading.

I do agree that it would probably be better for both cases (ZnStringEntity vs ZnByteArrayEntity) to behave the same. Having looked at the #readFrom: methods of these classes and the code in ZnUtils I am a bit surprised by how messy it is. I think it would be possible to improve the situation a bit and align their behaviour.

I am not able to fix this right now, maybe next week.

svenvc commented 1 year ago

The following test code is more correct, in the case of non-ASCII input the byte and character counts are different:

#('text/plain' 'application/octet-stream') collect: [ :contentType |
    | string response |
    string := String crlf join: {
        'HTTP/1.1 200 OK'.
        'Content-Type: ' , contentType.
        'Content-Length: 4'.
        ''.
        '€' }.
    response := ZnResponse readFrom: string utf8Encoded readStream.
    response ]

svenvc commented 1 year ago

Here is what I did: https://github.com/svenvc/zinc/commit/f8c23a64f0a7ccd23716dab3bacb59060c1a69f7

Rinzwind commented 1 year ago

Looks good! The result of my example is now #('ab' #[97 98]) as expected. This issue can be closed then, unless you still want to add the exception to make the ‘Content-Length’ mismatch clear. In the first comment, I had a resumable exception in mind so that it can be ignored if necessary, alternatively a non-resumable exception with an option in ZnOptions to not signal it could be used.

svenvc / zinc

ZnResponse reads response differently depending on the ‘Content-Type’ when ‘Content-Length’ is greater than the body’s size #118