redpanda-data / connect

Fancy stream processing made operationally mundane
https://docs.redpanda.com/redpanda-connect/about/

Corrupt messages on streaming http_client disconnect #908

Open nicktelford opened 2 years ago

nicktelford commented 2 years ago

When in streaming mode, if an http_client input is disconnected prematurely, it's possible that a truncated message will be processed.

Since messages in HTTP streams are delimited either explicitly by MIME multipart headers or by newlines, it should be possible to detect when a message has been cut off prematurely and prevent it from being processed in the downstream pipeline.

Jeffail commented 2 years ago

Will need a bit of digging, but this is probably going to need to be an option on the codec type, where on EOF we discard partial data rather than flush it.
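To make the idea concrete, here is a minimal sketch in Go of a newline-delimited reader with a "discard partial data on EOF" switch. It is not the actual Benthos codec API: `readMessages`, `splitString`, and the `discardPartial` flag are hypothetical names used only for illustration, and the sketch covers only the newline-delimited case, not multipart.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// readMessages splits r into newline-delimited messages.
// When discardPartial is true, trailing bytes that were not terminated by a
// newline before the stream ended are dropped instead of being emitted.
// (Hypothetical helper for illustration, not part of Benthos.)
func readMessages(r io.Reader, discardPartial bool) ([]string, error) {
	br := bufio.NewReader(r)
	var msgs []string
	for {
		line, err := br.ReadString('\n')
		if err == nil {
			// A complete, delimiter-terminated message.
			msgs = append(msgs, strings.TrimSuffix(line, "\n"))
			continue
		}
		// The stream ended, cleanly or not; line holds any unterminated remainder.
		if line != "" && !discardPartial {
			msgs = append(msgs, line) // flush behaviour: emit the partial data
		}
		if errors.Is(err, io.EOF) {
			return msgs, nil
		}
		// Any other error (e.g. io.ErrUnexpectedEOF) is surfaced to the caller.
		return msgs, err
	}
}

// splitString is a small convenience wrapper for trying the reader on a string.
func splitString(s string, discardPartial bool) []string {
	msgs, _ := readMessages(strings.NewReader(s), discardPartial)
	return msgs
}

func main() {
	// The third message is cut off mid-way, as after a premature disconnect.
	stream := "{\"id\":1}\n{\"id\":2}\n{\"id\":3,\"na"
	fmt.Println(splitString(stream, false)) // flushes the truncated message
	fmt.Println(splitString(stream, true))  // drops the truncated message
}
```

One trade-off of this approach: with the flag enabled, a legitimate final message that simply lacks a trailing newline is also dropped, which is presumably why it would need to be opt-in.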

garbelini commented 1 year ago

We are seeing this again. This time we compared this benthos configuration with curl: curl keeps up, while the benthos http_client stream bails with: ERRO Failed to read message: unexpected EOF

Of all the streams we have targeting the same system with the same configuration, the one that is failing is the one with the lowest volume.

input:
  label: ""
  http_client:
    url: https://hostname
    verb: GET
    headers:
      Content-Type: application/json
    stream:
      enabled: true
    basic_auth:
      enabled: true
      username: ********
      password: ********