vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

Vector sink fails when sending payloads bigger than 4MB #17926

Closed: trennepohl closed this issue 1 year ago

trennepohl commented 1 year ago

Problem

Vector sink fails to push messages bigger than 4MB.

2023-07-10T11:54:59.051789Z ERROR sink{component_kind="sink" component_id=out_aggregator component_type=vector component_name=out_aggregator}:request{request_id=5159}: vector::sinks::util::retries: Non-retriable error; dropping the request. error=Request failed: status: OutOfRange, message: "Error, message length too large: found 5000077 bytes, the limit is: 4194304 bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "server": "envoy", "x-envoy-upstream-service-time": "124", "content-length": "0", "date": "Mon, 10 Jul 2023 11:54:59 GMT"} } internal_log_rate_limit=true

Configuration

[sinks.out_aggregator]
  type = "vector"
  inputs = [ "add_gcp_info" ]
  address = "${AGGREGATOR_ADDRESS}"
  version = "2"
  batch.max_bytes = 5000000
  request.concurrency = "adaptive"
  request.timeout_secs = 30
  request.retry_max_duration_secs = 10
  buffer.type = "memory"
  buffer.when_full = "block"
  buffer.max_events = 30000

Version

0.31.0

Additional Context

After upgrading Vector to version 0.31.0, we started seeing this error.

After a bit of searching on the internet, I bumped into this article: https://cprimozic.net/notes/posts/rust-tonic-request-response-size-limits/

It looks like, as of tonic 0.9, the default max message size is 4 MB.
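
As a quick sanity check, the limit in the error message is exactly tonic's 4 MiB default (a trivially runnable snippet, not Vector code):

fn main() {
    // tonic 0.9's default gRPC message-size limit is 4 MiB.
    let tonic_default = 4 * 1024 * 1024;
    assert_eq!(tonic_default, 4_194_304); // matches "the limit is: 4194304 bytes"
    // The payload from the log above exceeds it, so the request is rejected:
    assert!(5_000_077 > tonic_default);
}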

jszwedko commented 1 year ago

Thanks for opening this @trennepohl ! It seems like we need to:

- Set the max encoding size in the Vector sink to whatever the maximum configured batch size is

Alternatively, to preserve backwards compatibility, we could remove the limits again and re-add them later. What version were you upgrading from where it worked?

trennepohl commented 1 year ago

> Thanks for opening this @trennepohl !

No worries šŸ‘

> Set the max encoding size in the Vector sink to whatever the maximum configured batch size is

Yeah, I tried to do that using max_decoding_message_size on the proto_client in Vector's sink, but it didn't work šŸ¤· Not super familiar with Rust šŸ˜… .
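
For context: a tonic-generated client has two separate size limits, and the rejection in the log above comes from the receiving server's own decoding limit, so a client-side setting alone would not help. A minimal sketch of the two knobs, with VectorClient and its module path standing in for the tonic-generated client (an assumption, not Vector's actual layout):

use tonic::transport::Channel;
use proto::vector_client::VectorClient; // hypothetical path to the generated client

fn configure(channel: Channel) -> VectorClient<Channel> {
    VectorClient::new(channel)
        // Caps RESPONSES this client will accept from the server.
        .max_decoding_message_size(16 * 1024 * 1024)
        // Caps REQUESTS this client will encode and send; the receiving
        // server still enforces its own decoding limit independently.
        .max_encoding_message_size(16 * 1024 * 1024)
}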

> What version were you upgrading from where it worked?

0.29

jszwedko commented 1 year ago

Thanks for the additional context @trennepohl ! I think we can probably back out the limits for now and re-add them later, tying them to the batch sizes.
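
A rough sketch of what "tying them to the batch sizes" could look like (a hypothetical helper, not Vector's actual code; batch_max_bytes is the sink's configured batch.max_bytes):

// Hypothetical: derive the gRPC encoding limit from the sink's batch
// settings instead of relying on tonic's 4 MiB default.
fn encoding_limit(batch_max_bytes: usize) -> usize {
    // Leave headroom for protobuf framing and metadata around the batch.
    batch_max_bytes.saturating_add(64 * 1024)
}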

slgero commented 1 year ago

Hi @jszwedko, I've updated Vector to 0.32.0 and still get this error:

Some(Request { source: Status { code: OutOfRange, message: "Error, message length too large: found 4612000 bytes, the limit is: 4194304 bytes", metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Fri, 18 Aug 2023 09:15:09 GMT", "content-length": "0"} }, source: None } })

My configuration:

[sinks.center_vector]
    address = "${AGGREGATOR_ADDRESS}"
    compression = true
    healthcheck = false
    inputs = ["log_files", "vector_logs"]
    type = "vector"
[sinks.center_vector.batch]
    max_events = 5000
    timeout_secs = 1
[sinks.center_vector.buffer]
    max_events = 6000
    type = "memory"
    when_full = "block"

Any thoughts about this?
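
As a back-of-the-envelope check on why this configuration can produce requests over the 4 MiB limit at all: batch.max_events caps a batch at 5000 events, but no byte cap is set, so batch size in bytes depends entirely on event size. A small runnable sketch using the numbers from the error above (assuming the failed request carried a full batch):

fn main() {
    let limit = 4 * 1024 * 1024; // tonic's default: 4_194_304 bytes
    let failed = 4_612_000; // request size from the error message above
    let max_events = 5_000; // batch.max_events from the config above
    // If the failed request held a full batch, events averaged ~922 bytes:
    println!("avg event size: {} bytes/event", failed / max_events);
    // Any average above ~838 bytes pushes a full batch past the limit:
    println!("threshold: {} bytes/event", limit / max_events);
}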

jszwedko commented 1 year ago

Huh, interesting. I admittedly hadn't actually tested this šŸ˜“ I just followed the recommendations. Let me try it with your config. I'll reopen this to track.

jszwedko commented 1 year ago

Hi @slgero ,

I'm having trouble reproducing this with 0.32.0 though I can with 0.31.0. Are you sure you are running v0.32.0 as the receiver (that is, the Vector instance with the vector source)?

As I was setting up the reproduction I remembered I did actually test this change locally.

neuronull commented 1 year ago

I was also able to repro this with 0.31.0 but not with 0.32.0. In the A/B cases, both the sender and the receiver were on the same version.

slgero commented 1 year ago

Hi @jszwedko, as soon as I updated the version on the central Vector instance too, the error disappeared. I had previously only updated the version on the agent. So you can close the issue. Thank you!

jszwedko commented 1 year ago

Great, thanks for confirming @slgero !