vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

0.17.0 kafka source behaviour different with xml format #9556

Closed: joker9357 closed this issue 3 years ago

joker9357 commented 3 years ago
vector --version
vector 0.17.0 (x86_64-apple-darwin 3d34cde 2021-10-08)

Vector Configuration File

[sources.kafka]
  type = "kafka" # required
  bootstrap_servers = "" # some value
  group_id = "omni-logs2metrics" # required. Can be whatever you want
  key_field = "message_key" # optional, default
  tls.enabled = true
  tls.ca_file = "" #pem file
  tls.crt_file = "" #pem file
  tls.key_file = "" #pem file
  tls.key_pass = "" #password
  tls.verify_certificate = true
  topics = [""] # topic
  decoding.codec = "bytes" #try all 3 values

Debug Output

{"headers":{"X_INSTANA_C":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000w0�*nZ�am����.�Q","X_INSTANA_L":"\u0001"},"message":"        </Container>","message_key":null,"offset":9123121,"partition":4,"source_type":"kafka","timestamp":"2021-10-11T10:38:15.613Z"}
{"headers":{"X_INSTANA_C":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000w0�*nZ�am����.�Q","X_INSTANA_L":"\u0001"},"message":"    </Containers>","message_key":null,"offset":9123121,"partition":4,"source_type":"kafka","timestamp":"2021-10-11T10:38:15.613Z"}
{"headers":{"X_INSTANA_C":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000w0�*nZ�am����.�Q","X_INSTANA_L":"\u0001"},"message":"    <ExchangeOrders/>","message_key":null,"offset":9123121,"partition":4,"source_type":"kafka","timestamp":"2021-10-11T10:38:15.613Z"}
{"headers":{"X_INSTANA_C":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000w0�*nZ�am����.�Q","X_INSTANA_L":"\u0001"},"message":"    <ReturnOrdersForExchange/>","message_key":null,"offset":9123121,"partition":4,"source_type":"kafka","timestamp":"2021-10-11T10:38:15.613Z"}
{"headers":{"X_INSTANA_C":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000w0�*nZ�am����.�Q","X_INSTANA_L":"\u0001"},"message":"</Order>","message_key":null,"offset":9123121,"partition":4,"source_type":"kafka","timestamp":"2021-10-11T10:38:15.613Z"}

Expected Behavior

{"headers":{"X_INSTANA_C":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000!�:\u0015eڥ-�� l��'�","X_INSTANA_L":"\u0001"},"message":"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><Order ActualPricingDate=\"2021-10-11T10:31:56+00:00\" AllAddressesVerified=\"N\"              
             ...                   
<AdditionalAttribute AttributeGroupID=\"all\" Name=\"original-price\" Value=\"669.00\"/>\n                </AdditionalAttributeList>\n            </ItemDetails>\n        </OrderLine>\n    </OrderLines>\n    <Shipments/>\n    <ReturnOrders/>\n    <Containers/>\n    <ExchangeOrders/>\n    <ReturnOrdersForExchange/>\n</Order>","message_key":null,"offset":9121972,"partition":15,"source_type":"kafka","timestamp":"2021-10-11T10:31:57.112Z"}

Actual Behavior

Example Data

Additional Context

References

With 0.16.1 everything is OK, but since 0.17.0 the source reads the XML message line by line.

jerome-kleinen-kbc-be commented 3 years ago

This is most likely related to the new framing feature; see https://vector.dev/highlights/2021-10-06-source-codecs/. Even though the docs aren't really clear on this, the default value for framing.method seems to be character_delimited, and my guess is that the default delimiter is \n. I would tinker with these settings if I were you.

jerome-kleinen-kbc-be commented 3 years ago

Judging by the code at https://github.com/vectordotdev/vector/blob/master/src/codecs/mod.rs#L160, the default value seems to be newline_delimited, which under the hood is actually character_delimited with \n as the delimiter. The confusing part is that it is logged as character_delimited when max_length is hit.
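
For reference, a minimal sketch of the override being discussed. It assumes that the framing.method option named in this thread is accepted by the kafka source in 0.17.0, and that the "bytes" method passes each Kafka message through as a single frame instead of splitting on \n. The other keys are copied from the reporter's configuration, with connection and TLS settings omitted. This has not been verified against a running 0.17.0 instance.

```toml
[sources.kafka]
  type = "kafka"
  bootstrap_servers = "" # some value, as in the original report
  group_id = "omni-logs2metrics"
  topics = [""] # topic, as in the original report
  # Assumption: "bytes" framing keeps the whole message as one frame,
  # replacing the newline_delimited default described above.
  framing.method = "bytes"
  decoding.codec = "bytes"
```

Note that the follow-up comment in this thread reports that messages were still split with this setting in 0.17.0; that behaviour is tracked in #9564.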

joker9357 commented 3 years ago

@jeromekleinen-kbc and @pablosichert, thanks for your replies. I tried setting framing.method = "bytes" in the source block, but the message is still split into 4 parts at random characters. Maybe my message is so long that it is sent in several UDP packets; how can I fetch the message as a whole?

joker9357 commented 3 years ago

OK, I found the ticket #9564, thanks.

jszwedko commented 3 years ago

Closed as part of https://github.com/vectordotdev/vector/issues/9564

Thanks again for reporting this, @joker9357.