vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0
18.22k stars 1.6k forks source link

Unnecessary escaping in influx line protocol #13162

Open Fiery-Fenix opened 2 years ago

Fiery-Fenix commented 2 years ago

A note for the community

Problem

sinks.influx is doing unnecessary escaping which breaks further pipeline (like telegraf) According to official specification of influx protocol - it's not needed to escape "." (dot) in any parts of influx line protocol. But vector is escaping it those making produced event incompatible with other solutions that's supports Influx line protocol. Example (see provided configuration for pipeline): echo '{"event.name":"event"}' | vector --config influx.toml Expected result: vector,metric_type=logs event.name="event" 1655286885923535800 Actual result: vector,metric_type=logs event\\\\.name="event" 1655286885923535800

Configuration

[sources.stdin]
type = "stdin"

[transforms.parse]
type = "remap"
inputs = [ "stdin" ]
source = """
. = parse_json!(.message)"""

[sinks.influx]
type = "influxdb_logs"
inputs = ["parse"]
consistency = "any"
database = "default"
endpoint = "http://localhost:8080/"
retention_policy_name = "autogen"
measurement = "vector"

Version

vector 0.22.1 (x86_64-unknown-linux-gnu b633e95 2022-06-10)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

jszwedko commented 2 years ago

Thanks for the reproduction @Fiery-Fenix ! That is odd that we are escaping .s there.

shymega commented 2 years ago

I'm looking to take this one on. I'm still investigating, but I am also pursuing the line of inquiry into this being an issue with an external crate(s).

shymega commented 2 years ago

I haven't had time to fix this yet, focusing on job hunt, but my approach to debugging would be a mixture of gdb, packet sniffing, and dbg!(). I'm quite sure this is an issue with either serde or the HTTP client. I have not yet spotted any parsing issues in Vector, in which case, some low-level debugging would be required.

juvenn commented 1 year ago

With some naive debugging, I was able to pinpoint the culprit. A \ is introduced when trying to iterate the LogEvent, then at influxdb encoding time, an extra \ is introduced to escape it.

https://github.com/vectordotdev/vector/blob/00c0316b0ca47ec77720912569f69708221f4bfe/lib/vector-core/src/event/util/log/all_fields.rs#L91-L95

https://github.com/vectordotdev/vector/blob/00c0316b0ca47ec77720912569f69708221f4bfe/src/sinks/influxdb/mod.rs#L293-L300

As far as I can tell, influx is encoding correctly to escape \ and ", and it has tests to ensure that

https://github.com/vectordotdev/vector/blob/00c0316b0ca47ec77720912569f69708221f4bfe/src/sinks/influxdb/logs.rs#L645-L654

Though I feel I don't have enough knowledge to figure the LogEvent iteration part, yet.

jszwedko commented 1 year ago

It seems like this is yet another case of using String for paths, making it hard to understand how to interpret that value. In this case, it looks like all_fields is the root cause here, which recursively iterates an event to return a list of (path, value). all_fields has been a problem for multiple different parts of Vector and really just needs to be re-written to return a real path type so each user of that function can handle it appropriately. For influx specifically it seems like we want to encode the path to a string differently than all_fields is doing by default.

(from a Slack message)

This will be a bit of a big change unfortunately.