Closed rrichardson closed 3 years ago
Actually, I think I know the cause of this:
The pods in question that have this issue are those running 2 containers in a single pod. They have the same pod UID, but different filenames. If I understand correctly, a "stream" in Loki is differentiated by its unique combination of labels (more or less), so the two containers might produce inconsistent timestamps if Vector treats those two "streams" as one.
The only immediate workaround I can see is to use the "file" field, which is unique for each container. It can also substitute for the pod_uid as a disambiguator, since the filename includes the pod_uid.
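A minimal sketch of that workaround, assuming the "file" field is present on events from the kubernetes_logs source (label names here are illustrative, not from the original report):

```toml
# Sketch: use the per-container log file path as a label, so two
# containers in the same pod never share a Loki stream.
[sinks.loki]
type = "loki"
inputs = ["kubernetes_logs"]
endpoint = "http://loki:3100"
labels.k8s_namespace = "{{ kubernetes.pod_namespace }}"
labels.k8s_file = "{{ file }}"  # unique per container; also embeds the pod_uid
```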
Same here. Adding the label "file" doesn't fix the "out of order" problem.
Here is the config. Note that I am using the Helm chart to deploy vector, so I have to escape the "{}" characters.
data_dir = "/vector-data-dir"
# Ingest logs from Kubernetes.
[sources.kubernetes_logs]
type = "kubernetes_logs"
[transforms.stripped_logs]
type = "ansi_stripper"
inputs = ["kubernetes_logs"]
field = "message"
[sinks.loki]
type = "loki"
inputs = ["stripped_logs"]
endpoint = "http://loki:3100"
healthcheck = true
# request.retry_attempts = 3
# Labels
[sinks.loki.labels]
k8s_namespace = {{ printf "\"{{ kubernetes.pod_namespace }}\"" }}
k8s_file = {{ printf "\"{{ file }}\"" }}
We have a filebeat -> kafka -> vector -> loki setup that gathers k8s pod logs. For some reason, inserting a "remove_fields" transform that drops "timestamp" before the loki sink eliminated the out-of-order reports in our case.
We leave the loki sink's "remove_timestamp" at its default of true. Could it be that something else produces a "timestamp" field that is not well ordered in the input stream?
The flow is:
[sources.k8s-log]
type = "kafka"
# ... other kafka settings
[transforms.json_in_message]
type = "json_parser"
inputs = ["k8s-log"]
drop_invalid = true
[transforms.filter_errors]
type = "filter"
inputs = ["json_in_message"]
condition."message.contains" = ["... interesting errors here...."]
[transforms.remove_timestamp]
type = "remove_fields"
inputs = ["filter_errors"]
fields = ["timestamp"]
[sinks.dev_loki]
type = "loki"
inputs = ["remove_timestamp"]
endpoint ="....3100..."
labels.exported_namespace = "{{ kubernetes.namespace }}"
labels.exported_app = "{{ kubernetes.labels.app }}"
labels.exported_pod = "{{ kubernetes.pod.name }}"
labels.exported_node = "{{ kubernetes.node.name }}"
labels.exported_container = "{{ kubernetes.container.name }}"
labels.vector_instance = "inf-${HOSTNAME}" # unique label per instance to separate loki streams
encoding.codec = "text"
encoding.only_fields = ["message"]
I think that if there is no timestamp present, then Loki doesn't execute ordering validation, so it won't complain. That doesn't mean that the messages are in order.
It's more likely that with timestamps removed, the current time is used instead of whatever time the original event had, and that always ends up in the proper order. This looks very similar to the event ordering requirements problem that CloudWatch has. We might want to look into it in conjunction with CloudWatch, as the fix/workaround might be similar too.
@juchiast Just pinging on status for this one? Thanks.
I've run into this too. We've got Vector collecting Ubuntu Server logs from systemd-journald, and I'm running into cases where I can prove with pcaps that Vector has sent events to Loki out of order. It's definitely not an issue of incorrect labels or multiple instances using the same labels.
I'm pretty sure what's caused it is multiple retries happening out of order. I've not worked out the root cause of retries being required, but that's not relevant to this order issue.
The default "request.in_flight_limit" setting is 5, so there may be up to 5 requests in flight that fail and get retried. If those retries succeed out of order, Loki receives events out of order.
To work around this, I've tried setting the limit back to 1:
request.in_flight_limit = 1
I need to go back through my logs to see if that's solved it, but I think it will. I guess this is a reasonable solution and should probably be the default, unless much more complicated retry logic is added. Just need to consider the buffer/batch limits too, as fewer requests in flight may lead to more events being buffered.
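As a sketch, the full workaround might look like this (the buffer values are illustrative, not recommendations; the sink input name is taken from the config earlier in the thread):

```toml
# Sketch: serialize requests so retried batches cannot overtake newer ones,
# at the cost of throughput.
[sinks.loki]
type = "loki"
inputs = ["stripped_logs"]
endpoint = "http://loki:3100"
request.in_flight_limit = 1   # one request at a time; retries stay in order
buffer.type = "disk"          # absorb backpressure while a request is in flight
buffer.max_size = 104900000
```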
Hi @jamtur01, I'll get on this when I'm done with #5138.
After looking at this, I think this could be caused by "in_flight_limit" > 1. There are many ways to hit an out-of-order error: a batch failed and was retried after a newer batch succeeded; many batches failed and were retried in a different order; network latency caused HTTP requests to be received in a different order. Setting "in_flight_limit" to 1 can fix the second problem, but I think it has a serious performance cost.
I think we can add a configurable option where Vector reassigns the timestamp of out-of-order messages (to the current time, or to the most recent timestamp, in case some message has a timestamp from the future).
Does loki really require that all the batches are ordered? If so, it doesn't seem to be possible to write concurrently to it at all - only in sequential batches. Otherwise, we can't (or it's very hard to) guarantee there are no races. It would make sense if they only request batches to be ordered. Or if there was some small time margin during which the data is allowed to come slightly out of order (i.e. I hope that within the last 10 minutes since the last registered event at loki it'd be fine to send events that are 5 minutes past that time).
I think we can add a configurable option where Vector will reassign timestamp of out-of-order message (with the current time, or the most recent timestamp, in case some message has timestamp from the future).
This would be really bad for some cases. Although, might be fine as an opt-in.
@MOZGIII It is explained in section 6 "Logs must be in increasing time order per stream" here https://github.com/grafana/loki/blob/master/docs/sources/best-practices/current-best-practices.md .
The requirement is per-stream so we could still write streams in parallel.
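Since the requirement is per stream, one hedged approach is to give every container and every Vector instance its own stream via labels, so ordering only has to hold within a single writer (label names and the HOSTNAME interpolation are illustrative):

```toml
# Sketch: per-container and per-agent labels keep each Loki stream
# fed by exactly one ordered source, allowing parallel writes overall.
[sinks.loki.labels]
k8s_pod = "{{ kubernetes.pod_name }}"
k8s_container = "{{ kubernetes.container_name }}"  # splits containers sharing a pod
vector_instance = "${HOSTNAME}"                    # splits racing Vector agents
```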
They mentioned something which I don't really understand,
It's also worth noting that the batching nature of the Loki push API can lead to some instances of out of order errors being received which are really false positives. (Perhaps a batch partially succeeded and was present; or anything that previously succeeded would return an out of order entry; or anything new would be accepted.)
Just curious if we can find useful options in the Loki config; the configuration is large and something might be tunable.
@afoninsky I've looked around issues and there is a suggestion to have Loki set the timestamp at ingestion time, but I don't see any further progress on it yet. https://github.com/grafana/loki/issues/1544 .
Not sure how it works, but after adding this transformation - I do not see errors in logs anymore:
[transforms.fix3057]
type = "coercer"
inputs = ["kubernetes_logs"]
# avoid https://github.com/timberio/vector/issues/3057
types.timestamp = "timestamp|%F"
[sinks.kourier]
type = "loki"
inputs = ["fix3057"]
endpoint = "{{ include "logs.url" . }}"
encoding.codec = "text"
labels.environment = "{{ .Values.labels.environment }}"
labels.stage = "{{ .Values.labels.stage }}"
labels.tier = "{{ .Values.labels.tier }}"
request.in_flight_limit = 1
(without the coercer transform, with "request.in_flight_limit" only, the errors still exist)
@afoninsky I checked, and types.timestamp = "timestamp|%F" produced a silent error that was only printed with debug! (#5558). I was using the stdin source to test; it could have worked correctly with kubernetes_logs, but I could not check that.
I think what happened is: the timestamp field got removed because of the coercer error, then the loki sink assigned a new timestamp with the current time, so messages are always in proper order.
You could still hit this error even with in_flight_limit = 1: Vector collects logs from many sources in parallel, so logs can arrive at the loki sink out of order. Although Vector sorts its buffer before sending, some out-of-order logs can arrive after a batch is already full and sent, leading to a small loss of logs.
I think it would be 1-2 dropped entries when it happens. Can you recall how many logs were lost when using "request.in_flight_limit" only?
I've compared the number of events delivered to Loggly and Loki for the same intervals (1h, 3h, 6h). I'd say it is very close (660 857 vs 660 254 events for the 1h interval). I'm not saying this fixes the bug, just FYI.
Example of queries I used:
Loki - sum(count_over_time({stage="staging",k8s_container="controller"}[6h]))
Loggly (last 6h) - tag:staging json.kubernetes.container_name:"controller"
Config:
data_dir = "/vector-data-dir"
[log_schema]
host_key = "host"
message_key = "message"
source_type_key = "source_type"
timestamp_key = "timestamp"
[sources.kubernetes_logs]
type = "kubernetes_logs"
[transforms.fix3057]
type = "coercer"
inputs = ["kubernetes_logs"]
types.timestamp = "timestamp|%F"
[sinks.loki]
type = "loki"
inputs = ["fix3057"]
endpoint = "..."
encoding.codec = "text"
labels.stage = "staging"
labels.k8s_container = "{{ kubernetes.container_name }}"
request.in_flight_limit = 1
[sinks.loggly]
type = "http"
inputs = ["kubernetes_logs"]
uri = "https://logs-01.loggly.com/bulk/${LOGGLY_TOKEN}/tag/staging/"
encoding.codec = "ndjson"
buffer.type = "disk"
buffer.max_size = 104900000
request.in_flight_limit = 1
Update: checked without "transforms.fix3057" with "request.in_flight_limit = 1", didn't notice significant changes -> this fix doesn't work, as expected. Thanks for the clarification.
I am still seeing this error in vector-agent-0.12.0-nightly-2021-02-11, is that expected?
loki log:
level=warn ts=2021-02-12T03:34:36.20880116Z caller=grpc_logging.go:38 method=/logproto.Pusher/Push duration=189.931µs err="rpc error: code = Code(400) desc = entry with timestamp 2021-02-12 03:34:34.795556435 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:34.8338347 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\ntotal ignored: 2 out of 4" msg="gRPC\n"
vector log:
Feb 12 03:34:42.733 ERROR sink{component_kind="sink" component_name=loki component_type=loki}:request{request_id=215}: vector::sinks::util::sink: Response wasn't successful. response=Response { status: 400, version: HTTP/1.1, headers: {"content-type": "text/plain; charset=utf-8", "x-content-type-options": "nosniff", "date": "Fri, 12 Feb 2021 03:34:42 GMT", "content-length": "1048"}, body: b"entry with timestamp 2021-02-12 03:34:41.601975657 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:41.611767734 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:41.632458583 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:41.673088885 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:41.754009042 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:41.914371272 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 03:34:42.234826912 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\ntotal ignored: 7 out of 7\n" }
vector config (helm chart):
# The Vector Kubernetes integration automatically defines a
# kubernetes_logs source that is made available to you.
# You do not need to define a log source.
# Extra env vars to pass to the `vector` container.
env: []
# Global parts of the generated `vector` config.
globalOptions:
# Specifies the (in-container) data dir used by `vector`.
dataDir: "/vector-data-dir"
# Schema part of the generated `vector` config.
logSchema:
hostKey: "host"
messageKey: "message"
sourceTypeKey: "source_type"
timestampKey: "timestamp"
# The "built-in" kubernetes logs source. Will be added by default, unless
# explicitly disabled.
kubernetesLogsSource:
# Disable to omit the kubernetes logs source from being added.
enabled: true
# The name to use for the "built-in" kubernetes logs source.
sourceId: kubernetes_logs
# Raw TOML config to embed at the kubernetes logs source.
rawConfig: null
# Sources to add to the generated `vector` config (besides "built-in" kubernetes
# logs source).
sources: {}
# Transforms to add to the generated `vector` config.
transforms: {}
# # https://github.com/timberio/vector/issues/3057
# remove_timestamps:
# type: "remove_fields"
# inputs: ["kubernetes_logs"]
# rawConfig: |
# fields = ["timestamp"]
# Sinks to add to the generated `vector` config.
sinks:
loki:
# https://vector.dev/docs/reference/sinks/loki/
type: loki
inputs: ["kubernetes_logs"]
rawConfig: |
endpoint = "http://loki.monitoring:3100"
remove_timestamp = true
encoding.codec = "json"
labels.forwarder = "vector"
labels.event = "{{ event_field }}"
labels.key = "value"
request.in_flight_limit = 1
In fact, no logs are appearing in loki at all.
Hi @DanTulovsky, can you check what the latest log and its timestamp are for the stream {forwarder="vector", key="value"}?
Ah, so logs are coming through... I see, the labels aren't parsing automatically like they did with promtail... is that by design? (See image)
2021-02-11 23:43:14
{"file":"/var/log/pods/monitoring_prometheus-kube-prometheus-prometheus-0_c7bf053d-732f-4aa4-af81-a624c23c340b/otel-sidecar/30.log","kubernetes.container_image":"lightstep/opentelemetry-prometheus-sidecar","kubernetes.container_name":"otel-sidecar","kubernetes.pod_ip":"10.244.1.124","kubernetes.pod_ips[0]":"10.244.1.124","kubernetes.pod_labels.app":"prometheus","kubernetes.pod_labels.app.kubernetes.io/component":"prometheus","kubernetes.pod_labels.app.kubernetes.io/instance":"kube-prometheus","kubernetes.pod_labels.app.kubernetes.io/name":"kube-prometheus","kubernetes.pod_labels.controller-revision-hash":"prometheus-kube-prometheus-prometheus-7b9c4b5bc5","kubernetes.pod_labels.operator.prometheus.io/name":"kube-prometheus-prometheus","kubernetes.pod_labels.operator.prometheus.io/shard":"0","kubernetes.pod_labels.prometheus":"kube-prometheus-prometheus","kubernetes.pod_labels.statefulset.kubernetes.io/pod-name":"prometheus-kube-prometheus-prometheus-0","kubernetes.pod_name":"prometheus-kube-prometheus-prometheus-0","kubernetes.pod_namespace":"monitoring","kubernetes.pod_node_name":"pool-27feocvx5-31m3g","kubernetes.pod_uid":"c7bf053d-732f-4aa4-af81-a624c23c340b","message":"level=warn ts=2021-02-12T04:43:14.468Z caller=tail.go:199 component=wal_reader msg=\"waiting for write-ahead log segment\" segment=532","source_type":"kubernetes_logs","stream":"stderr"}
--
Although I am still seeing a steady stream of
Feb 12 04:47:15.010 ERROR sink{component_kind="sink" component_name=loki component_type=loki}:request{request_id=1699}: vector::sinks::util::sink: Response wasn't successful. response=Response { status: 400, version: HTTP/1.1, headers: {"content-type": "text/plain; charset=utf-8", "x-content-type-options": "nosniff", "date": "Fri, 12 Feb 2021 04:47:15 GMT", "content-length": "1486"}, body: b"entry with timestamp 2021-02-12 04:47:13.096095665 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.106832527 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.12720228 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.16749473 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.386895123 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.395875936 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.416247215 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.459859543 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.537240924 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\nentry with timestamp 2021-02-12 04:47:13.697485879 +0000 UTC ignored, reason: 'entry out of order' for stream: {forwarder=\"vector\", key=\"value\"},\ntotal ignored: 14 out of 17\n" }
(Also, like the Loki log above, this timestamp seems to be in UTC.)
the labels aren't parsing automatically like they did with promtail... is that by design?
If I understand correctly, you want the fields from the detected fields section to appear as labels? If so, you would need to add them manually in the "labels" configuration.
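For example, promoting a few of the detected fields to labels might look like this (a sketch; the field paths follow the kubernetes_logs output shown earlier in the thread, and only low-cardinality fields should be promoted):

```toml
# Sketch: manually map selected event fields to Loki labels.
[sinks.loki.labels]
forwarder = "vector"
k8s_namespace = "{{ kubernetes.pod_namespace }}"
k8s_pod = "{{ kubernetes.pod_name }}"
k8s_container = "{{ kubernetes.container_name }}"
```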
I'm curious why there are still out-of-order error messages; in_flight_limit = 1 seems to have fixed the issue for other users. Is it possible there is racing between more than one vector instance in your setup?
labels: I see. I think I was expecting that the labels set by https://vector.dev/docs/reference/sources/kubernetes_logs/ would automatically get propagated to loki. You are saying I need to add them manually to the loki sink (e.g. 'pod_node_name').
Out-of-order messages: I only have one instance of vector running. I have 3 kubernetes nodes, and so three vector agents. All three report these errors. Just for completeness, here's the full vector config as it shows up in the ConfigMap:
# Configuration for vector.
# Docs: https://vector.dev/docs/
data_dir = "/vector-data-dir"
[api]
enabled = false
address = "0.0.0.0:8686"
playground = true
[log_schema]
host_key = "host"
message_key = "message"
source_type_key = "source_type"
timestamp_key = "timestamp"
# Ingest logs from Kubernetes.
[sources.kubernetes_logs]
type = "kubernetes_logs"
# Capture the metrics from the host.
[sources.host_metrics]
type = "host_metrics"
# Emit internal Vector metrics.
[sources.internal_metrics]
type = "internal_metrics"
# Expose metrics for scraping in the Prometheus format.
[sinks.prometheus_sink]
address = "0.0.0.0:9090"
inputs = ["internal_metrics", "host_metrics"]
type = "prometheus"
[sinks.kafka0]
inputs = ["kubernetes_logs"]
type = "kafka"
# General
bootstrap_servers = "kafka0-headless.kafka:9092"
compression = "gzip" # optional, default
key_field = "pod" # required
topic = "kube-logs" # required
encoding.codec = "json"
[sinks.loki]
inputs = ["kubernetes_logs"]
type = "loki"
endpoint = "http://loki.monitoring:3100"
remove_timestamp = true
encoding.codec = "json"
labels.forwarder = "vector"
labels.event = "{{ event_field }}"
labels.key = "value"
request.in_flight_limit = 1
So there are 3 vector agents and they all send to the same loki endpoint; they will race with each other, especially while they all use the same labels. I think it will be fine once distinguishing labels are added.
Would you mind opening an issue for setting label keys from event data, so that others can discuss it?
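A sketch of the per-agent disambiguation, assuming each DaemonSet pod can see its node name in an environment variable (the variable name here is an assumption, e.g. injected via the Downward API):

```toml
# Sketch: give each of the three agents its own stream, so they cannot
# race each other on the shared {forwarder="vector", key="value"} stream.
[sinks.loki.labels]
forwarder = "vector"
key = "value"
k8s_node = "${VECTOR_SELF_NODE_NAME}"  # assumed env var set via the Downward API
```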
Thank you for your help. Does the vector agent (and the kubernetes source in particular) keep some sort of checkpoint of the logs it has already processed? I think it's re-processing all the logs, so when the agent restarts, it produces a lot of 'out of order' messages.
For example, I just restarted it, and it tried to send a log with a timestamp of '2021-02-11 20:53:23.071641547 +0000 UTC' (which was yesterday).
Related to that, what should "remove_timestamp" actually be doing? I thought it was supposed to remove these timestamps?
label_keys: Now that I understand better how it works, perhaps automatically converting the labels into loki ones isn't that best idea, as the cardinality will be an issue. Best to explicitly set the ones you want I think.
"remove_timestamp" removes the timestamp from the JSON body when sending to Loki (same effect as "encoding.except_fields"). Each event still has the timestamp Loki requires, but it sits outside the JSON body.
Vector should save the checkpoint to "data_dir", but I'm not 100% sure. Cc @jszwedko.
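If checkpoints do live under data_dir, then the restart symptom above would follow from that directory not surviving restarts; a hedged sketch (the persistence mechanism is an assumption, e.g. a hostPath or PVC mount in the pod spec):

```toml
# Sketch: data_dir must be backed by persistent storage. If it is an
# emptyDir or container-local path, checkpoints are lost on restart,
# old log files are re-read, and re-sent entries trigger Loki's
# 'entry out of order' rejections.
data_dir = "/vector-data-dir"  # mount this path on persistent storage

[sources.kubernetes_logs]
type = "kubernetes_logs"
```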
I have a k8s cluster with Vector using the Loki sink.
I see this message intermittently for quite a few pods in my system:
Here is the error from the Loki process:
Here is the corresponding error from the Vector process:
I didn't know if there is some scatter-gather activity happening and the events need to be re-sorted, or what. This is a single process, so these log events can't be coming from multiple machines, and I don't think time synchronization is an issue (the nodes are running NTP and are within a few milliseconds anyway).
Note my config below has a few transform steps, but I was seeing this issue before I had any transforms.
Here is my config: