Open pgassmann opened 3 years ago
Related to #7336
Thanks @pgassmann . I agree this would be good to do. I updated it to an enhancement as I believe the current behavior is intentional (albeit undocumented) given the documentation also doesn't document the inverse, that it will pick up logs from before it started.
@jszwedko what are the plans to implement checkpointing and history for docker logs? We are currently experiencing issues with vector sending logs from docker to loki and because of this missing feature we are losing hours of logs.
@pgassmann unfortunately no movement on this yet. You could consider eschewing the `docker_logs` source and using the `file` source with the container log files. This would give you checkpointing, of course, but would be missing the container metadata that the `docker_logs` source adds.
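To illustrate the trade-off, here is a minimal Python sketch of what one line of a container log file contains. It assumes Docker's default `json-file` logging driver, which writes one JSON object per line under `/var/lib/docker/containers/<container-id>/<container-id>-json.log`. The helper name `parse_json_file_line` and the sample line are made up for illustration. By default each record carries only the message, the stream, and a timestamp, so the container name and labels would have to come from somewhere else.

```python
import json


def parse_json_file_line(line: str) -> dict:
    """Parse one line written by Docker's json-file logging driver.

    Hypothetical helper for illustration; not part of Vector or Docker.
    """
    record = json.loads(line)
    return {
        # The driver keeps the trailing newline of the original log line.
        "message": record["log"].rstrip("\n"),
        "stream": record["stream"],   # "stdout" or "stderr"
        "timestamp": record["time"],  # RFC 3339 string, kept as-is here
    }


# Made-up sample line in the json-file format.
sample = (
    '{"log":"server started\\n","stream":"stdout",'
    '"time":"2023-03-14T09:26:53.589793238Z"}'
)
event = parse_json_file_line(sample)
print(event)
```

Nothing in the parsed record identifies the container beyond the ID embedded in the file path, which is the metadata gap mentioned above.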
The latest release of loki/promtail has support for docker service discovery, which seems to combine service discovery through the API with reading logs from the JSON files, and it supports checkpointing. https://grafana.com/docs/loki/latest/clients/promtail/configuration/#docker_sd_config
That's a major selling point to move back to promtail, as currently with vector we cannot guarantee that all docker logs are transported to loki.
Checkpointing should be quite trivial to implement by using the `since` option of the log API. Currently it is always set to "now", but it can be set to the last known timestamp or to a timestamp calculated as now - `max_lookback_duration`.
The `docker_logs` source just has to keep track of the last read timestamp per container. This can be part of the acknowledgement work in #7650, i.e. only update the checkpoint timestamp when the event is confirmed by the sink.
After a restart, Vector should query the logs for all containers with a saved checkpoint, even if they are no longer running (e.g. they stopped while Vector was down), since the remaining logs can still be queried from the Docker API.
For containers not yet known, Vector should also query a configurable amount of time before "now" (`max_lookback_duration`), e.g. when a container was created or started shortly before Vector.
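The strategy above could be sketched roughly as follows. This is a minimal Python illustration of the bookkeeping only, not Vector's implementation: the class `CheckpointStore` and its methods are hypothetical names, persistence to disk and the actual Docker API calls are left out, and `max_lookback` stands in for the proposed `max_lookback_duration` option.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


class CheckpointStore:
    """Hypothetical per-container checkpoint store (illustration only)."""

    def __init__(self, max_lookback: timedelta) -> None:
        self.max_lookback = max_lookback
        self._last_acked: Dict[str, datetime] = {}

    def ack(self, container_id: str, event_time: datetime) -> None:
        """Advance the checkpoint only after the sink confirmed the event."""
        current = self._last_acked.get(container_id)
        if current is None or event_time > current:
            self._last_acked[container_id] = event_time

    def since(self, container_id: str, now: datetime) -> datetime:
        """Timestamp to request logs from when (re)attaching to a container."""
        checkpoint = self._last_acked.get(container_id)
        if checkpoint is not None:
            # Known container: resume from the last acknowledged event.
            return checkpoint
        # Unknown container: look back a bounded amount instead of using "now".
        return now - self.max_lookback

    def known_containers(self) -> List[str]:
        """Containers to query after a restart, running or not."""
        return sorted(self._last_acked)


store = CheckpointStore(max_lookback=timedelta(minutes=10))
now = datetime(2023, 1, 1, 12, 0, tzinfo=timezone.utc)
acked = datetime(2023, 1, 1, 11, 30, tzinfo=timezone.utc)

store.ack("abc", acked)
print(store.since("abc", now))  # resumes from the acknowledged timestamp
print(store.since("new", now))  # falls back to now - max_lookback
```

Resuming from the stored timestamp may re-read the last acknowledged event, so a real implementation would probably need to deduplicate or track more than a timestamp.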
cc @bruceg
@jszwedko @bruceg Can someone please look into this and give feedback to my suggestion? Vector again lost some important logs from a migration, because of this and #16806
Ouch, that sucks. I'm sorry to hear that you lost some logs. Your suggestion for the checkpointing strategy makes sense to me. Unfortunately I don't know when exactly we would get to it, but we'd be happy to help support a PR if you (or anyone else) is motivated.
There hasn't been progress on that for two years now. Our Devs are getting frustrated, because we lose logs during maintenance windows where we reboot the hosts. We will now have to switch to a different log collector.
We have now switched to promtail for collecting docker container logs. Here's our promtail configuration. You can find our full config in the ansible role: https://github.com/teamapps-org/ansible-collection-teamapps-general/tree/main/roles/promtail
scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
        # filters:
        #   - name: name
        #     values: [test-container]
    relabel_configs:
      - source_labels: ['__meta_docker_container_label_com_docker_compose_container_number']
        target_label: 'compose_container_number'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project']
        target_label: 'compose_project'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_project_working_dir']
        target_label: 'compose_project_working_dir'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_oneoff']
        target_label: 'compose_oneoff'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: 'compose_service'
        action: 'replace'
        replacement: '${1}'
      - source_labels: ['__meta_docker_container_id']
        target_label: 'container_id'
        action: 'replace'
      - source_labels: ['__meta_docker_container_name']
        target_label: 'container_name'
        regex: '/(.*)'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'stream'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'source'
        action: 'replace'
      - source_labels: ['__meta_docker_container_log_stream']
        target_label: 'source_type'
        action: 'replace'
      - target_label: 'category'
        replacement: 'dockerlogs'
      - target_label: 'job'
        replacement: 'docker'
      ## Map all labels
      # - action: labelmap
      #   regex: '__meta_docker_container_label_(.+)'
      #   replacement: 'container_labels_${1}'
    pipeline_stages:
      # Combine multiline messages such as stack traces into one message.
      # Needs configuration in the application to prefix each log entry with a
      # zero-width space character (https://unicode-explorer.com/c/200B).
      - multiline:
          firstline: '^\x{200B}'
          max_wait_time: 1s
      - drop:
          older_than: 4h
          drop_counter_reason: "line_too_old"
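The multiline stage in the config above relies on the application marking the first line of every log entry with a zero-width space. As a rough sketch of the application side, assuming a Python service using the standard `logging` module (the logger name `zwsp_demo` and the format string are arbitrary choices for this example), the prefix can be added in the formatter:

```python
import io
import logging

ZWSP = "\u200b"  # zero-width space, matched by firstline: '^\x{200B}'


def make_handler(stream) -> logging.Handler:
    """Build a handler whose entries start with a zero-width space."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(ZWSP + "%(asctime)s %(levelname)s %(message)s")
    )
    return handler


# Log into a buffer here; a real service would log to stdout or stderr.
buffer = io.StringIO()
logger = logging.getLogger("zwsp_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(make_handler(buffer))

try:
    1 / 0
except ZeroDivisionError:
    # The traceback is appended on following lines without the prefix.
    logger.exception("calculation failed")

lines = buffer.getvalue().splitlines()
print(len(lines), "lines, only the first one carries the marker")
```

Only the first line of the entry starts with the marker, so the traceback lines that follow are folded into the same message by the multiline stage.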
Vector Version
Expected Behavior
Docker source has checkpointing and does not miss logs.
When starting, Vector reads all available logs and all new logs. If Vector is stopped and started again, the logs written while it was stopped are read and sent to the configured sink.
Actual Behavior
Logs are only read from the moment Vector is started. When restarting Vector, logs from the containers are missing. When rebooting, if the docker containers are started before Vector, the startup logs are missing.
Example Data
Additional Context
`docker logs` or `docker-compose logs` and the API provide options to read previous logs.
Kubernetes source has checkpointing implemented: https://vector.dev/docs/reference/configuration/sources/kubernetes_logs/#checkpointing
In #1107 in the description, there is this point:
@jszwedko This is not explicitly mentioned in the docker source documentation. https://vector.dev/docs/reference/configuration/sources/docker_logs/#how-it-works
References
Vector Configuration File