grafana / loki

Like Prometheus, but for logs.
https://grafana.com/loki
GNU Affero General Public License v3.0

if loki is not reachable and loki-docker-driver is activated, containers apps stops and cannot be stopped/killed #2361

Open badsmoke opened 4 years ago

badsmoke commented 4 years ago

Describe the bug: We have installed the loki-docker-driver on all our devices, with the Loki server running on a separate server. If the Loki server is updated, restarted, or simply not reachable, all containers get stuck after a short time (docker logs does not update anymore). While the Loki server is not reachable, the containers can be neither stopped, killed, nor restarted.

To Reproduce Steps to reproduce the behavior:

  1. start loki server (server)
  2. install loki-docker-driver on another system (can also be tested on one and the same system) (client)
     2.1. /etc/docker/daemon.json:

     {
       "live-restore": true,
       "log-driver": "loki",
       "log-opts": {
         "loki-url": "http://loki:3100/api/prom/push",
         "mode": "non-blocking",
         "loki-batch-size": "400",
         "max-size": "1g"
       }
     }

  3. docker run --rm --name der-container -d debian /bin/sh -c "while true; do date >> /tmp/ts ; seq 0 1000000; sleep 1 ; done" (client)
  4. docker exec -it der-container tail -f /tmp/ts shows every second the time (client)
  5. docker logs -f der-container show numbers from 0-1000000 (client)
  6. stop loki server (server)
  7. you will see that the output on the system running the loki-driver stops and that you cannot stop the container (client)
  8. docker stop der-container (client)

Expected behavior: I would like all containers to continue to run as desired even if Loki is not accessible. It should also be possible to start and stop containers even if Loki is not reachable.

Environment:

loki-docker-driver version: loki-docker-driver:master-616771a (the driver option "non-blocking" is supported from this version on)

Loki server: 1.5.0

I would be very grateful for any help; this problem has caused our whole system to collapse.

jeschkies commented 2 years ago

> Or we have to setup a promtail service that will do more or less the same job, and keep the default json-file logger.

Yes, no need for the Docker driver.

> I think the Loki Docker logging driver will basically be considered deprecated.

The Loki Docker logging driver will not be deprecated. However, once the Docker service discovery stabilizes my personal recommendation is to use that one.

ku1ik commented 2 years ago

I've been bitten by this as well. I've found an alternative solution to this problem that seems to be working quite well: I made Docker log to journald and configured journal scraping in Promtail.

/etc/docker/daemon.json:

{
  "log-driver": "journald"
}

promtail.yml:

scrape_configs:
- job_name: journal
  journal:
    labels:
      job: journal
  relabel_configs:
  - source_labels: ['__journal__hostname']
    target_label: host
  - source_labels: ['__journal_priority_keyword']
    target_label: level
  - source_labels: ['__journal__systemd_unit']
    target_label: systemd_unit
  - source_labels: ['__journal_syslog_identifier']
    target_label: syslog_identifier
  - source_labels: ['__journal_container_name']
    target_label: container_name

Not only do I get logs from containers (labeled with container_name), but I also get logs for systemd service units.

For me this works great but YMMV, especially if you have slightly different requirements than me.

rubytech-avsorokin commented 2 years ago

Hi!

Is it possible to consume journald with multiple instances of Promtail? Is there any performance impact or concurrency issues to be expected?

The goal is to move the Promtail pipeline stages out of the single per-host Promtail configuration and into application-specific configuration. I'm not sure whether running multiple Promtail instances, one per application, is the right way to do this.

It seems that the Docker Loki driver plugin is meant to make this possible, but I'm afraid to use it because of the issues described here.

srstsavage commented 2 years ago

@jeschkies Just a note: I migrated a bunch of hosts previously using loki-docker-driver to the new promtail docker_sd_configs and it's working great so far. As suspected, it plays well with the optimized local log driver. I think it's expected, but the only downside I've seen so far is that very short-lived containers (probably anything shorter than the refresh_interval?) don't have their logs picked up... maybe that's a use case for loki-docker-driver, since it ships the logs directly to Loki.
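For readers who want to try the migration described above, here is a minimal sketch of a Promtail configuration using docker_sd_configs. The socket path, the Loki URL, the positions file location, the 5s refresh interval, and the job name are assumptions for illustration and are not taken from this thread; adjust them to your environment.

```yaml
# Sketch only: hostnames, ports, and paths below are assumed placeholders.
server:
  http_listen_port: 9080

positions:
  # Where Promtail remembers how far it has read (assumed location).
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      # Promtail talks to the Docker daemon API instead of acting as a log driver,
      # so an unreachable Loki does not block the daemon.
      - host: unix:///var/run/docker.sock
        # Containers that live shorter than this interval may be missed.
        refresh_interval: 5s
    relabel_configs:
      # Container names are reported with a leading slash; strip it.
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container_name
```

With this approach the Docker daemon can keep its default (or the local) log driver, and a shorter refresh_interval reduces, but probably does not eliminate, the window in which short-lived containers go unnoticed.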

edbrk commented 2 years ago

Honestly, this is a massive problem. All my containers freeze up, and once I have attempted to kill or shut them down I have to do a complete Docker restart to be able to do anything with them.

feld commented 1 year ago

Hit this today. You need to stop publishing this driver immediately until this problem is solved. This is unacceptable.

crypto-titan commented 1 year ago

It's been an issue for 2 years and this driver is still published? that's just wow... seriously guys... I just wasted 3 hours debugging this utter BS - thank you for releasing such an uplifting product...

thisisjaid commented 1 year ago

Hah just to add to the pile this cost us about 3 days of debugging work as well trying to figure out why all of our damn containers were mysteriously hanging on start. Be nice to get this fixed 2 years later.

margorczynski commented 1 year ago

Hey guys, any progress on this one? I still see this happening.

daramir commented 1 year ago

> The Loki Docker logging driver will not be deprecated. However, once the Docker service discovery stabilizes my personal recommendation is to use that one.

Hi @jeschkies. Do you know if it's possible to use Promtail with the Docker target + SD easily on Docker Desktop (macOS), which creates a VM and doesn't store log files? I'm looking for a solution that works both locally and on the server. I couldn't get Promtail to discover my container logs, and the Docker driver is obviously broken as per issue #2361. TIA.

jeschkies commented 1 year ago

@margorczynski @thisisjaid @Edbtvplays and @feld please see my comment from August 2021. The issue is not that we don't want to fix it. The issue is that we have to decide whether to retry sending, and thus lock the daemon, or to lose data. This is also documented as a known issue. If you have an idea on how to fix it, I'm all ears.

@daramir unfortunately I don't have a Mac at hand. However, as long as you can expose the Docker Daemon API to promtail it should work. However, if you kill the VM and thus erase the logs before they've been shipped, there's little promtail can do.

MaxZubrytskyi commented 1 year ago

Hi everyone, does somebody have a working fork with changes that allow losing data when this happens? Also, @jeschkies how about adding a "log-opts" option to drop data if Loki is unavailable?

horvie commented 1 year ago

Hi, you don't need a fork. For containers where we can afford to lose logs we have added configuration as described in https://github.com/grafana/loki/issues/2361#issuecomment-718024318 and containers are stopped without a problem.
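The linked comment is not reproduced in this thread, so the following is only a sketch of what such a per-container configuration could look like in a Docker Compose file. The option names (loki-retries, loki-max-backoff, loki-timeout, keep-file, mode) are driver options; the concrete values, the service name `app`, the image, and the URL are assumptions chosen for illustration.

```yaml
# Sketch only: service name, image, URL, and values are assumed placeholders.
services:
  app:
    image: debian
    logging:
      driver: loki
      options:
        loki-url: "http://loki:3100/loki/api/v1/push"
        # Give up quickly instead of retrying for minutes; logs may be lost.
        loki-retries: "2"
        loki-max-backoff: "800ms"
        loki-timeout: "1s"
        # Keep a local copy so that `docker logs` still works.
        keep-file: "true"
        mode: "non-blocking"
```

The trade-off is the one discussed in this issue: with few retries and a short backoff the daemon is blocked only briefly, but log lines sent while Loki is unreachable can be dropped.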

danthegoodman1 commented 1 year ago

Pretty sad this will block a rm --force too for the default loki-max-backoff of 5 minutes. Just dropping that value down is my guess, but I already switched over to running Vector and mounting the Docker logs directory into it because I don't trust this anymore. Vector won't block the Docker daemon.

https://grafana.com/docs/loki/latest/clients/docker-driver/configuration/
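The Vector setup mentioned above is not shown in the thread. A rough sketch of such a configuration, assuming the default json-file log location is mounted read-only into the Vector container, might look like this. The component names `docker_files` and `loki_out`, the endpoint, and the label are hypothetical.

```yaml
# Sketch only: component names, endpoint, and labels are assumed placeholders.
sources:
  docker_files:
    type: file
    include:
      # Default location of json-file logs on a Linux Docker host.
      - /var/lib/docker/containers/*/*-json.log

sinks:
  loki_out:
    type: loki
    inputs:
      - docker_files
    endpoint: http://loki:3100
    encoding:
      # Send each line as-is; it is still the raw json-file record.
      codec: text
    labels:
      job: docker
```

Because Vector only reads files that Docker has already written, a Loki outage delays shipping but does not sit in the daemon's logging path.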

jeschkies commented 1 year ago

@danthegoodman1

> mounting the docker logs directory to it because I don't trust this anymore. Vector wont block the docker daemon.

That's what the file-based discovery already does. The logging driver is really for local use cases, and the Docker service discovery is for when you don't have permission to mount the logging directory.
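As an illustration of the file-based discovery mentioned here, a Promtail scrape config fragment could look roughly like the following. It assumes the daemon uses the json-file driver, that /var/lib/docker/containers is readable by (or mounted into) Promtail, and that the job name and label are free choices.

```yaml
# Sketch only: job name and label values are assumed placeholders.
scrape_configs:
  - job_name: docker-json-files
    static_configs:
      - targets:
          - localhost
        labels:
          job: docker
          # Tail the files written by the json-file log driver.
          __path__: /var/lib/docker/containers/*/*-json.log
    pipeline_stages:
      # Unwrap the json-file envelope (log, stream, time) into the log line.
      - docker: {}
```

This fragment needs the usual server, positions, and clients sections around it to form a complete Promtail configuration.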

andoks commented 1 year ago

@jeschkies

> > mounting the docker logs directory to it because I don't trust this anymore. Vector wont block the docker daemon.

> That's what the file-based discovery already does. The logging driver is really for local use cases and the Docker service discovery when you don't have the permissions to Mount the logging directory.

What do you mean by "That's what the file-based discovery already does"? Is there a better way of sending the logs to Loki than using the docker-driver, one that does not risk blocking the way the docker-driver does?

jeschkies commented 1 year ago

@andoks Yes, there's the service discovery, or you could use file discovery or journald.

pharapeti commented 10 months ago

@jeschkies @btaani

From reading through the docs and this issue, I can see there are three main solutions:

  1. Use Docker loki plugin with workaround to reduce max backoff/retries/timeout
  2. Use Promtail Docker target (not sure which Docker logging driver should be used in this case)
  3. Configure Docker daemon to use json-file or journald logging driver + Docker service discovery

Which is the officially recommended solution to use for new projects?

keesfluitman commented 4 months ago

> @jeschkies @btaani
>
> From reading through the docs and this issue, I can see there are three main solutions:
>
> 1. Use Docker loki plugin with workaround to reduce max backoff/retries/timeout
> 2. Use Promtail Docker target (_not sure which Docker logging driver should be used in this case_)
> 3. Configure Docker daemon to use `json-file` or `journald` logging driver + Docker service discovery
>
> Which is the officially recommended solution to use for new projects?

Thanks. I haven't been able to find any working solution yet. As soon as the Loki container goes offline, I'm unable to restart it or do anything else useful with Docker; only a shutdown or power-down command properly stops Docker and restarts it. I will have to give up on this way of collecting the Docker logs. I get regular outages at night, when the Loki container somehow goes down.

dtap001 commented 1 month ago

This is quite straightforwardly mentioned in deadlock section: https://grafana.com/docs/loki/latest/send-data/docker-driver/#known-issue-deadlocked-docker-daemon

danthegoodman1 commented 1 month ago

> This is quite straightforwardly mentioned in deadlock section: https://grafana.com/docs/loki/latest/send-data/docker-driver/#known-issue-deadlocked-docker-daemon

When I raised the issue? Or now?

Impact123 commented 1 month ago

It was added Aug 23, 2021 / Jul 10, 2023:
https://github.com/grafana/loki/commit/e25587bfd7896c12cc225bf0a1d54104d8f6f0ea
https://github.com/grafana/loki/commit/02027e442c0faecabc9a96649cca3e47c6e908f2
See blame here: https://github.com/grafana/loki/blame/main/docs/sources/send-data/docker-driver/_index.md

keesfluitman commented 1 month ago

> This is quite straightforwardly mentioned in deadlock section: https://grafana.com/docs/loki/latest/send-data/docker-driver/#known-issue-deadlocked-docker-daemon

I believe I tried that once. But it's been a long time.

jeschkies commented 4 weeks ago

I wonder if we should finally close this issue.

longGr commented 3 weeks ago

I have the same problem. So I guess it's still a problem. :/

jeschkies commented 3 weeks ago

> I have the same problem. So I guess it's still a problem. :/

@longGr did you try one of the documented workarounds?