hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

On demand Docker log collection #20558

Open dani opened 5 months ago

dani commented 5 months ago

Proposal

I just took a look at the memory usage on my Nomad agents and realized that the overhead of Docker log collection is substantial. On my small-scale cluster (a personal install with 4 Nomad agents and 65 allocations running), it accounted for about 33% of the total used memory, mainly from the nomad logmon and nomad docker_logger processes (measured by comparing the memory reported by systemd with and without disable_log_collection = true).

Disabling log collection (and using, for example, fluentd for the Docker task driver) is a solution to this heavy consumption, but we lose access to the container logs from the web interface and the nomad alloc logs CLI, which is convenient for quick debugging (faster than querying a central log aggregator).

Maybe one way to mitigate this would be to have a third, on-demand mode for log collection: as soon as the log streaming API is called, the corresponding logmon and docker_logger processes would be spawned (and then killed after some timeout).

Use-cases

An on-demand log collection mode would remove most of the memory overhead of log collection for the Docker driver, while still allowing logs to be displayed in the web interface or the Nomad CLI for occasional debugging.

Attempted Solutions

Using an out-of-band log collector/aggregator and setting disable_log_collection globally in Nomad's agent configuration is a workaround, but losing access to the logs from the web interface is a serious drawback.
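
For reference, a minimal sketch of that workaround in the client agent configuration (the block layout here is illustrative; disable_log_collection is the relevant docker plugin option):

plugin "docker" {
  config {
    # Stops Nomad from starting logmon/docker_logger for Docker tasks on this
    # client; container logs then have to be shipped by an external collector.
    disable_log_collection = true
  }
}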

tgross commented 5 months ago

Maybe one way to mitigate this would be to have a third, on-demand mode for log collection: as soon as the log streaming API is called, the corresponding logmon and docker_logger processes would be spawned (and then killed after some timeout).

This is a clever idea, but the challenge is that the logmon/docker_logger processes are just attaching to stdout/stderr of the container. If nothing is reading from those file handles, then the application will not be able to write its logs (potentially causing the entire application to block, but at the very least buffering up a ton of logs). Likewise, we need to attach the logmon so that we can rotate logs safely without dropping any; otherwise a given task can use more than its allowed disk space.

The long-term approach we want to take to this is logging plugins. A design doc from a hack branch I did for this can be found here. A couple of other thoughts along those lines:

In any event, I'll label this as another logging-related idea and we'll look into this when we return to that logging plugin concept. Thanks!

dani commented 5 months ago

As a workaround, I'm now using the fluentd logging driver, sending logs to a local vector instance, which writes them back where Nomad expects them (as described here). On a small-scale test cluster, this setup reduced global memory consumption by ~25 GB (about 20% of the total), while still allowing access to the logs through the Nomad API.
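
For anyone wanting to replicate this, a rough sketch of what the task-level Docker logging block could look like (the image, the fluentd address and the forwarded environment variables are illustrative assumptions; the vector pipeline that writes the files back into the allocation log directory is not shown):

task "app" {
  driver = "docker"

  config {
    image = "example/app:latest" # placeholder

    logging {
      type = "fluentd"
      config {
        # Local vector (or fluentd) endpoint; the address is an assumption.
        fluentd-address = "127.0.0.1:24224"
        # Forward Nomad's task identifiers so the pipeline can reconstruct
        # the per-allocation log file paths on the host.
        env = "NOMAD_ALLOC_ID,NOMAD_TASK_NAME"
      }
    }
  }
}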

apollo13 commented 4 months ago

Ha, thank you @dani. I am also capturing Docker logs via the splunk logging driver into vector (I have to see if fluentd would be better), but I never thought of writing them back to the Nomad locations.

apollo13 commented 4 months ago

Fwiw, instead of using env variables you can also use the docker labels, so my docker plugin config looks like this:

        extra_labels = ["*"]
        logging {
            type = "splunk"
            config {
                splunk-token = "localhost-splunk-token"
                splunk-url = "http://127.0.0.1:8089"
                splunk-verify-connection = "false"
                labels-regex = "com\\.hashicorp\\..*"
            }
        }

and the vector configuration looks like this:

sinks:
  loki:
    type: loki
    inputs:
      - splunk
    endpoint: http://localhost:3100
    encoding:
      codec: text
    healthcheck:
      enabled: false
    labels:
      nomad_namespace: '{{ attrs."com.hashicorp.nomad.namespace" }}'
      nomad_job: '{{ attrs."com.hashicorp.nomad.job_name" }}'
      nomad_group: '{{ attrs."com.hashicorp.nomad.task_group_name" }}'
      nomad_task: '{{ attrs."com.hashicorp.nomad.task_name" }}'
      nomad_node: '{{ attrs."com.hashicorp.nomad.node_name" }}'
      nomad_alloc: '{{ attrs."com.hashicorp.nomad.alloc_id" }}'
      host: "${HOSTNAME}"
      log: "nomad"

At least, that is the part that passes the data into Loki, but you can see how to access the labels again via attrs.
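
The sink above reads from a splunk input that isn't shown; assuming a splunk_hec source matching the Docker logging config above, it could look roughly like this (option names per vector's splunk_hec source, exact values are assumptions):

sources:
  splunk:
    # HEC-compatible listener that the Docker splunk logging driver sends to.
    type: splunk_hec
    address: 127.0.0.1:8089
    valid_tokens:
      - localhost-splunk-token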

dani commented 4 months ago

Indeed, I could've done this. But in my case, I also have some tasks which send their logs directly to the same fluentd source, and those only have access to the env vars, not the labels. So using env vars everywhere allows the same vector pipeline to be used.