hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

configure default log rotation #23709

Open msherman13 opened 1 month ago

msherman13 commented 1 month ago

Proposal

Currently, the logs block that controls alloc log rotation must be configured on a per-task basis, which can be quite cumbersome. I'm requesting the ability to change the defaults in the client config.

Use-cases

When running promtail as a system job, if a job dumps a large amount of logs all at once (i.e. more than 10 * 10MB), the files will quickly get rotated out under the default settings and promtail misses some log lines entirely.
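
To make the 10 * 10MB figure concrete, here is a rough sketch of the arithmetic in Go. The function names are hypothetical, and the check assumes the worst case where the log shipper reads nothing until the burst has finished.

```go
package main

import "fmt"

// retentionMB returns how many megabytes of log output are kept on disk
// before the oldest file is rotated out.
func retentionMB(maxFiles, maxFileSizeMB int) int {
	return maxFiles * maxFileSizeMB
}

// survives reports whether a burst of burstMB megabytes still fits inside
// the retained window. Simplifying assumption: the shipper has not read
// any of the burst yet.
func survives(burstMB, maxFiles, maxFileSizeMB int) bool {
	return burstMB <= retentionMB(maxFiles, maxFileSizeMB)
}

func main() {
	fmt.Println(retentionMB(10, 10))   // 100
	fmt.Println(survives(250, 10, 10)) // false: part of the burst is rotated away
	fmt.Println(survives(250, 30, 10)) // true: a larger file count keeps it all
}
```

With the defaults, anything past the first 100MB of an unread burst pushes older lines out of the retained files.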

Attempted Solutions

Configuring the logs block for every task, although this is quite a lot of redundant configuration and is error-prone

tgross commented 1 month ago

Hi @msherman13! This is definitely something we've talked about in our various discussions of logging plugins, external log shipping, etc. One challenge is the way we account for resources in the scheduler. Currently, we calculate the amount of CPU, memory, disk, etc. resources needed and then "feasibility check" that against a set of nodes (that set depends on the scheduler type). Having this value be a client configuration would mean we'd need to alter the resources for each node we check. It also makes for strange failure modes where the same job could be placed on one node but then the client configuration could be changed out from under it. This is especially bad if the allocation is rescheduled because it's then possible to have no available nodes.
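
The scheduling concern can be illustrated with a small Go sketch. This is not Nomad's scheduler code: the `Node` type, its fields, and `feasible` are hypothetical stand-ins that only show how a per-client default would make the same job's disk requirement differ from node to node.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a client node.
type Node struct {
	Name          string
	FreeDiskMB    int
	DefaultLogsMB int // hypothetical client-level default for log retention
}

// feasible reports whether a task that needs taskDiskMB, plus whatever log
// retention the node's own default implies, fits on the node.
func feasible(n Node, taskDiskMB int) bool {
	return taskDiskMB+n.DefaultLogsMB <= n.FreeDiskMB
}

func main() {
	a := Node{Name: "a", FreeDiskMB: 500, DefaultLogsMB: 100}
	b := Node{Name: "b", FreeDiskMB: 500, DefaultLogsMB: 400}

	// The same task gets a different answer on each node, and the answer for
	// either node changes if its client configuration is edited later.
	fmt.Println(feasible(a, 300)) // true
	fmt.Println(feasible(b, 300)) // false
}
```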

All that being said, having some kind of control over this at a level higher than task seems like it'd be a good idea! One idea that comes to mind is a default logging configuration in the server configuration. When you submit a job we have a series of "job mutating hooks" that change the jobspec in various ways (add sidecar tasks for connect, add identity blocks, set default fields, etc.). So when we submit the job we could set the task.logging configuration with a value from the server, and that value will be preserved across all allocations for the same job without messing with the scheduler logic.
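
A minimal Go sketch of what such a mutating hook might look like follows. The `Job`, `Task`, and `LogConfig` types and the `applyDefaultLogs` function are simplified, hypothetical stand-ins, not Nomad's actual structs or hook interface. The assumption is that a task with no explicit log configuration receives a copy of the server-wide default at submit time, and that explicit per-task settings are left alone.

```go
package main

import "fmt"

// LogConfig is a hypothetical stand-in for a task's log rotation settings.
type LogConfig struct {
	MaxFiles      int
	MaxFileSizeMB int
}

// Task and Job are simplified, hypothetical jobspec types.
type Task struct {
	Name string
	Logs *LogConfig // nil means the task did not set log rotation itself
}

type Job struct {
	Tasks []*Task
}

// applyDefaultLogs mutates the job at submit time: every task without an
// explicit log configuration gets its own copy of the server default.
func applyDefaultLogs(job *Job, def LogConfig) {
	for _, t := range job.Tasks {
		if t.Logs == nil {
			c := def // copy so tasks do not share one pointer
			t.Logs = &c
		}
	}
}

func main() {
	job := &Job{Tasks: []*Task{
		{Name: "web"},
		{Name: "batch", Logs: &LogConfig{MaxFiles: 3, MaxFileSizeMB: 5}},
	}}
	applyDefaultLogs(job, LogConfig{MaxFiles: 30, MaxFileSizeMB: 10})

	fmt.Println(*job.Tasks[0].Logs) // {30 10}
	fmt.Println(*job.Tasks[1].Logs) // {3 5}
}
```

Because the value is written into the submitted jobspec, every allocation of the job sees the same setting regardless of which node it lands on.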

I'm going to mark this issue for further discussion and roadmapping. In the meantime, some users have found that nomad-pack is a good way to have job specs with common snippets.