Closed m1keil closed 2 years ago
Hey @m1keil, this has come up before, and might be a good thing to pursue. I'm curious about a few things first though.
First, out of curiosity, did you know Promtail has a consul agent service discovery mechanism as well? This uses the API of the agent co-located on the same node as promtail. This might alleviate some of the issues you're having with the Consul catalogue API, though services still need to be registered with the agent. Is it possible to have a Nomad deployment that's not registered with consul?
Second, do you know what the current gap is between Consul and Nomad? What is some of the metadata you'd like to have that you're missing out on?
Finally, how does Nomad handle short-lived services? Do those not get registered with the consul agent? I'm curious to learn a bit more about how using the Nomad API would avoid missing these short-lived processes.
Yes, we are currently using the consulagent SD.
> Is it possible to have a Nomad deployment that's not registered with consul?
Nomad has good integration with Consul but it's not automatic. Each Nomad client will register with Consul automatically. However, any workloads that you run must register themselves via the service{} stanza of the Nomad job. Some services might opt out of the registration if service discovery isn't required for them.
> What is some of the metadata you'd like to have that you're missing out on?
Primarily Nomad's metadata. With the current integration, you can get Nomad's task name and allocation ID. But you won't be able to reliably get the job name or the group's name, or access Nomad's meta{} data.
Meta{} is an interesting one in itself. Nomad includes its own metadata definition you can define on different levels (Job/Group/Task). It doesn't get passed automatically to Consul's service meta. It's an entirely different thing.
Is it possible to work around this? Yes. Is it kinda ugly? I think so :\
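To make the missing metadata concrete, here is a minimal sketch of how labels could be derived from a Nomad allocation object instead of from Consul. The field names (`ID`, `JobID`, `TaskGroup`, `Job.Meta`, `TaskGroups[].Meta`, `Tasks[].Meta`) follow my understanding of Nomad's allocation JSON, the `nomad_*` label names are hypothetical, and the "task overrides group overrides job" precedence is an assumption worth checking against the Nomad docs. It is not how promtail does anything today.

```python
from typing import Any, Dict, Mapping

# Sample allocation, shaped like what I assume Nomad's allocation API returns.
# All values here are made up for illustration.
SAMPLE_ALLOC: Dict[str, Any] = {
    "ID": "5f2c1a9e",
    "JobID": "backup",
    "TaskGroup": "db",
    "Job": {
        "Meta": {"team": "platform", "tier": "batch"},
        "TaskGroups": [
            {
                "Name": "db",
                "Meta": {"team": "data"},
                "Tasks": [{"Name": "dump", "Meta": {"retention": "7d"}}],
            }
        ],
    },
}


def merged_meta(job: Mapping[str, Any], group_name: str, task_name: str) -> Dict[str, str]:
    """Merge job, group, and task meta; the more specific level wins (assumed)."""
    meta: Dict[str, str] = dict(job.get("Meta") or {})
    for group in job.get("TaskGroups") or []:
        if group.get("Name") != group_name:
            continue
        meta.update(group.get("Meta") or {})
        for task in group.get("Tasks") or []:
            if task.get("Name") == task_name:
                meta.update(task.get("Meta") or {})
    return meta


def labels_for_task(alloc: Mapping[str, Any], task_name: str) -> Dict[str, str]:
    """Build a flat label set for one task of an allocation (hypothetical label names)."""
    labels = {
        "nomad_job": alloc["JobID"],
        "nomad_group": alloc["TaskGroup"],
        "nomad_task": task_name,
        "nomad_alloc_id": alloc["ID"],
    }
    job = alloc.get("Job") or {}
    for key, value in merged_meta(job, alloc["TaskGroup"], task_name).items():
        labels[f"nomad_meta_{key}"] = value
    return labels


if __name__ == "__main__":
    for name, value in sorted(labels_for_task(SAMPLE_ALLOC, "dump").items()):
        print(f"{name}={value}")
```

With something like this, job name, group name, and meta{} all come from one place, without copying them into Consul service meta by hand.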
> Finally, how does Nomad handle short-lived services?
I think the problem is that consulagent doesn't pick up the service in time. For example, I have a small backup batch job that runs for 15 seconds every few hours. Even though the service registers in Consul, it seems like promtail doesn't detect it. I took a quick peek at the code and, from what I understand, promtail is supposed to use blocking queries, so in theory the service should be detected. In practice it seems like something is missing.
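For reference, this is roughly what I mean by a blocking-query watch, as a minimal sketch. The `fetch` callable is a hypothetical stand-in for an HTTP call that passes the last seen index and returns once the index changes (or a timeout hits); here it is replaced by an in-memory fake so the example runs without Consul. This is not promtail's actual code, and unlike this fake, a real server only returns the latest state, so several quick changes can collapse into one response.

```python
from typing import Callable, Iterator, List, Set, Tuple

# fetch(index) -> (new_index, service_names); hypothetical signature.
Fetch = Callable[[int], Tuple[int, Set[str]]]


def watch_services(fetch: Fetch, rounds: int) -> Iterator[Tuple[str, str]]:
    """Yield ("added"|"removed", service) events by diffing successive results."""
    index = 0
    known: Set[str] = set()
    for _ in range(rounds):
        new_index, services = fetch(index)
        # If the index ever goes backwards, start over from 0.
        index = new_index if new_index >= index else 0
        for name in sorted(services - known):
            yield ("added", name)
        for name in sorted(known - services):
            yield ("removed", name)
        known = set(services)


def make_fake_fetch(snapshots: List[Tuple[int, Set[str]]]) -> Fetch:
    """In-memory fake: return the first snapshot newer than the caller's index."""
    def fetch(index: int) -> Tuple[int, Set[str]]:
        for snap_index, services in snapshots:
            if snap_index > index:
                return snap_index, set(services)
        # Nothing newer: behave like a timed-out blocking query.
        last_index, services = snapshots[-1]
        return last_index, set(services)
    return fetch


# The short-lived "backup" service exists only at index 2.
SNAPSHOTS = [(1, {"web"}), (2, {"web", "backup"}), (3, {"web"})]

if __name__ == "__main__":
    for event in watch_services(make_fake_fetch(SNAPSHOTS), rounds=4):
        print(event)
```

In this simplified model the backup service shows up as an add followed by a remove, which is why I would expect a watch to catch a 15-second job, while a loop that only looks at the state every minute or so could miss it entirely.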
Hi! This issue has been automatically marked as stale because it has not had any activity in the past 30 days.
We use a stalebot among other tools to help manage the state of issues in this project. A stalebot can be very useful in closing issues in a number of cases; the most common is closing issues or PRs where the original reporter has not responded.
Stalebots are also emotionless and cruel and can close issues which are still very relevant.
If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.
We regularly sort for closed issues which have a stale label sorted by thumbs up.
We may also:
- Mark issues as revivable if we think it's a valid issue but isn't something we are likely to prioritize in the future (the issue will still remain closed).
- Add a keepalive label to silence the stalebot if the issue is very common/popular/important.
We are doing our best to respond, organize, and prioritize all issues but it can be a challenging task, our sincere apologies if you find yourself at the mercy of the stalebot.
I'm interested in this too. Currently I use docker_sd_config - example usage can be seen in this issue. But it doesn't work well with Nomad bridge and obviously it doesn't work with drivers other than docker, like exec.
Unfortunately at this time we don't have enough Nomad usage to really push this one up the priority list. If there is someone from the community running on Nomad, a PR for this service discovery would be greatly appreciated, either here with the intention of up-streaming to Prometheus, or to Prometheus directly.
Thanks @trevorwhitney! I will check with Prometheus folks if there is an interest.
Any plans to reopen this?
Prometheus does support nomad_sd_configs already, would be great to get the same for promtail.
@m1keil did you make any progress asking the Prometheus devs? I am at the point where everything in my cluster works in pure Nomad except the promtail + loki combination. Don't really want to bring in Consul just because of that.
We would like to scrape Nomad's scheduled workloads with Promtail.
Describe the solution you'd like
Adding a new scrape_configs option - nomad_sd_config, similar to the already existing kubernetes_sd_config. This would enable the promtail agent to scrape Nomad's client REST API for information about the currently running workloads on the host.
Describe alternatives you've considered
static_config (example)
static_config again to pinpoint the details of the log files.
Additional context
Using Consul discovery is our current way of doing this but it does introduce a number of shortcomings: