prometheus / prometheus

The Prometheus monitoring system and time series database.
https://prometheus.io/
Apache License 2.0

Remote Write page in the UI #7971

Open roidelapluie opened 4 years ago

roidelapluie commented 4 years ago

Proposal

Use case. Why is this important?

I am starting with remote_write, and from the Prometheus UI, all I can see is the URL.

I'd like new remote write users like me to easily find out in the UI:

It appears that we could also have the status of the queues: starting, resharding, ...

Let's brainstorm what would be the added value for this, and gather ideas.

Note: This is not a meta-issue; the end goal is to have a remote-write page in the UI.

cc @cstyan @csmarchbanks @bwplotka @juliusv

Note: This must be implemented in the new UI

cstyan commented 4 years ago

Should be simple enough to display some of the common metrics for each queue on a page.

csmarchbanks commented 4 years ago

Detail on the sharding calculations would be interesting; right now that information is only available in debug logs.

cstyan commented 4 years ago

@roidelapluie this feels like it could be a hacktoberfest task?

roidelapluie commented 4 years ago

Yes, indeed :)

Pokom commented 4 years ago

Would this be achievable for a new contributor? I'd like to take a shot at it once there are enough requirements.

cstyan commented 4 years ago

@Pokom I would think so. I haven't done any work on the UI pages, but it looks like the source is here, and you'd either just be pulling existing data from remote write or filing issues/opening additional PRs to make unavailable info available to the UI page.

roidelapluie commented 4 years ago

Like this @cstyan @csmarchbanks ?

queues

csmarchbanks commented 4 years ago

I like the state of each shard, that could have some good information, especially the pending samples per shard. I am curious, what do last scrape and scrape duration mean?

Generally, I think a section for information about the queue is also necessary, the whole queue is either resharding or running, not individual shards. Each shard would be running, stopping, or stopped I think.
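A minimal sketch of that split between queue-level and shard-level state, written in Python for brevity rather than as Prometheus code. All type and field names here (`QueueState`, `ShardState`, `pending_samples`, and so on) are hypothetical; only the state values come from the comment above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class QueueState(Enum):
    # The queue as a whole is either running or resharding.
    RUNNING = "running"
    RESHARDING = "resharding"


class ShardState(Enum):
    # Each individual shard is running, stopping, or stopped.
    RUNNING = "running"
    STOPPING = "stopping"
    STOPPED = "stopped"


@dataclass
class Shard:
    state: ShardState
    pending_samples: int


@dataclass
class Queue:
    name: str
    state: QueueState
    shards: List[Shard] = field(default_factory=list)

    def total_pending(self) -> int:
        # Sum pending samples over all shards for a queue-level headline.
        return sum(s.pending_samples for s in self.shards)

    def shards_in(self, state: ShardState) -> int:
        # Count the shards currently in the given state.
        return sum(1 for s in self.shards if s.state is state)
```

A page built on a model like this could show the queue state and total pending samples up front and keep the per-shard rows in a detail table.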

cstyan commented 4 years ago

Might be nice to have a collapsible section for each overall queue that shows the queue config and shard information in a table?

roidelapluie commented 4 years ago

> I like the state of each shard, that could have some good information, especially the pending samples per shard. I am curious, what do last scrape and scrape duration mean?
>
> Generally, I think a section for information about the queue is also necessary, the whole queue is either resharding or running, not individual shards. Each shard would be running, stopping, or stopped I think.

Yeah should be "last sent timestamp" and "last sent duration"

csmarchbanks commented 4 years ago

> Might be nice to have a collapsible section for each overall queue that shows the queue config and shard information in a table?

I would say collapse the shard information by default, but if the queue information is reasonably compact and high level then that could remain prominently displayed for a quick scan?

Edit: I think I misread, you definitely say "queue config" which would be nice to have collapsed by default as well. :+1:

strideynet commented 4 years ago

Has this task been picked up by someone?

roidelapluie commented 4 years ago

> Has this task been picked up by someone?

No, it is free to pick :)

strideynet commented 4 years ago

I've picked this issue up, hoping I will have time to finish it in the next few days for review.

roidelapluie commented 4 years ago

If you want to share a mock before doing the hard work, that would be welcome. The example I provided will need improvements.

roidelapluie commented 4 years ago

And thanks for picking this :)

strideynet commented 4 years ago

Just to confirm, I'm needing to add a new endpoint for this data to be accessible via the API. There's no issue introducing that with the same PR that implements the web UI?
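As a rough illustration of what such an endpoint might return, here is a sketch in Python (not Prometheus code). The `queues` key and the per-queue field names are assumptions made up for this example; the only part taken from the existing HTTP API is the `status`/`data` response envelope.

```python
import json
from typing import Any, Dict, List


def remote_write_response(queues: List[Dict[str, Any]]) -> str:
    # Wrap the payload in the {"status": ..., "data": ...} envelope
    # that the existing v1 API responses use.
    return json.dumps({"status": "success", "data": {"queues": queues}}, sort_keys=True)


# Hypothetical payload for a single named queue.
example = remote_write_response([
    {
        "name": "primary",
        "url": "https://remote.example/api/v1/write",
        "shards": 4,
        "lastError": "",
    }
])
```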

brian-brazil commented 4 years ago

It'd be unusual not to.

strideynet commented 4 years ago

A first rough mock of the UI, thrown together from a few React components, raises a few questions. It's only a rough mock, but any feedback on it would be appreciated.

image

  1. For the values used as part of the sharding calculations we have two choices: we can show the values as they are currently, or as they were the last time the sharding calculation was run to calculate a desired value. Since this runs every ten seconds, it doesn't seem to make a huge difference, but we obviously need to make a choice here.

  2. Within the QueueManager there's currently no concept of a "state" for the shards and queue. I can't tell whether it is worth introducing this just to serve this UI. The only state I think would add significant value is a "Resharding" status shown somewhere for the queue.
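To make the choice in question 1 concrete, here is a small sketch, in Python for brevity, of keeping both the live values and a snapshot taken when the sharding calculation last ran. Every name in it (`ShardingInputs`, `QueueView`, the field names) is hypothetical and does not correspond to anything in the QueueManager.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShardingInputs:
    # Illustrative inputs only; the real calculation uses its own set of values.
    samples_in_rate: float
    samples_out_rate: float
    pending_samples: int


@dataclass
class QueueView:
    current: ShardingInputs
    last_calc_inputs: Optional[ShardingInputs] = None
    last_calc_time: Optional[float] = None

    def record_calculation(self, now: float) -> None:
        # ShardingInputs is frozen, so keeping a reference is a safe snapshot.
        self.last_calc_inputs = self.current
        self.last_calc_time = now

    def displayed(self, use_snapshot: bool) -> ShardingInputs:
        # Return the snapshot if requested and available, else the live values.
        if use_snapshot and self.last_calc_inputs is not None:
            return self.last_calc_inputs
        return self.current
```

Storing the calculation time next to the snapshot would let the page say how old the displayed values are, whichever option is picked.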

roidelapluie commented 4 years ago

That looks nice. I feel the error message could be in red like in the targets page. Also, I would expect the name of the remote read as title and the URL as part of the config displayed.

  1. I would pick the current value, but I'll leave that to the remote write maintainers.
  2. The state would be either OK or error if there is an error. That would indeed only be cosmetic.

csmarchbanks commented 4 years ago

Nice work, thanks for putting this together!

First, to answer your questions:

  1. I think last calculation run is nice along with the time in which it was run, but don't have strong opinions either way.
  2. I agree with roidelapluie as to state, we just need some indicator as to whether the last request was an error or not.

A couple of additional comments/thoughts:

  1. I would like the name of the remote write and the URL in the title, otherwise if it is an unnamed queue the hash by itself is not very useful. You might already have this if "Primary Remote" is the name, but I want to make sure :)
  2. The shard info could be moved down into the calculation area, I think a more interesting headline would be the delay and status.

roidelapluie commented 4 years ago

Note: my expectation is that 99% of remote write users would have just one remote, right? That might help in the design to think about that use case first (when it comes to titles etc).

strideynet commented 4 years ago

> Nice work, thanks for putting this together!
>
> First, to answer your questions:
>
>   1. I think last calculation run is nice along with the time in which it was run, but don't have strong opinions either way.
>   2. I agree with roidelapluie as to state, we just need some indicator as to whether the last request was an error or not.
>
> A couple of additional comments/thoughts:
>
>   1. I would like the name of the remote write and the URL in the title, otherwise if it is an unnamed queue the hash by itself is not very useful. You might already have this if "Primary Remote" is the name, but I want to make sure :)
>   2. The shard info could be moved down into the calculation area, I think a more interesting headline would be the delay and status.

Yeah, the title is currently Name and then the Endpoint. I observed the hash behaviour myself last week :)

I think I agree with your suggestion regarding the shard info since most of those figures aren't immediately eye-catching, delay + status makes a bit more sense here.

I've started pulling the data through from the QueueManager to the API, but I feel like I may need to chat with you on IRC sometime in the next few days to confirm a few things, since I'm not too sure of the house preferences.

I'm in the middle of preparing to transition to a new role with a new employer, so I may need to take a short break from this task.

strideynet commented 4 years ago

https://github.com/prometheus/prometheus/pull/8218

Perhaps move discussion here.

Krishna-Sivakumar commented 3 years ago

Is this issue still open? I'd like to take it up.

cstyan commented 3 years ago

Yes. Might be a good idea to take a look at the previously opened PR. https://github.com/prometheus/prometheus/pull/8218

shashank-priyadarshi commented 1 year ago

Hi @roidelapluie if this issue is still open, I can take this up!

cstyan commented 1 year ago

AFAIK no one has worked on this, so feel free to. The previous attempt at an implementation plus the above comments should help get you started.

kushalShukla-web commented 7 months ago

Hey guys, I saw there is no PR for this issue and I would like to work on it.

juliusv commented 7 months ago

Hi @kushalShukla-web, sounds good! One note: I'm currently working on a new Prometheus UI (still based on React, but using https://mantine.dev/ as a UI framework instead of Bootstrap). It might make more sense to already add this remote-write page there instead of to the old web UI.

Currently the new web UI lives only in the mantine-ui branch (https://github.com/prometheus/prometheus/tree/mantine-ui), and you can find the code for it at https://github.com/prometheus/prometheus/tree/mantine-ui/web/ui/mantine-ui. In that branch, both the old and the new UIs are built into the Prometheus binary, and you can enable the new UI by providing a feature flag:

./prometheus --enable-feature=new-ui

The new UI is still very early (for example, there is no graphing yet, only a table view), but the idea is to get it ready this year so it can become the new default UI. Then we can eventually retire the old one.

The overall goal with this new UI is to make both the code base and its dependencies more modern, and to make the UI look way less ugly and cluttered.

kushalShukla-web commented 6 months ago

Hi @juliusv, I would love to work on this new Prometheus UI. Where can I find the issues for the new UI?

juliusv commented 5 months ago

@kushalShukla-web Sorry for the late reply! We don't have specific issues for the new UI yet, but I would just consider any non-urgent UI feature request an issue for the new UI as of now :)