jenkinsci / prometheus-plugin

Jenkins Prometheus Plugin
https://plugins.jenkins.io/prometheus/
Apache License 2.0

"/prometheus" endpoint documented is not real endpoint. #594

Closed macetw closed 9 months ago

macetw commented 9 months ago

Sure, it works in a browser. Chrome takes it. But notice when you do put it in a browser, Chrome updates the URL. It adds a slash.

(note, I'm obscuring my real hostname, and using myexample.com)

Prometheus, Vector: these tools are not as friendly. They interpret a 302 redirect as an error and fail to scrape the metrics. Now an SRE must notice this error and fix the URL by adding the slash. That's not helpful.

Even if the tools were allowed to follow 302 redirects, this would still cause unnecessary traffic. The plugin should either serve the /prometheus endpoint directly or, if we can't do that (because we're operating as a plugin), we should at least update all the documentation.
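Until the endpoint or the docs change, one workaround on the scraper side is to normalize the target URL before handing it to the tool. Below is a minimal Python sketch of that idea; the helper name `ensure_trailing_slash` is hypothetical and not part of the plugin, Prometheus, or Vector.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_trailing_slash(url: str) -> str:
    """Return url with a trailing slash on its path; query and fragment are kept."""
    parts = urlsplit(url)
    path = parts.path
    if not path.endswith("/"):
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hostname taken from the obscured example in this issue.
print(ensure_trailing_slash("https://jenkins.myexample.com/prometheus"))
# prints: https://jenkins.myexample.com/prometheus/
```

The function is idempotent, so it is safe to apply to URLs that already end in a slash.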

Jenkins and plugins versions report

Environment

```text
Jenkins: 2.414.3
OS: Linux - 5.4.0-164-generic
Java: 11.0.20.1 - Eclipse Adoptium (OpenJDK 64-Bit Server VM)
---
...
prometheus:2.3.3
...
```

What Operating System are you using (both controller, and any agents involved in the problem)?

Running in docker container: jenkins/jenkins:2.414.3-lts-jdk11

Reproduction steps

(I wish I could test this on the very-latest for the plugin)

1) Using Vector, scrape the endpoint "https://jenkins.myexample.com/prometheus"
2) Observe that Vector logs a "302" status and gives up.

1) Using curl, do the same: curl -v https://jenkins.myexample.com/prometheus
2) Observe the logs:

< HTTP/1.1 302 Found
< Server: nginx/1.18.0 (Ubuntu)
< Date: Tue, 28 Nov 2023 17:31:09 GMT
< Content-Length: 0
< Connection: keep-alive
< X-Content-Type-Options: nosniff
< Location: https://jenkins.myexample.com/prometheus/
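The curl check in the steps above can also be scripted. The following is a rough Python sketch using only the standard library; `http.client` sends a single request and does not follow redirects, which is roughly how the scraper in this report behaves. The function names `probe` and `needs_trailing_slash` are made up for illustration.

```python
import http.client
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 5.0):
    """Send one GET and return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    conn_cls = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def needs_trailing_slash(url: str) -> bool:
    """True when the server redirects to the same path with '/' appended."""
    status, location = probe(url)
    if status not in (301, 302, 307, 308) or location is None:
        return False
    return urlsplit(location).path == urlsplit(url).path + "/"


# Example call (requires a reachable Jenkins instance):
# needs_trailing_slash("http://localhost:8080/prometheus")
```

Comparing only the path component lets the check work whether the server sends an absolute or a relative Location header.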

Expected Results

I want Vector to just pick it up, but I need to change the endpoint to end with the slash. This will prevent the redirect step.

I don't want to use a slash. I feel like the HTTP endpoint should just respond to both "/prometheus/" and "/prometheus".

Actual Results

(from curl)

< HTTP/1.1 302 Found
< Server: nginx/1.18.0 (Ubuntu)
< Date: Tue, 28 Nov 2023 17:31:09 GMT
< Content-Length: 0
< Connection: keep-alive
< X-Content-Type-Options: nosniff
< Location: https://jenkins.myexample.com/prometheus/

(from Vector)

2023-11-28T17:30:38.798430Z ERROR source{component_kind="source" component_id=prometheus_metrics component_type=prometheus_scrape component_name=prometheus_metrics}: vector::internal_events::http_client_source: HTTP error response. url=https://jenkins.myexample.com/prometheus stage="receiving" error_type="request_failed" error_code=http_response_302 internal_log_rate_limit=true

Anything else?

It appears that Jenkins handles all of the web traffic itself, and the plugin only supplies the text "prometheus" via getUrlName(), so our part of the code can't tell Jenkins to treat the two endpoints (one ending in a slash, the other not) as the same. If that's truly the case and there's nothing we can do, we should at LEAST update the documentation. All the examples and references to /prometheus should be changed to say /prometheus/.

Are you interested in contributing a fix?

Sure. I can update the docs. But before I do that, we should at least explore and discuss whether there's a way to fix this at the plugin level.

macetw commented 9 months ago

Somewhat related to this PR from 6 years ago. https://github.com/jenkinsci/prometheus-plugin/pull/13

macetw commented 9 months ago

Testing now in a way that is more direct on Jenkins (without the nginx layer), I can reproduce the same problem:

ubuntu@jenkins-master-01:~$ curl localhost:8080/prometheus -v
*   Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /prometheus HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< Date: Tue, 28 Nov 2023 18:20:35 GMT
< X-Content-Type-Options: nosniff
< Location: http://localhost:8080/prometheus/
< Content-Length: 0
< Server: Jetty(10.0.17)
< 
* Connection #0 to host localhost left intact

Waschndolos commented 9 months ago

I've just reproduced it on our company's Jenkins instance. I've never been in that "area of the code". Give me a few days (I promise it'll be less than #500 :) - sorry, I haven't found a good solution yet)

macetw commented 9 months ago

@Waschndolos , even if you can't fix it, I'm grateful for your attention to it.

I felt, though, that this is a valuable bug that lots of SREs hit. ... and like I said, maybe it's just a matter of updating documentation.

Waschndolos commented 9 months ago

@macetw I've debugged the code and it seems the redirect comes from Jenkins itself. I don't think I can do anything about it, so I updated the documentation as you suggested. Could you check the PR at https://github.com/jenkinsci/prometheus-plugin/pull/595/files and tell me if the documentation is good enough? (I'm not a native English speaker, so maybe the sentence is too complicated?)