Open raxod502 opened 2 years ago
Hey, thanks for flagging this! Do you have a more complete (but minimal) example I could turn into an integration test to ensure this works as expected?
Sure, I will put something together and get back to you soon.
Here's a working example: https://paste.sr.ht/~raxod502/e7527b5868cdee04160f457870299ff33ce0c1ed
% poetry run flask run
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000 (Press CTRL+C to quit)
/home/raxod502/.cache/pypoetry/virtualenvs/prometheus-flask-exporter-issue-135-_O3CvxEA-py3.10/lib/python3.10/site-packages/prometheus_flask_exporter/__init__.py:322: Warning: Silently ignoring app.run() because the application is run from the flask command line executable. Consider putting app.run() behind an if __name__ == "__main__" guard to silence this warning.
app.run(host=host, port=port)
Now that I've had a moment to review the source, I can confirm that simply adding this code before invoking metrics.start_http_server
works around the problem, and allows the metrics server to be started without issue:
os.environ.pop("FLASK_RUN_FROM_CLI")
However, when running in development mode (with the live reloader) we encounter an additional problem, because the metrics Flask app tries to start its own live reloader from inside the first live reloader, which does not work since the live reloader establishes signal handlers and therefore can only run from the main thread:
% FLASK_ENV=development poetry run flask run
* Environment: development
* Debug mode: on
* Serving Flask app 'prometheus-flask-exporter-8081' (lazy loading)
* Environment: development
* Debug mode: on
* Running on http://127.0.0.1:8081 (Press CTRL+C to quit)
* Restarting with stat
Exception in thread Thread-1 (run_app):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1009, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 946, in run
self._target(*self._args, **self._kwargs)
File "/home/raxod502/.cache/pypoetry/virtualenvs/prometheus-flask-exporter-issue-135-_O3CvxEA-py3.10/lib/python3.10/site-packages/prometheus_flask_exporter/__init__.py", line 322, in run_app
app.run(host=host, port=port)
File "/home/raxod502/.cache/pypoetry/virtualenvs/prometheus-flask-exporter-issue-135-_O3CvxEA-py3.10/lib/python3.10/site-packages/flask/app.py", line 920, in run
run_simple(t.cast(str, host), port, self, **options)
File "/home/raxod502/.cache/pypoetry/virtualenvs/prometheus-flask-exporter-issue-135-_O3CvxEA-py3.10/lib/python3.10/site-packages/werkzeug/serving.py", line 1083, in run_simple
run_with_reloader(
File "/home/raxod502/.cache/pypoetry/virtualenvs/prometheus-flask-exporter-issue-135-_O3CvxEA-py3.10/lib/python3.10/site-packages/werkzeug/_reloader.py", line 427, in run_with_reloader
signal.signal(signal.SIGTERM, lambda *args: sys.exit(0))
File "/usr/lib/python3.10/signal.py", line 56, in signal
handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread of the main interpreter
* Debugger is active!
* Debugger PIN: 788-973-912
It appears to be possible to work around this issue by unsetting FLASK_ENV
from Python as well, before starting the metrics server. However, this does have the disadvantage of producing relatively confusing log output:
% FLASK_ENV=development poetry run flask run
* Environment: development
* Debug mode: on
* Serving Flask app 'prometheus-flask-exporter-8081' (lazy loading)
* Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:8081 (Press CTRL+C to quit)
* Running on http://127.0.0.1:5000 (Press CTRL+C to quit)
* Restarting with stat
Working code:
import os
import flask
import prometheus_flask_exporter
app = flask.Flask(__name__)
os.environ.pop("FLASK_RUN_FROM_CLI")
os.environ.pop("FLASK_ENV")
metrics = prometheus_flask_exporter.PrometheusMetrics(app)
metrics.start_http_server(8081, host="127.0.0.1")
Thanks a lot for the investigation @raxod502 ! 🙇 Do you have a proposed way forward based on that? I expect if we unset those environment variables then we'd interfere with other parts of Flask that would handle those?
Yeah... and environment variables are shared globally for the whole process, so messing with them is really nasty. Let's file an issue upstream: https://github.com/pallets/flask/issues/4616
If a keyword argument is added to app.run()
as I suggest in that issue report, then we can set that keyword argument from prometheus_flask_exporter, and if we additionally set debug=False
in prometheus_flask_exporter, that should fully resolve this problem.
This metrics server does not appear to be part of the application under development. It is its own, separate application. Therefore, it should be run using a production server, even locally. Production servers include Gunicorn, Waitress, uWSGI, mod_wsgi, Eventlet, and Gevent.
OK, sounds like the way to resolve this issue would be to modify prometheus_flask_exporter to use Gunicorn or an equivalent framework by default, since the official upstream recommendation is that prometheus_flask_exporter is not an appropriate use case for the Flask development server. Does that seem like an appropriate course of action to you, @rycus86?
I'll have a look at this more deeply when I get a chance. I wouldn't necessarily want Gunicorn or other dependencies to be mandatory, so I'll need to dig deeper into what am I doing wrong in the code that surfaces this problem (sounds like maybe I use the wrong app somewhere perhaps).
OK, so I had a chance now to look at the code.
So flask run
passes an environment variable to the app so that we don't actually start an app from within, because the cli machinery is going to set that development server up for us.
If this is intended for development mode only, I think it should be OK to just have the metrics endpoint exposed on the main Flask app, which happens by default.
import flask
import prometheus_flask_exporter
app = flask.Flask(__name__)
metrics = prometheus_flask_exporter.PrometheusMetrics(app)
@app.route('/test')
def index():
return 'Hello world'
If you flask run
this, then you'll get the app endpoints and the metrics endpoint exposed on the same app that belongs to the (single) development server, e.g.:
curl localhost:5000/test
returns Hello world
, andcurl localhost:5000/metrics
returns the Prometheus metricsDoes this work for you @raxod502 ?
I think the best course of action from here is document this fact and/or make this library raise an exception when we're in flask run
mode and someone calls the start_http_server
that we (now) know won't work.
That works for development mode, but what about in production? It sounds like the Flask folks are saying it is not appropriate to use the Flask development server in production, yet it seems like prometheus_flask_exporter does not have an option to use anything other than the Flask development server.
What if we add an API to prometheus_flask_exporter that simply returns the Flask app for exposing metrics, so that the user may choose how they wish to serve that application (e.g. by adding a dependency on Gunicorn or a similar WGSI package)?
We do have examples on how to integrate with "production ready" servers, for example:
One thing to note: for the non-internal multiprocessing enabled Prometheus exporter wrappers, we use the prometheus_client.start_http_server
function from the Prometheus client library, see https://github.com/rycus86/prometheus_flask_exporter/blob/master/prometheus_flask_exporter/multiprocess.py#L78
What if we add an API to prometheus_flask_exporter that simply returns the Flask app for exposing metrics, so that the user may choose how they wish to serve that application (e.g. by adding a dependency on Gunicorn or a similar WGSI package)?
We could perhaps expose the contents of https://github.com/rycus86/prometheus_flask_exporter/blob/master/prometheus_flask_exporter/__init__.py#L272 as a function that you can then call from an endpoint you supply in whatever way is appropriate - if that's desirable, we can look into that.
for the non-internal multiprocessing enabled Prometheus exporter wrappers, we use the
prometheus_client.start_http_server
function from the Prometheus client library
Ah, got it. That is indeed what I was looking for. I didn't read this part of the documentation at first because it was labeled for multiprocess-aware handling, which wasn't something I needed in my application. Should we migrate to using this function in all cases where the user is placing the metrics endpoint on a different port, so that we eliminate all use of the Flask development server except insofar as the user is running it themselves? I think that would resolve the issues that upstream has raised.
I just looked quickly at prometheus_client.start_http_server
, and you should not use that either. It should be reported as an issue to prometheus_client, unless they've already indicated that that is for development only. It uses wsgiref.simple_server
, which is based on http.server
just like Werkzeug's development server is, so it is not secure and should not be used in production.
They already have different (more robust?) ways to expose metrics over HTTP, see https://github.com/prometheus/client_python#http
so it is not secure and should not be used in production
Maybe this is worth calling out in this project's README, and suggest some alternatives - though on that note I think if you run the metrics endpoint on an insecure server that is not available for external users, it may be something that is an acceptable tradeoff for some. I liked @raxod502 's suggestion to package up the metrics exposure code as a function so that people can bring their own (more secure) servers to serve it with, and perhaps here we can explain in the README that it's more recommended than the other alternative.
Yeah - although the development server is not "production ready", it may be less bad to use it for an internal metrics endpoint that will be seeing <0.1 request per second, and is not accessible to malicious outside traffic. But, if it's not production ready, we should still do our best to follow best practices and use something that is recommended instead.
OK, so I added PrometheusMetrics.generate_metrics()
in https://github.com/rycus86/prometheus_flask_exporter/commit/3d29e0431975b010df04c789571500662ebb0fd8 but as I was about to write an integration test, I realized I'm not really sure how that's supposed to work.
You'd only want to do this yourself if you want to expose the metrics endpoint on a different server or different HTTP port, and not sure what has support for this in the same Python process (couldn't find Gunicorn/uwsgi supporting this, maybe haven't looked too closely).
If it's not the same Python process, would that even work? I suppose with multiprocessing writing data into PROMETHEUS_MULTIPROC_DIR
directory, then the second webserver could maybe read stuff from there, but I'm not a 100% sure that works as expected.
If we do think that's how it's supposed to work, I can look into adding an integration test somehow to replicate that, but I'm not absolutely convinced.
Yeah, I would assume that you would want to expose the second webserver from within the same process. Looking into the options in https://github.com/rycus86/prometheus_flask_exporter/issues/135#issuecomment-1143929020, it seems like Waitress, Eventlet, and Gevent are acceptable options.
Since the Flask project doesn't want to support usage of the Flask development server even for internal facing metrics, I don't think we have any option other than to use a pure-Python WSGI server implementation to produce metrics from in-process. Then of course in production we must also use the existing techniques in prometheus_flask_exporter to ensure metrics are shared between all worker processes.
Seems to me like it's either that or abandon the idea of serving metrics on a separate port.
You should alternatively be able to set app.run
's use_reloader
option to False
like so, app.run(debug=True, use_reloader=False)
.
Well, sure, but only if we feel comfortable ignoring guidance from upstream that the Flask dev server (which is what gets run when you use app.run
) should not be used in this situation - right?
My suggestion was an alternative to os.environ.pop("FLASK_ENV")
. You should still use a Production server in production. But the Flask dev server is fine in development.
In my case, I'm doing the following
os.environ.pop("FLASK_RUN_FROM_CLI")
app.run(debug=False, use_reloader=False)
But the Flask dev server is fine in development
I agree with you, however, the comment from @davidism says this:
This metrics server does not appear to be part of the application under development. It is its own, separate application. Therefore, it should be run using a production server, even locally
Which I'm interpreting to mean that using the Flask development server for something that is not the main Flask app, even in development, is not a supported use case by upstream. I have to confess I am not sure why, but if that's what we have to work with, it's what we have to work with.
Gotcha. In that case, how hard would it be to make the metrics server "part of the application under development"? Maybe something like, Run Multiple Flask Applications from the Same Server. Granted, I haven't looked into this codebase to know how applicable that will be.
My own reason for encountering the error discussed in this issue is that I needed to spawn a temporary server on a random open port to check a user's authentication. This temp server can run independently of the Flask app itself. I.e., to run the service normally, flask run
. And to kick off an authentication job (whether or not the normal service is running) flask auth
.
Here's a trimmed-down version of my code for reference:
from flask import Flask
from signal import pthread_kill, SIGTSTP
from threading import Thread
import requests
app = Flask(__name__)
def find_free_port():
# implementation removed for brevity
return 5643
@app.cli.command("auth")
def auth(name):
port = find_free_port()
kwargs = {"port": port, "debug": False, "use_reloader": False}
thread = Thread(target=app.run, kwargs=kwargs)
thread.start()
url = f"http://localhost:{port}/auth"
r = requests.get(url)
refresh_token = r.json()["token"].get("refresh_token"):
print(f"refresh token is {refresh_token}")
pthread_kill(thread.ident, SIGTSTP)
how hard would it be to make the metrics server "part of the application under development"?
I am not sure this is possible.
Maybe something like, Run Multiple Flask Applications from the Same Server
That is for doing routing within a single WSGI application to proxy to multiple sub-applications by URL path. What we want is to run multiple top-level WSGI applications separately.
As far as I can tell, what the Flask project is saying is you can't use ~Flask~ app.run
for this.
how hard would it be to make the metrics server "part of the application under development"?
Does this help? https://github.com/yrro/hitron-exporter/blob/4add40236f52042c381343fb3a696a6dead6cab0/hitron_exporter/__init__.py#L41
(I'm investigating an issue whereby /metrics
gives me a 404 error when I run my app via flask run
and the above solution appears to work for me...)
[edit] to clarify, when I run flask routes
I do see a route for /metrics
even though actually accessing it gives me a 404 error.
I had a look at your code and checked it with a minimal example:
from flask import Flask
from prometheus_client import make_wsgi_app
from prometheus_flask_exporter import PrometheusMetrics
app = Flask(__name__)
metrics = PrometheusMetrics(app)
metrics_wsgi_app = make_wsgi_app()
@app.get('/info')
def info():
import os
return {'response': 'ok', 'env': dict(os.environ)}
@app.get('/metrics')
@metrics.do_not_track()
def metrics():
return metrics_wsgi_app
if __name__ == '__main__':
app.run('0.0.0.0', 5000, debug=True, use_reloader=True)
This works with flask run
with any combinations of --debug
and/or --reload
being present.
When I checked how to make it work by default, I noticed https://github.com/rycus86/prometheus_flask_exporter/issues/4 that has tried working the opposite way. 😅
I also checked that just having DEBUG_METRICS=true
enabled has the same effect, we'll let Flask add the built-in metrics endpoint on the app under development (like your solution if I understand correctly).
edit: I've pushed the example code here: https://github.com/rycus86/prometheus_flask_exporter/tree/master/examples/flask-run-135
An awkward situation. Code:
Result:
And the metrics server does not listen on port 8081. It seems that Flask notices that I have invoked the main application using
flask run
, and has elected to disable the ability to run new Flask applications, which obviously breaksstart_http_server
from prometheus_flask_exporter.Not sure what the best way to handle this is, but wanted to flag.