prometheus / client_python

Prometheus instrumentation library for Python applications
Apache License 2.0

Support sharing registry across multiple workers (where possible) #30

Closed: discordianfish closed this issue 7 years ago

discordianfish commented 9 years ago

Similar to https://github.com/prometheus/client_ruby/issues/9: when using the Python client library behind something that load-balances across workers, like gunicorn or uwsgi, each scrape hits only one worker, since the workers can't share state with each other.

uwsgi, at least, supports shared memory: http://uwsgi-docs.readthedocs.org/en/latest/SharedArea.html This should be used to share the registry across all workers. Maybe gunicorn supports something similar.
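To make the problem concrete, here is a minimal illustration (not from the original report; a sketch assuming a Unix fork, which is how gunicorn and uwsgi spawn workers): each process ends up with its own copy of the registry, so a scrape only ever sees one worker's counts.

import os
from prometheus_client import Counter, generate_latest, REGISTRY

c = Counter('requests', 'Requests handled')

pid = os.fork()
if pid == 0:
    c.inc(10)  # the child "worker" increments only its own copy
    print(generate_latest(REGISTRY).decode())  # the child sees its increment
    os._exit(0)
os.waitpid(pid, 0)
print(generate_latest(REGISTRY).decode())  # the parent still reports 0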

brian-brazil commented 9 years ago

Thinking on this, we'd want to implement the sharing mechanism ourselves so it could be used across any system. It will be limited to counter-based metrics.

brian-brazil commented 9 years ago

https://github.com/brian-brazil/client_python/tree/multi-process is a first pass at this, which seems to work. It only really handles the counter-based types at the moment, and comes with a lot of caveats.

justyns commented 9 years ago

Have you come up with a solution to this yet, by any chance? We're just starting to look at using Prometheus in some of our Python apps, and most of them are using uwsgi with multiple workers/processes.

brian-brazil commented 9 years ago

You can see how this will work in my previous update; it works but needs cleaning up to handle all the edge cases.

justyns commented 9 years ago

I'm still looking through the code to see how everything works, but I haven't been able to get the multi-process branch you linked working yet.

I tried something like this:

from prometheus_client import Summary, Counter, Gauge, start_http_server, REGISTRY, CollectorRegistry, MultiProcessCollector
registry = MultiProcessCollector()
# Prometheus stats
http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'], registry=registry)

and get this error:

  File "./app/metrics.py", line 4, in <module>
    http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'], registry=registry)
  File "/usr/lib/python2.7/site-packages/prometheus_client/__init__.py", line 257, in init
    registry.register(collector)
AttributeError: 'MultiProcessCollector' object has no attribute 'register'

I did set the prometheus_multiproc_dir env variable.

Am I attempting to do this correctly by creating a new registry from MultiProcessCollector and passing that to the individual metrics?

Or is MultiProcessCollector supposed to be used automatically if the prometheus_multiproc_dir env variable is set?

I could be doing something else wrong, but I tried this originally:

from prometheus_client import Summary, Counter, Gauge, start_http_server

# Prometheus stats
http_requests = Counter('http_requests_total', 'Total Http Requests', ['method', 'endpoint', 'status_code'])
...
start_http_server(9080, '0.0.0.0')

When I tried the above, I saw the *.db shelve files being created, but when I went to port 9080, only the ProcessCollector metrics showed up.

EDIT: In case it wasn't obvious, I am also calling http_requests.labels(request.method, request.path, resp.status_code).inc() in the code. I see all of the metrics if I run the Flask app in a single process without the multiproc env variable, instead of using uwsgi.

brian-brazil commented 9 years ago

MultiProcessCollector() is a collector; you need to add it on its own to a registry and use that registry to generate the metric text output. MultiProcessCollector() is how you get the metrics out.
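In other words, something along these lines (a sketch matching the usage that appears later in this thread; the exact module layout on the branch may differ):

from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client import multiprocess

# Metrics are still defined and incremented as usual; this registry exists
# only to produce the aggregated text output at scrape time.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
data = generate_latest(registry)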

justyns commented 9 years ago

Thanks for the help, I was able to get it working in one app as a test. The only caveat so far is that I had to add a command to clean out $prometheus_multiproc_dir/*.db when the service stops.
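For anyone else doing the same, the cleanup amounts to something like this (a sketch; hook it into whatever your service manager runs on stop or start):

import glob
import os

for f in glob.glob(os.path.join(os.environ['prometheus_multiproc_dir'], '*.db')):
    os.remove(f)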

brian-brazil commented 9 years ago

Yeah, that's the intended way to use it.

brian-brazil commented 8 years ago

I've done some additional work which is up at https://github.com/prometheus/client_python/tree/multiproc

In terms of metric types, everything except gauges is supported.

brian-brazil commented 8 years ago

I've completed gauge support and added docs. Please try out the multiproc branch.
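For gunicorn users, the docs describe a config hook along these lines (a sketch; worker_exit is gunicorn's server hook, and mark_process_dead is taken from the multiprocess module's later documented API, so names may differ slightly on the branch):

# in the gunicorn config file
from prometheus_client import multiprocess

def worker_exit(server, worker):
    # let the collector know this worker's files belong to a dead process,
    # so its gauge values can be handled correctly
    multiprocess.mark_process_dead(worker.pid)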

justyns commented 8 years ago

Thanks for the update @brian-brazil! I'll test it out when I get a chance soon

rvrignaud commented 8 years ago

Is this specific to gunicorn? We are using uwsgi.

brian-brazil commented 8 years ago

It shouldn't be, I've only tested it with gunicorn.

grobie commented 8 years ago

What are the performance implications? Have you done any benchmarks?

brian-brazil commented 8 years ago

I haven't done any benchmarks; there's an fdatasync in there that I need to eliminate, though.

taynaud commented 8 years ago

Hi,

We are trying to use your multi-process implementation with a Flask application behind uwsgi.

As I understand your example for gunicorn, you define two functions. I have discarded the worker_exit one for my PoC, since I use only basic metrics. I have implemented a route in my application to display the metrics:

from flask import Response
from prometheus_client import CollectorRegistry, generate_latest, CONTENT_TYPE_LATEST
from prometheus_client import multiprocess

@main.route('/metrics')  # `main` is the app's existing blueprint
def metrics():
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    data = generate_latest(registry)
    response_headers = [
        ('Content-type', CONTENT_TYPE_LATEST),
        ('Content-Length', str(len(data)))
    ]
    return Response(response=data, status=200, headers=response_headers)

There is an issue in generate_latest: I get the exception "_gdbm.error: [Errno 11] Resource temporarily unavailable". This is because the .db files are open for writing. I put a try/except around that and tried opening the db read-only, and it presents metrics, but only those of the dead processes. As long as a db stays open, I get the exception and thus cannot get the metrics of the current processes.

It cannot work; the next snippet shows the same behavior:

>>> import shelve as s
>>> fw = s.open("./test.db")
>>> fw["test"] = "test"
>>> fw.sync()
>>> fr = s.open("./test.db")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/lib/python3.5/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/lib/python3.5/dbm/__init__.py", line 94, in open
    return mod.open(file, flag, mode)
_gdbm.error: [Errno 11] Resource temporarily unavailable
>>> fw.close()
>>> fr = s.open("./test.db")
>>> fr
<shelve.DbfilenameShelf object at 0x7fe65de82d30>

Do you have any ideas on how to work around that? Am I doing something wrong?

Best,

brian-brazil commented 8 years ago

It looks like you're using Python 3.5 and gdbm, while I tested on Python 2.7 using hashdb. This change will need some work to also function on Python 3.5.

taynaud commented 8 years ago

My last snippet does indeed work with Python 2.7, but not with Python 3.4.3.

The documentation of shelve, in python 2 and 3 states that: "The shelve module does not support concurrent read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used."

Thus I think you should use another DB.

I have also tried replacing the default shelve with https://www.jcea.es/programacion/pybsddb.htm Basically, I replaced import shelve with import bsddb3.dbshelve as shelve after a pip install bsddb3, since it is no longer in the standard library. I also had to encode/decode the keys between str and bytes.
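A sketch of that substitution (assuming bsddb3's dbshelve keeps the shelve-like interface, with bytes keys):

import bsddb3.dbshelve as shelve  # pip install bsddb3

fw = shelve.open('./test.db')
fw[b'test'] = 'test'  # bsddb3 wants bytes keys, hence the encoding step
fw.sync()

fr = shelve.open('./test.db')  # a second open no longer fails with EAGAIN
print(fr[b'test'])
fr.close()
fw.close()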

It seems to work, but I do not know if the documentation statement about concurrency holds.

I may provide a patch if required.

brian-brazil commented 8 years ago

That bit of documentation only applies to shelve. I was thinking we could use bsddb directly, which would also give us a bit more control performance-wise.

gjcarneiro commented 8 years ago

Excuse me, I am just looking at Prometheus for the first time, but my first impression is that this whole "sharing" design you guys are attempting is just over-engineered.

In my opinion, there should be no sharing. Instead, each gunicorn (or whatever) worker should start listening on a random port, and the prometheus server would (somehow) discover those ports and start scraping individual workers. Aggregation would then be done at a higher level.

I wish the client libraries were able to register themselves with a Prometheus server, to tell the server that they are alive and what port they can be scraped on. Hm, maybe that is what a service like Consul is all about, but I haven't looked in detail yet...
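A sketch of what that self-registration could look like (hypothetical setup: a local Consul agent on its default HTTP port; the service name is invented for illustration):

import json
import os
import socket
import urllib.request

from prometheus_client import start_http_server

# Grab an ephemeral port; there is a small race between close() and the
# re-bind inside start_http_server, acceptable for a sketch.
s = socket.socket()
s.bind(('', 0))
port = s.getsockname()[1]
s.close()
start_http_server(port)

# Tell the local Consul agent this worker is alive and scrapable.
payload = {'Name': 'myapp-metrics', 'ID': 'myapp-metrics-%d' % os.getpid(), 'Port': port}
req = urllib.request.Request(
    'http://127.0.0.1:8500/v1/agent/service/register',
    data=json.dumps(payload).encode(), method='PUT')
urllib.request.urlopen(req)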

brian-brazil commented 8 years ago

That's how Prometheus generally works, but it isn't compatible with gunicorn-style designs, which tend to bring processes up and down too often for that to work well.

gjcarneiro commented 8 years ago

If Prometheus isn't compatible with this, then I think I'm better off with statsd.

But I'm not convinced! Gunicorn doesn't bring processes up and down that often. Worker processes tend to remain up for a very long time.

I really think this (using Consul for discovering worker processes) has a fair chance of working, and I'm willing to invest some time to see if I can really make it work well.

brian-brazil commented 8 years ago

Worker processes tend to remain up for a very long time.

From my research, support for dynamically changing the number of processes, such as in response to load, is not uncommon. There's also the max_requests setting.

The approach we use here will be applied to other languages such as Ruby.

gjcarneiro commented 8 years ago

From the Prometheus docs:

CAUTION: Remember that every unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

So maybe you have a point. My approach of having per-process metrics would require having the pid or some such as a label, which seems to go against the recommendation above.

OTOH, processes sharing data via a db file is a terrible solution, as is any WSGI-specific solution (we have not only gunicorn workers, but also celery workers, and even plain multiprocessing-module subprocesses).

Unfortunately I can't think of any good, efficient, and simple solution. Here's an idea I just had, but take it more as a thought process than a finished idea: the workers could elect one of themselves to be scraped; at scrape time it collects its siblings' metrics (say, over local HTTP) and aggregates the siblings' metrics with its own metrics, before returning them to Prom.

I think this solution scales pretty well, since aggregation of data is done only at each scrape interval, not for every sample. The downside is that it is a bit complicated. :(

brian-brazil commented 8 years ago

It aggregates the siblings' metrics with its own metrics, before returning them to Prom.

This is where things fall down. How do you aggregate data from processes that no longer exist?

A file per process avoids this problem, as the files survive the process.
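To illustrate why (a toy sketch, not the library's actual file format): each process only ever writes its own file, and the collector sums across all files, including those left behind by dead workers, so counter increments are never lost.

import glob
import json
import os

STATS_DIR = '/tmp/metrics'  # hypothetical directory, one file per process
os.makedirs(STATS_DIR, exist_ok=True)

def inc(name, amount=1.0):
    # each process writes only to a file named after its own pid
    path = os.path.join(STATS_DIR, 'counters_%d.json' % os.getpid())
    counts = {}
    if os.path.exists(path):
        with open(path) as f:
            counts = json.load(f)
    counts[name] = counts.get(name, 0.0) + amount
    with open(path, 'w') as f:
        json.dump(counts, f)

def collect(name):
    # files of dead processes are still on disk and still counted
    total = 0.0
    for path in glob.glob(os.path.join(STATS_DIR, 'counters_*.json')):
        with open(path) as f:
            total += json.load(f).get(name, 0.0)
    return total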

gjcarneiro commented 8 years ago

Well, if a process dies, the HTTP socket to it will fail with connection refused and we ignore it. We will not aggregate data from it for the last scrape interval. So you lose a few seconds of that worker's metric data, which in this case is not a big deal. Again, I am not convinced worker processes start and die very often; they tend to remain rather stable.

brian-brazil commented 8 years ago

We will not aggregate data from it for the last scrape interval.

That would break all counters; you need that data.

gjcarneiro commented 8 years ago

So what if, in a web app, you don't count the last few HTTP requests that came through one of the workers? It's not a big deal if it happens rarely. It's not like Prometheus will be used for banking transactions or anything... If it is still a problem, an atexit handler that proactively sends the last data to the elected worker could solve it. It's still better than saving a DB file every time you update a metric, which is terrible for performance...

Anyway, I should probably stop spamming this thread, I don't want to be annoying. A poor solution is better than no solution at all. If I find time to prototype something, we can reopen the discussion. It's usually easy to think of solutions that often break down when you try to code them, I know...

oliver006 commented 8 years ago

Any updates or solutions to this problem? I'm running into this issue when trying to add metrics to a Flask/uwsgi Python 2.7 app. Has anyone tried using e.g. the statsd exporter as a work-around?

bendemaree commented 8 years ago

@oliver006 After abandoning #70, moving to the statsd exporter is exactly what we did; it works just fine!

oliver006 commented 8 years ago

Thanks Ben. I went the statsd_exporter route, deploying one locally with each of my python apps and so far that's been working well.
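For reference, the app side of that work-around can be as small as this (a sketch; assumes pip install statsd and a statsd_exporter sidecar listening on its default UDP port 9125):

import statsd

client = statsd.StatsClient('localhost', 9125)
client.incr('http_requests')          # exported as a counter
client.timing('request_latency', 12)  # exported as a summary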

rud commented 7 years ago

https://github.com/korfuri/django-prometheus/blob/master/documentation/exports.md#exporting-metrics-in-a-wsgi-application-with-multiple-processes is a design for letting each worker process listen on a distinct Prometheus port. It does assume the collector will scrape all the allowed reporting ports to catch 'em all.

gjcarneiro commented 7 years ago

Cool, I quite like the django-prometheus solution!