prometheus-pve / prometheus-pve-exporter

Exposes information gathered from Proxmox VE cluster for use by the Prometheus monitoring system
Apache License 2.0

self.handle_request(req, conn): OSError: [Errno 9] Bad file descriptor #293

Open grinapo opened 2 weeks ago

grinapo commented 2 weeks ago

After upgrading from a fairly old version, the exporter started to die repeatedly with:

2024-10-25_15:03:38.94501 [2024-10-25 17:03:38 +0200] [1773730] [ERROR] Socket error processing request.
2024-10-25_15:03:38.94506 Traceback (most recent call last):
2024-10-25_15:03:38.94507   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 285, in handle
2024-10-25_15:03:38.94508     keepalive = self.handle_request(req, conn)
2024-10-25_15:03:38.94508                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-10-25_15:03:38.94509   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 357, in handle_request
2024-10-25_15:03:38.94510     util.reraise(*sys.exc_info())
2024-10-25_15:03:38.94511   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/util.py", line 641, in reraise
2024-10-25_15:03:38.94512     raise value
2024-10-25_15:03:38.94513   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/workers/gthread.py", line 343, in handle_request
2024-10-25_15:03:38.94513     resp.write(item)
2024-10-25_15:03:38.94514   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/http/wsgi.py", line 326, in write
2024-10-25_15:03:38.94515     self.send_headers()
2024-10-25_15:03:38.94516   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
2024-10-25_15:03:38.94521     util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
2024-10-25_15:03:38.94521   File "/home/prox/.local/pipx/venvs/prometheus-pve-exporter/lib/python3.11/site-packages/gunicorn/util.py", line 299, in write
2024-10-25_15:03:38.94522     sock.sendall(data)
2024-10-25_15:03:38.94523 OSError: [Errno 9] Bad file descriptor
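
For readers unfamiliar with the error itself: Errno 9 (EBADF) is what Python raises when a socket is used after its file descriptor has been closed. The following is a minimal sketch that reproduces the error class with the standard library only. It does not claim to show why gunicorn ends up in this state; the function name `send_after_close` is made up for illustration.

```python
import errno
import socket


def send_after_close() -> int:
    """Close one end of a socket pair, then write to it; return the errno raised."""
    left, right = socket.socketpair()
    try:
        left.close()  # the underlying file descriptor is released here
        try:
            # Same call as the last frame of the traceback: sock.sendall(data)
            left.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
        except OSError as exc:
            return exc.errno
        return 0  # not reached: writing to a closed socket always fails
    finally:
        right.close()


if __name__ == "__main__":
    code = send_after_close()
    print(code, errno.errorcode.get(code))
```

In other words, the traceback suggests that by the time the worker wrote the response headers, the connection's socket had already been closed somewhere else.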

This seems to be related to gunicorn and its worker configuration, but I am not familiar with its internals, nor am I aware of how it could be configured from outside. So I patched cli.py until it worked (based on various discussions like https://github.com/benoitc/gunicorn/issues/1877):

    gunicorn_options = {
        'bind': f'{params.web_listen_address}',
        'threads': 2,
        'keyfile': params.server_keyfile,
        'certfile': params.server_certfile,
        'loop': 'asyncio',
        # available worker classes: sync, gthread, eventlet, gevent, tornado
        'worker_class': 'gevent',
    }

(Obviously this requires gevent as a dependency.) This works for me, but I am not sure it is a proper solution to the problem.

The exporter is started under runit here (aka daemontools next generation); otherwise it is a plain pipx install with no additional modifications.

znerol commented 1 week ago

Thanks for the report. I haven't found any of those in my logs yet.

That said, I'm not too happy about how the application hard-codes gunicorn internals. On the other hand, I hesitate to expose all of that as configuration, since it shouldn't be necessary to tweak this stuff in order to run this service reliably.

Maybe there is a way to refactor the code such that pve exporter can be run as a wsgi app. That would make it easier to swap out the wsgi server when needed (or adapt its configuration).
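
A rough sketch of what "run as a wsgi app" could look like, using only the standard library. This is not the project's actual code: `collect_metrics`, the placeholder metric, the `/metrics` path handling, and the listen address and port are all assumptions made for illustration.

```python
from wsgiref.simple_server import make_server


def collect_metrics() -> bytes:
    """Hypothetical stand-in for the exporter's real collection logic."""
    return (
        b"# HELP example_up Placeholder metric\n"
        b"# TYPE example_up gauge\n"
        b"example_up 1\n"
    )


def application(environ, start_response):
    """Plain WSGI callable: any WSGI server can serve it."""
    if environ.get("PATH_INFO") == "/metrics":
        status, body = "200 OK", collect_metrics()
    else:
        status, body = "404 Not Found", b"Not Found\n"
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Reference server for local testing only; address and port are assumptions.
    with make_server("127.0.0.1", 9221, application) as httpd:
        httpd.serve_forever()
```

With a module-level callable like this, the server and its worker class would be chosen at deployment time rather than in cli.py, for example `gunicorn --worker-class gevent some_module:application` (the module name is hypothetical).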

grinapo commented 1 week ago

It is running under AppArmor and cgroups and various other evil magic, so it is quite possible that the error will not manifest for you. Googling revealed that this is a known but usually rare problem, related to how gunicorn behaves.

znerol commented 1 week ago

Ok, good to know.