graphite-project / graphite-web

A highly scalable real-time graphing system
http://graphite.readthedocs.org/
Apache License 2.0

[BUG] #2742

Closed jhagg closed 2 years ago

jhagg commented 2 years ago

Describe the bug: Graphite Browser does not respond, and requests time out.

To Reproduce: I'm using icinga2 (2.13.2) with graphite-carbon as storage.

When I connect directly to HOST:8000 (the port for the apache WSGI-process), I get the graphite graphics, menu and stuff. Clicking on 'Metrics' takes a long time, sometimes minutes, but it works. Drilling down always hangs somewhere though, but not always in the same place. It feels like a timeout.

Grafana has the same problem when it tries to fetch data from graphite-carbon.

Updating data works fine, it's just fetching data that hangs.

(This is an installation that has been running fine a long time.)

Expected behavior: Quick response, like it always did. :-)

Screenshots: n/a

Environment (please complete the following information):

Additional context: I'm running Debian sid. Is the problem the version of django or python3?

deniszh commented 2 years ago

The default find and fetch timeout is 10 seconds. You can increase it using the FIND_TIMEOUT and FETCH_TIMEOUT variables in local_settings.py. Uncomment them, increase them to e.g. 30 seconds, and restart graphite-web. Please note that this will not improve speed, but you will at least get an answer. It's hard to say from the symptoms above why it's timing out. If it's because of load or the number of metrics, or because your instance is overloaded, then you need to scale it up, but I would recommend starting by replacing carbon with go-carbon.
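
For reference, a minimal sketch of what the uncommented lines in local_settings.py might look like. FIND_TIMEOUT and FETCH_TIMEOUT are the setting names mentioned above; the value of 30 seconds is only the example figure from this comment, not a recommendation, and the location of local_settings.py depends on how graphite-web was installed.

```python
# local_settings.py (fragment)
# Assumption: values are in seconds; 30 is just the example figure
# suggested in this thread, adjust it to your environment.

# Timeout for metric find requests (browsing the metric tree).
FIND_TIMEOUT = 30.0

# Timeout for fetching datapoints when rendering.
FETCH_TIMEOUT = 30.0
```

graphite-web has to be restarted afterwards for the change to take effect.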

deniszh commented 2 years ago

Ah, you are already using go-carbon. Well, then it depends, unfortunately. Anyway, try increasing the timeout and check the Graphite and go-carbon logs, overall system health, etc.

jhagg commented 2 years ago

Hmm, I would need a timeout > 2-3 minutes.

The strange thing is that this started quite recently. It has been running for a long time without a problem.

So I was wondering if perhaps the version of python or django could be the problem. (Downgrading all python packages would be painful, so I was hoping it was something simpler... :-)

By the way, I don't use go-carbon. Maybe that could be a workaround.

jhagg commented 2 years ago

Hmm, it seems as if go-carbon is just doing the data storing. My problem lies in the web frontend (web sockets and stuff).

Or have I misunderstood?

deniszh commented 2 years ago

> So I was wondering if perhaps the version of python or django could be the problem.

Sorry, I see no way the version of python or django could cause that. Also, why did the version of python or django suddenly change?

> By the way, I don't use go-carbon. Maybe that could be a workaround.

Then you can try replacing carbon with it; it's usually faster. If your browser hangs while getting the list of metrics, I can only imagine that you have a really big number of metrics. If that's not the case, then it's probably something else.

> My problem lies in the web frontend (web sockets and stuff).

Why do you think so? Maybe I misunderstood your issue. Maybe you can record a screencast, e.g. with https://www.flexclip.com/tools/screen-recorder/, and share the video?

jhagg commented 2 years ago

I've traced down my problem to the graphite api below. This is where Grafana gets its data and also (I assume) graphite-web. I noticed this problem when Grafana suddenly timed out, and I thought it was a Grafana-issue, but then the Graphite browser had the same problem.

I don't have particularly many metrics, about 15000 whisper files, so it shouldn't be a problem. And I'm sorry, I didn't mean web sockets, I meant the WSGI-api below.
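
To double-check a figure like this, the whisper files can simply be counted on disk. A minimal sketch, assuming the metrics are stored as `.wsp` files under a single directory tree; the function name `count_whisper_files` is hypothetical, and the storage path depends on the installation (for example `/var/lib/graphite/whisper` is only a guess for a Debian package install).

```python
import os


def count_whisper_files(root):
    """Return (file_count, total_bytes) for all *.wsp files under root."""
    count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wsp"):
                count += 1
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
    return count, total_bytes


# Example call (the path is an assumption, adjust it to your setup):
#   count, size = count_whisper_files("/var/lib/graphite/whisper")
#   print(count, "whisper files,", size, "bytes")
```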

By the way, can go-carbon replace this api also? I thought it was only for saving data, but if I can read data through go-carbon, it probably would solve my problem.

Since I use Debian sid (for historical reasons), python and django follow Debian development as a continuous upgrade. Yeah, I know, maybe not the best solution for this application, it just happened. |-)
(Unfortunately, I can't record a video since it might show company data.)



<VirtualHost *:8000>
        ServerName xxx.example.com

        WSGIDaemonProcess _graphite processes=5 threads=5 display-name='%{GROUP}' inactivity-timeout=120 user=_graphite group=_graphite
        WSGIProcessGroup _graphite
        WSGIImportScript /usr/share/graphite-web/graphite.wsgi process-group=_graphite application-group=%{GLOBAL}
        WSGIScriptAlias / /usr/share/graphite-web/graphite.wsgi

        Alias /static/ /usr/share/graphite-web/static/
        <Location "/static/">
                SetHandler None
        </Location>
        ErrorLog /var/log/apache2/graphite-web_error.log
        CustomLog /var/log/apache2/graphite-web_access.log combined

</VirtualHost>
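
One way to narrow down where the time goes is to time the HTTP API directly, bypassing both the Graphite browser and Grafana. The following is a minimal sketch, not a definitive diagnostic: it assumes graphite-web answers on the port from the VirtualHost above and uses its `/metrics/find` and `/render` endpoints. The helper names, the default of 180 seconds (roughly the 2-3 minutes mentioned earlier in the thread), and the example metric patterns are all hypothetical.

```python
import time
import urllib.request
from urllib.parse import urlencode


def build_find_url(base_url, query):
    """URL that lists the metric nodes matching a pattern such as 'icinga2.*'."""
    return base_url.rstrip("/") + "/metrics/find?" + urlencode({"query": query})


def build_render_url(base_url, target, since="-1h"):
    """URL that fetches datapoints for one target as JSON."""
    params = {"target": target, "from": since, "format": "json"}
    return base_url.rstrip("/") + "/render?" + urlencode(params)


def time_request(url, timeout=180.0, opener=urllib.request.urlopen):
    """Fetch url and return (elapsed_seconds, response_size_in_bytes).

    `opener` is injectable so the function can be exercised without a server.
    """
    start = time.monotonic()
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.monotonic() - start, len(body)


# Example calls (host, port and patterns are assumptions):
#   print(time_request(build_find_url("http://localhost:8000", "*")))
#   print(time_request(build_render_url("http://localhost:8000", "icinga2.*")))
```

If the find request is slow while a render request for a single known metric is fast, the delay is more likely in walking the metric tree than in reading the data itself.
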
deniszh commented 2 years ago

@jhagg: I understand, but unfortunately Graphite is a complex multi-component system, and it's hard to say what can cause such issues. I would recommend checking the graphite logs first (they are in /opt/graphite/storage/log/webapp by default); the default log level is INFO, and it should give you some hints about what's going on. 15000 metrics doesn't look like a lot, though. Go-graphite also has a component to replace graphite-web, named carbonapi. You can look at e.g. https://github.com/go-graphite/docker-go-graphite to check how to set it up together with go-carbon.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.