graylog-labs / graylog2-web-interface

[DEPRECATED]
https://www.graylog.org/

Dashboard instant Gateway Timeout #1679

Open gruselglatz opened 9 years ago

gruselglatz commented 9 years ago

Hi,

Dashboards with many queries over a long time period will lead to an instant Gateway-Timeout until the queries are done.

The only log information I get:

2015-11-10 15:25:35,016 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in pool-38-thread-1
Connection refused: /192.168.100.20:12900

2015-11-10 15:25:37,980 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-10 15:25:42,981 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-10 15:25:47,982 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-10 15:25:52,983 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

When the dashboard is opened, the server load is high, but the server still responds normally. The HTTP timeouts have been changed to 30s.

I can't find a Gateway Timeout setting to configure. It looks like there is no timeout at all, because the error pops up instantly.

Update: The widgets only show N/A with an error saying that the API call failed. After 2 seconds the error says that the gateway timed out, and after a page refresh the widgets are filled with data.

Is there an option to set the widget timeout or the web interface to server timeout?

Would it help if the widgets were not all triggered at the same time? I have no problem waiting a few seconds with a loading spinner rather than getting 20 Gateway Timeout messages or even being thrown out to the /disconnect page...

Update 2: If you run into the same issue, lb_recognition_period_seconds = 0 does the job. The web interface will then wait for the widgets to load.

BUT the problem of being thrown to the /disconnect page remains.

Update 3: I run into the same problem every time. Here are some new stack traces and my server config:

2015-11-13 07:32:51,347 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in pool-234-thread-1
Connection refused: /192.168.100.20:12900

2015-11-13 07:32:52,195 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-13 07:32:52,399 - [INFO] - from play in Thread-5
Shutdown application default Akka system.

2015-11-13 07:33:01,485 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in main
Connection refused: /127.0.0.1:12900

2015-11-13 07:33:01,537 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-13 07:33:01,600 - [INFO] - from play in main
Application started (Prod)

2015-11-13 07:33:01,687 - [INFO] - from play in main
Listening for HTTP on /127.0.0.1:9000

2015-11-13 07:33:06,546 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-13 07:33:10,774 - [INFO] - from play in New I/O worker #18
Starting application default Akka system.

2015-11-13 07:33:11,549 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-13 07:33:16,565 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
Connection refused: /127.0.0.1:12900

2015-11-13 07:34:26,696 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in pool-11-thread-1
API call Interrupted
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1039) ~[na:1.8.0_60]
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) ~[na:1.8.0_60]
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) ~[na:1.8.0_60]
        at com.ning.http.client.providers.netty.future.NettyResponseFuture.get(NettyResponseFuture.java:158) ~[com.ning.async-http-client-1.9.31.jar:na]
        at org.graylog2.restclient.lib.ApiClientImpl$ApiRequestBuilder.executeOnAll(ApiClientImpl.java:608) ~[org.graylog2.graylog2-rest-client--1.2.2-1.2.2.jar:na]
        at controllers.api.MetricsController$PollingJob.run(MetricsController.java:117) [graylog-web-interface.graylog-web-interface-1.2.2.jar:1.2.2]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_60]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_60]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

2015-11-13 07:34:39,084 - [ERROR] - from org.graylog2.restclient.lib.ApiClient in servernodes-refresh-0
API call timed out
java.util.concurrent.TimeoutException: null
        at com.ning.http.client.providers.netty.future.NettyResponseFuture.get(NettyResponseFuture.java:159) ~[com.ning.async-http-client-1.9.31.jar:na]
        at org.graylog2.restclient.lib.ApiClientImpl$ApiRequestBuilder.executeOnAll(ApiClientImpl.java:608) ~[org.graylog2.graylog2-rest-client--1.2.2-1.2.2.jar:na]
        at org.graylog2.restclient.lib.ServerNodesRefreshService.resolveConfiguredNodes(ServerNodesRefreshService.java:97) [org.graylog2.graylog2-rest-client--1.2.2-1.2.2.jar:na]
        at org.graylog2.restclient.lib.ServerNodesRefreshService.access$400(ServerNodesRefreshService.java:42) [org.graylog2.graylog2-rest-client--1.2.2-1.2.2.jar:na]
        at org.graylog2.restclient.lib.ServerNodesRefreshService$1.run(ServerNodesRefreshService.java:126) [org.graylog2.graylog2-rest-client--1.2.2-1.2.2.jar:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_60]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_60]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_60]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]

Config:

is_master = true
node_id_file = /var/opt/graylog/graylog-server-node-id
root_timezone = Europe/Vienna
plugin_dir = /opt/graylog/plugin
rotation_strategy = time
elasticsearch_max_size_per_index = 1073741824
elasticsearch_max_time_per_index = 24h
elasticsearch_max_number_of_indices = 45
retention_strategy = delete
elasticsearch_shards = 4
elasticsearch_replicas = 1
elasticsearch_index_prefix = graylog
allow_leading_wildcard_searches = true
allow_highlighting = false
elasticsearch_cluster_name = graylog2
elasticsearch_http_enabled = false
elasticsearch_discovery_zen_ping_unicast_hosts = 127.0.0.1:9300
elasticsearch_cluster_discovery_timeout = 30000
elasticsearch_discovery_initial_state_timeout = 3s
elasticsearch_analyzer = standard
elasticsearch_request_timeout = 2m
output_batch_size = 500
output_flush_interval = 1
output_fault_count_threshold = 5
output_fault_penalty_seconds = 30
processbuffer_processors = 5
outputbuffer_processors = 3
processor_wait_strategy = blocking
ring_size = 65536
inputbuffer_ring_size = 65536
inputbuffer_processors = 2
inputbuffer_wait_strategy = blocking
message_journal_enabled = true
message_journal_dir = /var/opt/graylog/data/journal
message_journal_max_size = 1gb
async_eventbus_processors = 2
dead_letters_enabled = false
lb_recognition_period_seconds = 0
alert_check_interval = 60
mongodb_max_connections = 100
mongodb_threads_allowed_to_block_multiplier = 5
rules_file = /opt/graylog/rules/graylog.drl
http_connect_timeout = 30s
http_read_timeout = 30s
http_write_timeout = 30s
dashboard_widget_default_cache_time = 10s

adrianlyjak commented 8 years ago

+1

kroepke commented 8 years ago

Currently the only real workaround is to increase the default timeout in the web-interface.conf to something higher. I believe the parameter is called timeout.DEFAULT and takes standard time values like 5s. I'd increase it to 15s and see how it goes. In the future we will solve this differently.

gruselglatz commented 8 years ago

I can't find a parameter called timeout.DEFAULT. Which .conf file should it be in? I am on 1.2.2.

kroepke commented 8 years ago

It's not listed in the default config file. Put it into the web interface configuration file like: timeout.DEFAULT = 15s
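
For anyone following along, a minimal sketch of what that addition might look like. It assumes the web interface configuration file uses the same key = value style with `#` comments as the server config above; the file's exact name and location depend on the installation and are not shown here.

```
# Added by hand; this key is not present in the default file.
# Timeout for API calls from the web interface to the server nodes.
# 15s is the starting value suggested in this thread; raise it if
# large dashboards still time out.
timeout.DEFAULT = 15s
```

The value is a plain duration such as 5s or 15s. The right number is a matter of experimentation and depends on how long the dashboard queries take.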

gruselglatz commented 8 years ago

OK, thanks, that solved the problem. I've set it to 50s.

reighnman commented 8 years ago

I was running into a timeout issue with system/indices, which required me to bump the timeout to 15s.

I have roughly 400 indices of 32 GB each, about 2400 shards.

This has made large dashboards way less clunky in terms of random errors as well. Thanks @kroepke