logstash-plugins / logstash-output-elasticsearch

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
Apache License 2.0

"Marking url as dead" infinite loop in logstash while elasticsearch connectivity is OK #729

Open fruboes opened 6 years ago

fruboes commented 6 years ago

I have a small single-node Elasticsearch setup to which I'm writing using Logstash with a JDBC input (paging enabled) and the aggregate filter. Since the import involves processing about 60M rows from PostgreSQL, I'm running 10 independent Logstash instances (with -w 1) working on different slices of the data, all on the same machine. After some time, one of the instances starts emitting the following errors:

[2018-01-31T13:24:43,545][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/][Manticore::SocketException] Connection reset {:url=>http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/, :error_message=>"Elasticsearch Unreachable: [http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/][Manticore::SocketException] Connection reset", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-01-31T13:24:43,569][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/][Manticore::SocketException] Connection reset", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>2}
[2018-01-31T13:24:43,571][DEBUG][logstash.outputs.elasticsearch] Failed actions for last bad bulk request! {:actions=>[["index", {:_id=>nil (line truncated for brevity)
[2018-01-31T13:24:46,186][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/, :path=>"/"}
[2018-01-31T13:24:46,193][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/"}
[2018-01-31T13:25:02,342][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/][Manticore::SocketException] Connection reset {:url=>http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/, :error_message=>"Elasticsearch Unreachable: [http://XXXXXXX:xxxxxx@YYYYYYYYYYYY:9200/][Manticore::SocketException] Connection reset", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
(above lines repeated indefinitely)

From this moment on, _node/stats/pipelines for the faulty instance shows a constant events-out count. The other instances keep writing data successfully, i.e. the doc count in Elasticsearch keeps growing. In the end, some Logstash instances finish properly and some end up stuck in the state described above.

Any idea what could be wrong here? BTW I don't see anything worrying in the elasticsearch logs.

jakelandis commented 6 years ago

@fruboes - Is there any pattern to when one stops? Does it ever recover, given enough time? Could the load from Logstash be overwhelming Elasticsearch's bulk queue?
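The bulk-queue theory can be checked directly with the cat thread pool API: sustained growth in the rejected column while Logstash runs points at queue pressure rather than a network problem. A diagnostic sketch, assuming the default host and port; the bulk pool was renamed write during the 6.x series, so both names are queried:

```shell
# Show active threads, queue depth, and rejection count for the bulk/write pools.
# Any steadily growing "rejected" value means Elasticsearch is pushing back.
curl -s 'http://localhost:9200/_cat/thread_pool/bulk,write?v&h=name,active,queue,rejected'
```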

dkozhuhar commented 6 years ago

The error message is somewhat misleading.

I can reproduce this issue. OS: Debian 8.10, Logstash 6.2.4, Elasticsearch 6.2.4.

3-node setup: 1 node with master and ingest roles, 2 with data and ingest roles. Bulk queue during the run = 0, bulk rejections = 0 on all nodes. Only one Logstash instance running, with pipeline.workers: 1, on the master node. No other load on the Elasticsearch cluster. Using JDBC as input, Elasticsearch as output. It always stops after 1055076 docs imported, even when using different nodes in the elasticsearch output plugin. It never recovers. Nothing worrying in the Elasticsearch logs.

If I enable the paging parameter in the JDBC input plugin, I get the same error much later, after approximately 16 million docs indexed.

dkozhuhar commented 6 years ago

I narrowed the problem down to a single string field larger than 100 MB in one document.

I hope this info will be helpful.
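This lines up with Elasticsearch's default http.max_content_length of 100mb: a bulk request carrying such a field is rejected at the HTTP layer before it ever reaches the bulk queue, which the client may observe as a connection reset rather than a clean error. A minimal pre-flight size check could look like the following sketch (exceeds_http_limit is a hypothetical helper, not part of the plugin):

```python
import json

# Elasticsearch rejects HTTP bodies larger than http.max_content_length,
# which defaults to 100mb -- matching the ~100 MB field size reported above.
ES_MAX_CONTENT_BYTES = 100 * 1024 * 1024

def exceeds_http_limit(doc, limit=ES_MAX_CONTENT_BYTES):
    """Return True if the serialized document alone would exceed the body limit.

    Hypothetical pre-flight check, not part of the elasticsearch output plugin.
    """
    return len(json.dumps(doc).encode("utf-8")) > limit

print(exceeds_http_limit({"message": "ok"}))               # False
print(exceeds_http_limit({"blob": "x" * 200}, limit=100))  # True
```

Documents that fail such a check could be logged and skipped instead of being retried forever.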

nickpeirson commented 6 years ago

This looks like potentially the same issue I'm seeing on #793.

umarizulkifli commented 6 years ago

OS

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         17G         10G        1.1M        2.8G         13G
Swap:            0B          0B          0B

ELK

#logstash --version
logstash 6.4.2
#elasticsearch --version
Version: 6.4.2, Build: default/deb/04711c2/2018-09-26T13:34:09.098244Z, JVM: 1.8.0_181
#kibana --version
6.4.2

pipeline.yml

- pipeline.id: dnslog
  path.config: "/etc/logstash/dnslog/conf.d/*.conf"
  queue.drain: true
  queue.type: persisted
  queue.max_bytes: 200mb
  pipeline.batch.size: 1000
  pipeline.workers: 10
  pipeline.batch.delay: 5

- pipeline.id: elastiflow
  path.config: "/etc/logstash/elastiflow/conf.d/*.conf"
  queue.type: memory
  pipeline.workers: 10
  pipeline.batch.size: 125
  pipeline.batch.delay: 5

logstash.yml

# grep -v -E "#|^$" /etc/logstash/logstash.yml
path.data: /var/lib/logstash
log.level: info
path.logs: /var/log/logstash
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.url: ["http://localhost:9200"]

elasticsearch.yml

# grep -v -E "#|^$" /etc/elasticsearch/elasticsearch.yml
node.name: elasti
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
thread_pool.bulk.queue_size: 10000
thread_pool.write.queue_size: 10000
discovery.type: single-node

logstash-plain.log

[2018-10-11T18:55:02,625][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@127.0.0.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-11T18:55:02,626][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-10-11T18:55:02,628][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@127.0.0.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-11T18:55:02,629][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-10-11T18:55:02,633][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@127.0.0.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-11T18:55:02,633][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-10-11T18:55:05,703][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@127.0.0.1:9200/, :path=>"/"}
[2018-10-11T18:55:05,708][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@127.0.0.1:9200/"}
[2018-10-11T18:55:06,769][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@127.0.0.1:9200/, :path=>"/"}
[2018-10-11T18:55:06,812][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@127.0.0.1:9200/"}

elasticsearch.log

[2018-10-11T18:54:41,321][ERROR][o.e.a.b.TransportBulkAction] [elasti] failed to execute pipeline for a bulk request
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$1@1c51d8d1 on EsThreadPoolExecutor[name = elasti/write, queue capacity = 10000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6d685369[Running, pool size = 8, active threads = 8, queued tasks = 10000, completed tasks = 31658]]
[2018-10-11T18:54:41,321][ERROR][o.e.a.b.TransportBulkAction] [elasti] failed to execute pipeline for a bulk request
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$1@68875ff7 on EsThreadPoolExecutor[name = elasti/write, queue capacity = 10000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6d685369[Running, pool size = 8, active threads = 8, queued tasks = 10000, completed tasks = 31658]]

I'm experiencing the same issue when I enable 2 pipelines. It's OK if only 1 pipeline is running. If anyone knows how to troubleshoot or fine-tune the setup, feel free to comment.
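The EsRejectedExecutionException entries in elasticsearch.log above show the node's write queue completely full (queued tasks = 10000), so bulk requests are being rejected rather than the network being down. Since the queue size was already raised, the other lever is reducing concurrency on the Logstash side. A sketch of such a pipelines.yml entry, with illustrative values that are not tuned for any particular workload:

```yaml
# pipelines.yml -- illustrative values only; tune while watching the
# rejected column of the _cat/thread_pool API on the Elasticsearch node.
- pipeline.id: dnslog
  path.config: "/etc/logstash/dnslog/conf.d/*.conf"
  queue.type: persisted
  queue.max_bytes: 200mb
  pipeline.workers: 4        # fewer concurrent bulk senders per pipeline
  pipeline.batch.size: 250   # smaller bulk requests
  pipeline.batch.delay: 50
```

With two pipelines of 10 workers each, up to 20 bulk requests can be in flight against a single 8-thread node, which makes queue exhaustion much more likely than with one pipeline.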