fruboes opened this issue 6 years ago (status: Open)
@fruboes - Is there any pattern to when one stops? Does it ever recover, given enough time? Could the load from Logstash be overwhelming Elasticsearch's bulk queue?
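A quick way to check that (a sketch assuming the default localhost:9200 endpoint; adjust host and credentials to your cluster) is the thread-pool cat API, which reports per-node queue depth and rejection counts:
# curl -s 'http://localhost:9200/_cat/thread_pool/write?v&h=node_name,name,active,queue,rejected,completed'
On 6.x releases before 6.3 the pool is still named bulk rather than write, so query _cat/thread_pool/bulk there instead.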
The error is somewhat misleading.
I can reproduce this issue. OS: Debian 8.10, Logstash 6.2.4, Elasticsearch 6.2.4.
3-node setup: 1 node with master and ingest roles, 2 nodes with data and ingest roles. Bulk queue during the run = 0, bulk rejected = 0 on all nodes. Only 1 instance of Logstash running, with pipeline.workers: 1, on the master node. No other load on the Elasticsearch cluster. Using the JDBC input and the Elasticsearch output. The import always stops after 1055076 docs, even when pointing the Elasticsearch output plugin at different nodes. It never recovers. Nothing worrying in the Elasticsearch logs.
If I enable the JDBC input's paging parameter, I get the same error much later, after approximately 16 million docs indexed.
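For clarity, these are the paging options I mean on the JDBC input; the connection details below are placeholders, not my real config:
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/mydb"  # placeholder
    jdbc_user => "user"                                                # placeholder
    jdbc_driver_class => "org.postgresql.Driver"
    statement => "SELECT * FROM big_table"                             # placeholder query
    jdbc_paging_enabled => true
    jdbc_page_size => 50000
  }
}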
I narrowed the problem down to a single string field, larger than 100 MB, in one document.
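For context: Elasticsearch caps HTTP request bodies via http.max_content_length, which defaults to 100mb, so the ~100 MB threshold may be that limit rather than a real connectivity problem (that connection is my assumption, not something I verified). If so, the setting to experiment with in elasticsearch.yml would be:
# raising the request-size cap is only a workaround sketch; documents this large
# are usually a sign the data should be split or truncated before indexing
http.max_content_length: 200mb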
I hope this info will be helpful.
This looks like potentially the same issue I'm seeing on #793.
OS
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.1 LTS"
NAME="Ubuntu"
VERSION="18.04.1 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.1 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         17G         10G        1.1M        2.8G         13G
Swap:            0B          0B          0B
ELK
#logstash --version
logstash 6.4.2
#elasticsearch --version
Version: 6.4.2, Build: default/deb/04711c2/2018-09-26T13:34:09.098244Z, JVM: 1.8.0_181
#kibana --version
6.4.2
pipelines.yml
- pipeline.id: dnslog
path.config: "/etc/logstash/dnslog/conf.d/*.conf"
queue.drain: true
queue.type: persisted
queue.max_bytes: 200mb
pipeline.batch.size: 1000
pipeline.workers: 10
pipeline.batch.delay: 5
- pipeline.id: elastiflow
path.config: "/etc/logstash/elastiflow/conf.d/*.conf"
queue.type: memory
pipeline.workers: 10
pipeline.batch.size: 125
pipeline.batch.delay: 5
logstash.yml
# grep -v -E "#|^$" /etc/logstash/logstash.yml
path.data: /var/lib/logstash
log.level: info
path.logs: /var/log/logstash
xpack.monitoring.enabled: true
xpack.monitoring.elasticsearch.url: ["http://localhost:9200"]
elasticsearch.yml
# grep -v -E "#|^$" /etc/elasticsearch/elasticsearch.yml
node.name: elasti
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
thread_pool.bulk.queue_size: 10000
thread_pool.write.queue_size: 10000
discovery.type: single-node
logstash-plain.log
[2018-10-11T18:55:02,625][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@127.0.0.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-11T18:55:02,626][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-10-11T18:55:02,628][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@127.0.0.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-11T18:55:02,629][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-10-11T18:55:02,633][WARN ][logstash.outputs.elasticsearch] Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError] Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out {:url=>http://elastic:xxxxxx@127.0.0.1:9200/, :error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :error_class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError"}
[2018-10-11T18:55:02,633][ERROR][logstash.outputs.elasticsearch] Attempted to send a bulk request to elasticsearch' but Elasticsearch appears to be unreachable or down! {:error_message=>"Elasticsearch Unreachable: [http://elastic:xxxxxx@127.0.0.1:9200/][Manticore::SocketTimeout] Read timed out", :class=>"LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError", :will_retry_in_seconds=>64}
[2018-10-11T18:55:05,703][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@127.0.0.1:9200/, :path=>"/"}
[2018-10-11T18:55:05,708][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@127.0.0.1:9200/"}
[2018-10-11T18:55:06,769][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://elastic:xxxxxx@127.0.0.1:9200/, :path=>"/"}
[2018-10-11T18:55:06,812][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@127.0.0.1:9200/"}
elasticsearch.log
[2018-10-11T18:54:41,321][ERROR][o.e.a.b.TransportBulkAction] [elasti] failed to execute pipeline for a bulk request
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$1@1c51d8d1 on EsThreadPoolExecutor[name = elasti/write, queue capacity = 10000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6d685369[Running, pool size = 8, active threads = 8, queued tasks = 10000, completed tasks = 31658]]
[2018-10-11T18:54:41,321][ERROR][o.e.a.b.TransportBulkAction] [elasti] failed to execute pipeline for a bulk request
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.ingest.PipelineExecutionService$1@68875ff7 on EsThreadPoolExecutor[name = elasti/write, queue capacity = 10000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6d685369[Running, pool size = 8, active threads = 8, queued tasks = 10000, completed tasks = 31658]]
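Since the Elasticsearch log shows the write queue completely full while Logstash only reports a read timeout, one thing to try (a sketch, not a confirmed fix; host and user match the logs above, the password is a placeholder) is giving the Elasticsearch output more time per bulk request by raising its timeout from the 60 s default:
output {
  elasticsearch {
    hosts    => ["http://localhost:9200"]
    user     => "elastic"
    password => "changeme"   # placeholder
    timeout  => 120          # request timeout in seconds, default 60
  }
}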
I'm experiencing the same issue when I enable 2 pipelines. It's fine when only 1 pipeline is running. If anyone knows how to troubleshoot or fine-tune this setup, feel free to comment.
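If the two pipelines together are saturating the single node's write queue, one experiment (just a sketch based on the pipelines.yml posted above, not a confirmed fix) is lowering the per-pipeline worker count and batch size so the combined bulk traffic is smaller:
- pipeline.id: dnslog
  path.config: "/etc/logstash/dnslog/conf.d/*.conf"
  queue.type: persisted
  pipeline.workers: 4        # down from 10
  pipeline.batch.size: 500   # down from 1000
- pipeline.id: elastiflow
  path.config: "/etc/logstash/elastiflow/conf.d/*.conf"
  pipeline.workers: 4        # down from 10
  pipeline.batch.size: 125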
I have a small single-node Elasticsearch setup to which I'm writing with Logstash, using the JDBC input (paging enabled) and the aggregate filter. Since the data import involves processing about 60M rows from PostgreSQL, I'm running 10 independent Logstash instances (with -w 1), each working on different data. All Logstash instances run on the same machine. After some time one of the Logstash instances starts to give the following errors:
From that moment on, _node/stats/pipelines for the faulty instance shows a constant events-out count. The other instances keep writing data successfully, i.e. the doc count in Elasticsearch keeps growing. In the end some Logstash instances finish properly and some end up in the state described above.
Any idea what could be wrong here? BTW, I don't see anything worrying in the Elasticsearch logs.
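In case it helps others reproduce the observation: the stuck state is easy to spot through the Logstash monitoring API (default port 9600; when several instances run on one machine, each binds the next free port in the 9600-9700 range):
# curl -s 'http://localhost:9600/_node/stats/pipelines?pretty'
A healthy instance shows the pipeline's events out counter increasing between calls; the stuck one stays flat.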