opensearch-project / logstash-output-opensearch

A Logstash plugin that sends event data to an OpenSearch cluster and stores it as an index.
https://opensearch.org/docs/latest/clients/logstash/index/
Apache License 2.0

[BUG] Issue while ingesting large data (above 1M) with 2.0 source of Logstash #199

Closed: javeed90shaik closed this issue 8 months ago

javeed90shaik commented 1 year ago

I was trying to ingest a CSV that is constructed programmatically. The plugin works absolutely fine with small data; the problem happens when ingesting large data, i.e., greater than 1M records. Essentially, the data is lost after the run completes, or the process gets killed by itself.

To reproduce:

  1. Use any data in the CSV.
  2. Trigger the Logstash run with the provided CSV, containing more than 1 million records.
  3. Run: bin/logstash --http.port 9614 --path.data data/vol14 -f config/logstash_idxclone.conf < idxclone_full.csv (a sketch of a config of this shape follows below)
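
The actual contents of config/logstash_idxclone.conf are not included in the report; a minimal sketch of a pipeline of this shape, assuming a stdin input, a csv filter, and placeholder host, credentials, and index settings, might look like:

```
# Hypothetical sketch of logstash_idxclone.conf; the column names, host,
# credentials, and index below are assumptions, not the reporter's config.
input {
  stdin { }                                 # CSV is piped in via `< idxclone_full.csv`
}
filter {
  csv {
    separator => ","
    columns => ["id", "field1", "field2"]   # assumed column layout
  }
}
output {
  opensearch {
    hosts => ["https://localhost:9200"]     # placeholder cluster endpoint
    index => "idxclone"                     # placeholder index name
    user => "admin"                         # placeholder credentials
    password => "admin"
    ssl_certificate_verification => false   # local testing only
  }
}
```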

Expected behavior: it should load all the records. Instead, it loads some data and loses most of it. For example, I tried to load 2M records and it loaded only count=404000.

Properties Enabled:

pipeline.workers: 4
pipeline.batch.size: 4000
pipeline.batch.delay: 50
pipeline.unsafe_shutdown: true
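
One of these settings is directly relevant to the symptom: per the Logstash documentation, pipeline.unsafe_shutdown: true allows Logstash to terminate even while events are still in flight, which can itself drop data. A safer setting while debugging the loss would be:

```
# logstash.yml: keep the default safe shutdown so Logstash drains
# in-flight events instead of discarding them on termination
pipeline.unsafe_shutdown: false
```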

Host/Environment (please complete the following information):

Note: the same document set loads the full 2M records when I tried it with OpenSearch Logstash 7.16.3.

dblock commented 8 months ago

@javeed90shaik Did you ever debug this further? I am going to close this, but if this is still a problem we'd need to take a look at logs and errors. The plugin will queue data, so most likely it's a server-side problem, with the cluster unable to ingest it.
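
If the issue resurfaces, one way to see what is being rejected, assuming the cluster is refusing parts of the bulk requests (e.g. mapping errors), is Logstash's dead letter queue, which this output plugin can write to for non-retriable failures. A sketch of the logstash.yml settings (the path is an assumption):

```
# logstash.yml: persist events the output cannot index instead of losing them
dead_letter_queue.enable: true
path.dead_letter_queue: /var/lib/logstash/dlq   # assumed location
```

Retriable errors such as HTTP 429 (bulk queue rejections) would instead show up in the Logstash logs and in the cluster's thread pool stats (GET _nodes/stats/thread_pool).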