opensearch-project / data-prepper

Data Prepper is a component of the OpenSearch project that accepts, filters, transforms, enriches, and routes data at scale.
https://opensearch.org/docs/latest/clients/data-prepper/index/
Apache License 2.0

[BUG] Parquet records are not completely ingested into the OpenSearch Serverless sink #3856

Open jw-amazon opened 10 months ago

jw-amazon commented 10 months ago

Describe the bug
I sometimes see that parquet records are not completely ingested into the OpenSearch Serverless sink.

To Reproduce
Steps to reproduce the behavior:

  1. Go to AWS console
  2. Click on the OpenSearch Ingestion pipeline
  3. Check the document metrics. No documents failed to ingest and the DLQ is empty, yet some parquet records were not completely ingested.

Expected behavior
The parquet record read count should be the same as the document write count. I would also like a metric that reflects how many parquet records are ingested, so I can be confident that all records have been successfully read.

Additional context
I opened an internal ticket as well and will link this issue to it.

dlvenable commented 9 months ago

@jw-amazon, do you have any samples of the counts that you are seeing? Also, which metrics specifically are you comparing?

jw-amazon commented 9 months ago

Hello @dlvenable, I created this issue based on an internal ticket that I opened, and I will send you that ticket. This issue has happened to us multiple times.