[BUG] Parquet records are not completely ingested into the OpenSearch Serverless sink

jw-amazon commented 10 months ago

Describe the bug Some Parquet records are intermittently not ingested into the OpenSearch Serverless sink.

To Reproduce Steps to reproduce the behavior:

Go to the AWS console.
Click on the OpenSearch Ingestion pipeline.
Check the document metrics: no documents failed to ingest and the DLQ is empty, yet some Parquet records were still not ingested.

Expected behavior The Parquet record read count should match the document write count. I would also like a metric that reports how many Parquet records were ingested, so I can be confident that all records have been read successfully.
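
A minimal sketch of the comparison described above, assuming the per-file Parquet row counts and the sink's document count have already been collected by other means (for example, from file metadata and an index count query). The names `IngestionCheck` and `compare_counts` are hypothetical and are not part of Data Prepper or OpenSearch Ingestion.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class IngestionCheck:
    records_read: int       # total rows across all source Parquet files
    documents_written: int  # documents counted in the sink index
    missing: int            # rows read from the source but absent from the sink

    @property
    def complete(self) -> bool:
        # Ingestion is complete only when both counts agree exactly.
        return self.records_read == self.documents_written


def compare_counts(rows_per_file: Mapping[str, int], documents_written: int) -> IngestionCheck:
    """Compare the summed source row counts with the sink's document count.

    Hypothetical helper: both inputs are assumed to be gathered separately.
    """
    records_read = sum(rows_per_file.values())
    # Clamp at zero so a sink holding extra documents is not reported as negative loss.
    missing = max(records_read - documents_written, 0)
    return IngestionCheck(records_read, documents_written, missing)
```

A non-zero `missing` value alongside an empty DLQ and zero failed documents would correspond to the discrepancy reported here.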

Screenshots

Environment (please complete the following information):

Additional context I opened an internal ticket as well and will link this issue to it.

dlvenable commented 9 months ago

@jw-amazon, do you have any samples of the counts that you are seeing? Also, which metrics specifically are you comparing?

jw-amazon commented 9 months ago

Hello @dlvenable, I created this issue based on an internal ticket that I opened, and I will send you that ticket. This issue has happened to us multiple times.

opensearch-project / data-prepper

[BUG] Parquet records are not completely ingested into the OpenSearch Serverless sink #3856