snowflakedb / snowflake-ingest-java

Java SDK for the Snowflake Ingest Service -
http://www.snowflake.net
Apache License 2.0

insertRows() returns no errors when connection is lost #802

Open aronsemle opened 4 months ago

aronsemle commented 4 months ago

The customer is sending OT data to Snowflake and wants to ensure that all of it arrives. They are using a local store & forward feature in our app: we store the data to disk and delete it after a successful send. If the send fails, we retry.

Issue: The SDK insertRows call returns no errors after the internet connection is forcibly killed. It takes minutes before it starts reporting failed inserts.

Expected: A faster way to detect a connection failure, so applications can guarantee data delivery.

Reproduce

  1. Create a connection and start sending data using the following API call: var response = channel.insertRows(rowInserts, null);
  2. Kill your internet connection
  3. The insertRows call continues to return no errors and throw no exceptions for minutes
  4. Eventually isClosed() returns true and the calls fail

Maybe there is a workaround or I'm improperly using the SDK?

sfc-gh-tzhang commented 3 months ago

Hi, this is the nature of async processing: we want to wait for a few minutes in case there is a network glitch. The rows will be queued in memory until the connection is back. What you need to do is call getLatestCommittedOffsetToken before deleting any source data. Also check out this generic topic about offset tokens.
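
To make the advice above concrete, here is a minimal sketch of a store-and-forward buffer that deletes local rows only after the committed offset has moved past them. It is not taken from the SDK: CommittedOffsetSource and PendingStore are hypothetical names, and CommittedOffsetSource stands in for the single channel method the sketch needs. It also assumes the application uses monotonically increasing numeric offset tokens and that a null token means nothing has been committed yet.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for the one channel method this sketch relies on. */
interface CommittedOffsetSource {
    String getLatestCommittedOffsetToken();
}

/** Tracks rows that were handed to insertRows() but are not yet confirmed as committed. */
class PendingStore {
    // Offsets of sent-but-unconfirmed rows, oldest first (assumed to be increasing).
    private final Deque<Long> pending = new ArrayDeque<>();

    /** Record a row's offset right after passing it to insertRows(). */
    void markSent(long offset) {
        pending.addLast(offset);
    }

    /** Drop every row at or below the committed offset; returns how many were dropped. */
    int purgeCommitted(CommittedOffsetSource channel) {
        String token = channel.getLatestCommittedOffsetToken();
        if (token == null) {
            return 0; // assumption: null means nothing is committed yet, so keep everything
        }
        long committed = Long.parseLong(token);
        int removed = 0;
        while (!pending.isEmpty() && pending.peekFirst() <= committed) {
            pending.removeFirst(); // a real app would delete the row from disk here
            removed++;
        }
        return removed;
    }

    /** Number of rows still awaiting confirmation. */
    int size() {
        return pending.size();
    }
}
```

With the real SDK, the offset passed to markSent would be the same value given as the offset token argument of insertRows, and purgeCommitted would be called periodically with a wrapper around the channel, for example channel::getLatestCommittedOffsetToken.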

aronsemle commented 3 months ago

Thanks @sfc-gh-tzhang I was able to implement this. That said, I think exposing a way to say "send now" OR a way to remove/control the 1 second thread that sends out the data would be helpful. In cases where you need guaranteed delivery, the current approach introduces a max of 1 second delay. That seems small, but when you're sending a lot of data it adds up.

sfc-gh-tzhang commented 3 months ago

Take a look at https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-overview#latency. Basically, you can control the flush interval via the MAX_CLIENT_LAG parameter.

aronsemle commented 3 months ago

I found this, but it looks like it's no faster than 1 second? Is that right?

sfc-gh-tzhang commented 3 months ago

That's correct, the minimum is 1s for now.
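
For readers who want to try the MAX_CLIENT_LAG setting discussed above, the following is a small sketch of building the client properties. ClientLagConfig and withMaxClientLag are hypothetical helper names. The value format ("1 second", "2 seconds") is an assumption based on the linked latency documentation and should be checked against the SDK version in use. The 1 second floor reflects the minimum confirmed in this thread.

```java
import java.util.Properties;

/** Hypothetical helper that adds MAX_CLIENT_LAG to a set of client properties. */
class ClientLagConfig {
    // Minimum flush lag confirmed in this thread.
    static final int MIN_LAG_SECONDS = 1;

    /** Returns a copy of base with MAX_CLIENT_LAG set; rejects values below the minimum. */
    static Properties withMaxClientLag(Properties base, int seconds) {
        if (seconds < MIN_LAG_SECONDS) {
            throw new IllegalArgumentException(
                "MAX_CLIENT_LAG cannot be lower than " + MIN_LAG_SECONDS + " second");
        }
        Properties props = new Properties();
        props.putAll(base); // leave the caller's properties untouched
        // Assumed value format: a number followed by a unit such as "second" or "seconds".
        props.setProperty("MAX_CLIENT_LAG", seconds == 1 ? "1 second" : seconds + " seconds");
        return props;
    }
}
```

The resulting Properties object would then be passed to the client builder along with the usual connection settings, for example via SnowflakeStreamingIngestClientFactory.builder(name).setProperties(props).build().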