snowflakedb / snowflake-ingest-java

Java SDK for the Snowflake Ingest Service: http://www.snowflake.net
Apache License 2.0

About the channel becoming invalid #585

Closed: xuanziranhan closed this issue 1 year ago

xuanziranhan commented 1 year ago

Hi friends, sorry for asking this question here; please point me to the right place if you know of one.

We are using the client library to write streaming data to Snowflake from AWS Lambda, but we keep getting the following exception:

    net.snowflake.ingest.utils.SFException: Channel INGEST.TEST_SCHEMA.TEST_QUERY_LOGS.INGEST.TEST_SCHEMA.TEST_QUERY_LOGS is invalid and might contain uncommitted rows, please consider reopening the channel to restart.
        at net.snowflake.ingest.streaming.internal.SnowflakeStreamingIngestChannelInternal.checkValidation(SnowflakeStreamingIngestChannelInternal.java:462) ~[snowflake-ingest-sdk-2.0.2.jar:?]
        at net.snowflake.ingest.streaming.internal.SnowflakeStreamingIngestChannelInternal.close(SnowflakeStreamingIngestChannelInternal.java:273) ~[snowflake-ingest-sdk-2.0.2.jar:?]

We looked into when a channel can be invalidated, and it looks like we run into blob registration failures quite often. Is this expected, so that we should just handle the error, or is there something we can do to reduce the chance of it happening?

Thanks a lot!
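For the "just handle the error" option, the exception message itself suggests reopening the channel. The sketch below shows one way that recovery could work, under stated assumptions: the writer keeps every row, keyed by a monotonically increasing `long` offset (a hypothetical simplification; Kinesis sequence numbers do not fit in a `long`), until the channel reports it as committed, and after an invalidation it reopens the channel and replays only the uncommitted tail. `Channel`, `InvalidChannelException`, `ReplayingWriter`, and the fake classes are hypothetical stand-ins, not the SDK's API. With the real SDK, the reopen step would presumably go through the client's `openChannel(...)` and the committed position would come from the channel's `getLatestCommittedOffsetToken()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

public class ReopenOnInvalidChannel {
    /** Hypothetical minimal channel view; NOT the SDK's real interface. */
    interface Channel {
        void insertRows(List<Map<String, Object>> rows, String offsetToken);
        String getLatestCommittedOffsetToken(); // null when nothing is committed yet
    }

    /** Stand-in for the SDK's SFException on an invalidated channel. */
    static class InvalidChannelException extends RuntimeException {
        InvalidChannelException(String message) { super(message); }
    }

    /** Keeps rows by offset until committed, so they can be replayed on a reopened channel. */
    static class ReplayingWriter {
        private final Supplier<Channel> openChannel; // hypothetical hook: (re)opens the same channel name
        private final TreeMap<Long, Map<String, Object>> pending = new TreeMap<>();
        private Channel channel;
        int totalReopens = 0;

        ReplayingWriter(Supplier<Channel> openChannel) {
            this.openChannel = openChannel;
            this.channel = openChannel.get();
        }

        void write(long offset, Map<String, Object> row, int maxReopens) {
            pending.put(offset, row);
            List<Map<String, Object>> batch = List.of(row);
            for (int reopens = 0; ; reopens++) {
                try {
                    channel.insertRows(batch, Long.toString(pending.lastKey()));
                    return;
                } catch (InvalidChannelException e) {
                    if (reopens >= maxReopens) throw e;
                    channel = openChannel.get();
                    totalReopens++;
                    dropCommitted();               // already committed server-side: don't resend
                    if (pending.isEmpty()) return;
                    batch = new ArrayList<>(pending.values()); // replay the uncommitted tail in order
                }
            }
        }

        /** Forgets buffered rows at or below the channel's latest committed offset token. */
        void dropCommitted() {
            String committed = channel.getLatestCommittedOffsetToken();
            if (committed != null) pending.headMap(Long.parseLong(committed), true).clear();
        }

        int pendingCount() { return pending.size(); }
    }

    /** Fake table plus fake channel: inserts are lost if the channel is invalidated before flush(). */
    static class FakeTable {
        final List<Map<String, Object>> committedRows = new ArrayList<>();
        String committedToken;
    }

    static class FakeChannel implements Channel {
        private final FakeTable table;
        private final List<Map<String, Object>> buffered = new ArrayList<>();
        private String bufferedToken;
        boolean valid = true;
        FakeChannel(FakeTable table) { this.table = table; }
        public void insertRows(List<Map<String, Object>> rows, String offsetToken) {
            if (!valid) throw new InvalidChannelException("channel is invalid");
            buffered.addAll(rows);
            bufferedToken = offsetToken;
        }
        public String getLatestCommittedOffsetToken() { return table.committedToken; }
        void flush() { // simulates a successful commit of everything buffered
            table.committedRows.addAll(buffered);
            if (bufferedToken != null) table.committedToken = bufferedToken;
            buffered.clear();
        }
    }

    public static void main(String[] args) {
        FakeTable table = new FakeTable();
        List<FakeChannel> opened = new ArrayList<>();
        ReplayingWriter writer = new ReplayingWriter(
                () -> { FakeChannel c = new FakeChannel(table); opened.add(c); return c; });
        writer.write(1, Map.of("id", 1), 3);
        writer.write(2, Map.of("id", 2), 3);
        opened.get(0).flush();               // offsets 1-2 committed
        writer.write(3, Map.of("id", 3), 3); // buffered on the first channel, not committed
        opened.get(0).valid = false;         // simulate invalidation, e.g. a failed blob registration
        writer.write(4, Map.of("id", 4), 3); // reopens, then replays offsets 3 and 4
        opened.get(1).flush();
        System.out.println(table.committedRows + " token=" + table.committedToken);
        // prints [{id=1}, {id=2}, {id=3}, {id=4}] token=4
    }
}
```

Offset 3 is replayed because it was only buffered on the first channel when that channel became invalid. In this sketch the buffer only shrinks when `dropCommitted()` runs, so a long-lived writer would need to call it periodically.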

sfc-gh-xhuang commented 1 year ago

Hi @xuanziranhan Do you have a support ticket open on this issue? We need a little bit more info to look into it.

xuanziranhan commented 1 year ago


Hi @sfc-gh-xhuang, thanks for replying. I don't have one. Where should I create the support ticket? I also have a couple of questions; it would be great if you could answer them.

  1. Is this a common, expected error that we should simply handle by opening new channels and retrying, or is there a way to keep it from happening so often?
  2. Is there a limit on how many channels a single client can create? I read that channels and clients are thread-safe, but is it recommended to have one channel per table under one client and let several threads share that channel, or can we keep several channels open?

Ideally, I think it would work like reading from Kafka: one channel opened per partition, each keeping its own offset token. We are reading from Kinesis, where the situation is a bit different, so I'm asking. Thanks a lot!

sfc-gh-xhuang commented 1 year ago

You should be able to open a support ticket through your Snowflake account on the UI or support portal: https://community.snowflake.com/s/article/How-To-Submit-a-Support-Case-in-Snowflake-Lodge

Can you also include which version of the SDK you are using?

  1. I don't believe this is a common error. We will need to investigate the root cause.

  2. There is no limit on the total number of channels a single client can create, but keep in mind the memory constraints of the infrastructure running the client. There is, however, a limit on the number of channels that can be opened on a single table: the soft limit is 10,000 channels per table. It can be raised, but most designs should not need that many, since channels can be reused. Unused channels expire automatically after 30 days. Opening multiple channels per client is recommended for throughput or parallelism: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-streaming-recommendation

One channel can write to only one table, but you can open multiple channels to the same table to handle different topics or partitions. A single client can open multiple channels to multiple tables, but a channel cannot write to two different tables, and two clients cannot use the same channel.

In our Kafka connector design, a single client opens a number of channels determined by the number of task workers, which in turn correlates with the number of partitions.
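Carried over to the Kinesis case from the question, that layout might look like the sketch below: one client per process, and one channel per (table, shard) that is opened once and then reused, named deterministically so that after a restart the same shard reattaches to the same channel and can resume from its committed offset token. `ChannelRegistry`, `channelFor`, and the naming scheme are hypothetical; the `openChannel` hook stands in for the SDK's `client.openChannel(...)`. The sketch assumes each shard is handled by only one writer at a time: since two clients cannot use the same channel, two Lambda execution environments opening the same name would conflict, and as we understand it the earlier one can end up invalidated.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** One shared client, one channel per (table, shard), opened lazily and then reused. */
public class ChannelRegistry<C> {
    private final String appName;
    // Hypothetical hook (table, channelName) -> channel; with the SDK it would wrap client.openChannel(...).
    private final BiFunction<String, String, C> openChannel;
    private final Map<String, C> channels = new ConcurrentHashMap<>();

    public ChannelRegistry(String appName, BiFunction<String, String, C> openChannel) {
        this.appName = appName;
        this.openChannel = openChannel;
    }

    /** Deterministic (hypothetical) naming scheme: the same shard always maps to the same channel. */
    public static String channelName(String appName, String table, String shardId) {
        return appName + "_" + table + "_" + shardId;
    }

    /** Returns this shard's channel, opening it at most once per registry instance (thread-safe). */
    public C channelFor(String table, String shardId) {
        return channels.computeIfAbsent(channelName(appName, table, shardId),
                name -> openChannel.apply(table, name));
    }

    public int size() { return channels.size(); }

    public static void main(String[] args) {
        int[] opens = {0};
        ChannelRegistry<String> registry = new ChannelRegistry<>("lambdaApp",
                (table, name) -> { opens[0]++; return "channel:" + name; });
        registry.channelFor("TEST_QUERY_LOGS", "shard0");
        registry.channelFor("TEST_QUERY_LOGS", "shard1");
        registry.channelFor("TEST_QUERY_LOGS", "shard0"); // reused, not reopened
        System.out.println(opens[0] + " opens, " + registry.size() + " channels");
        // prints: 2 opens, 2 channels
    }
}
```

Keeping channels in a map rather than sharing one channel across shards lets each shard track its own offset token, which mirrors the one-channel-per-partition idea from the question.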

vsrinivasan08 commented 1 year ago

@xuanziranhan, I have often run into this while testing in dev and staging. After reading the SDK Javadoc, I found that my channel name was not unique. Generating a unique channel name (for example, by appending a timestamp) resolved the issue.
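The trade-off behind that suggestion can be illustrated with two hypothetical naming helpers (`stableName` and `uniqueName` are not SDK functions). A timestamp-suffixed name keeps concurrent writers from ever sharing a channel, but, given the limits described in the earlier reply, each new name is a fresh channel with no committed offset token to resume from, and it counts toward the 10,000-per-table soft limit until it expires after 30 days without use. The random suffix is an added assumption to cover two invocations that start in the same millisecond.

```java
import java.time.Instant;
import java.util.UUID;

public class ChannelNames {

    /** Stable per shard: can resume via the committed offset token, but only one writer may use it at a time. */
    static String stableName(String table, String shardId) {
        return table + "_" + shardId;
    }

    /**
     * Unique per open: timestamp plus a short random suffix, so two openings in the same
     * millisecond still differ. Each name starts a fresh channel with no committed offset token.
     */
    static String uniqueName(String table, String shardId, Instant openedAt) {
        String randomPart = UUID.randomUUID().toString().substring(0, 8); // first UUID group: 8 hex chars
        return table + "_" + shardId + "_" + openedAt.toEpochMilli() + "_" + randomPart;
    }

    public static void main(String[] args) {
        Instant now = Instant.ofEpochMilli(1_700_000_000_000L);
        System.out.println(stableName("TEST_QUERY_LOGS", "shard0"));
        // prints TEST_QUERY_LOGS_shard0
        System.out.println(uniqueName("TEST_QUERY_LOGS", "shard0", now));
        // prints TEST_QUERY_LOGS_shard0_1700000000000_ followed by 8 random hex characters
    }
}
```

With unique names, resuming after a restart depends on the source's own checkpoint rather than the channel's offset token.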

xuanziranhan commented 1 year ago


We have unique channel names.