Prevent invalidation of channels on client that are not themselves invalid

jdcaperon commented 8 months ago

Currently, when a channel is invalidated on a client for any reason, all channels on the client are invalidated, or at least all of those in the same chunk. This is quite painful when working with clients that process many channels and when channels are automatically balanced across clients: any time a channel is rebalanced, another client must be fully restarted to begin processing rows again. When using short-lived clients for processing, e.g. on spot instances or for spiky workloads, the constant rebalancing and invalidations can cause frequent disruptions.

https://github.com/snowflakedb/snowflake-ingest-java/blob/a69e6f40f3ad2ed808098cb3186bec551f266449/src/main/java/net/snowflake/ingest/streaming/internal/StreamingIngestResponseCode.java#L58-L61

For example, this can be a common occurrence during rebalancing if there is no "handoff" procedure. Such an error can then cause all channels on the client to be invalidated, via the following response code:

https://github.com/snowflakedb/snowflake-ingest-java/blob/a69e6f40f3ad2ed808098cb3186bec551f266449/src/main/java/net/snowflake/ingest/streaming/internal/StreamingIngestResponseCode.java#L52-L55

In an ideal world, when a single channel is invalidated, that invalidation does not affect other channels on the client. This allows channels to continue processing without having to backtrack to a previous committed offset. Is this possible, or maybe already on the roadmap?

sfc-gh-lsembera commented 8 months ago

Hi @jdcaperon, when a channel is invalidated, the SDK doesn't invalidate all channels in the client; it only invalidates the channels that had some data collocated in the same chunk as the invalidated channel. Let's say data for channels A, B and C were flushed into the same chunk. If channel B is reopened while the chunk is being uploaded to Snowflake, we have to discard the whole chunk because at least some of its data (i.e. the data for channel B) is no longer valid. Once channels are flushed into a chunk, we can no longer separate the data for each individual channel, so we cannot selectively discard only the data for channel B.
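To make the "collateral" effect concrete, here is a minimal model of the A/B/C example. It is only a sketch: `Chunk` and `collateralInvalidations` are hypothetical names, not the SDK's internal classes, and the model assumes a chunk is an indivisible unit that is either accepted or discarded as a whole.

```java
import java.util.List;
import java.util.Set;

public class ChunkInvalidation {

    // Hypothetical model of a flushed chunk: the channels whose rows were
    // serialized together into one blob. Not the SDK's internal type.
    public record Chunk(List<String> channels) {}

    // Returns the channels that lose their in-flight data when the chunk is
    // rejected. If ANY channel in the chunk was reopened after the flush,
    // the whole chunk is discarded, so every collocated channel is affected.
    public static List<String> collateralInvalidations(Chunk chunk, Set<String> reopenedChannels) {
        boolean chunkInvalid = chunk.channels().stream().anyMatch(reopenedChannels::contains);
        return chunkInvalid ? chunk.channels() : List.of();
    }

    public static void main(String[] args) {
        // Channels A, B and C were flushed into the same chunk; B was then
        // reopened on another client while the chunk was being uploaded.
        Chunk chunk = new Chunk(List.of("A", "B", "C"));
        System.out.println(collateralInvalidations(chunk, Set.of("B"))); // [A, B, C]
        // A channel reopened elsewhere that is not in this chunk has no effect.
        System.out.println(collateralInvalidations(chunk, Set.of("D"))); // []
    }
}
```

The all-or-nothing return value mirrors the point above: once rows are merged into one chunk, there is no per-channel boundary left to discard selectively.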

If you need to move a channel from client A to client B, I'd recommend that you stop ingesting into the channel on client A, wait until all in-flight data for that channel is committed, and only then reopen the channel on client B. This way you guarantee the channel won't be forcibly invalidated with status code 20 in client A and cause collateral invalidation of other channels.
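The recommended handoff can be sketched as a small polling loop. This is an illustration under assumptions, not SDK code: `SourceChannel`, `handOff` and the polling/timeout parameters are hypothetical, and the sketch assumes the caller knows the last offset token it sent on client A. In the real SDK, `getLatestCommittedOffsetToken()` on the channel could plausibly back `committedOffset()`, but check the SDK documentation before relying on that.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Supplier;

public class ChannelHandoff {

    // Hypothetical abstraction over the channel on the old client (client A).
    public interface SourceChannel {
        String committedOffset(); // latest offset token committed in Snowflake
        void close();             // stop using this channel on client A
    }

    // Precondition: the caller has already stopped inserting rows on client A,
    // so lastSentOffset is the final offset token client A will ever send.
    public static <T> T handOff(SourceChannel oldChannel, String lastSentOffset,
                                Supplier<T> reopen, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        // Wait until all in-flight data for the channel has been committed.
        while (!Objects.equals(oldChannel.committedOffset(), lastSentOffset)) {
            if (System.nanoTime() - deadline > 0) {
                throw new IllegalStateException("in-flight data not committed within " + timeout);
            }
            Thread.sleep(pollInterval.toMillis());
        }
        oldChannel.close();
        // Only now reopen the channel on the new client (client B).
        return reopen.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake channel whose committed offset catches up on the third poll.
        String[] committed = {"97"};
        SourceChannel fake = new SourceChannel() {
            int polls = 0;
            public String committedOffset() {
                if (++polls >= 3) committed[0] = "100";
                return committed[0];
            }
            public void close() { System.out.println("closed on client A"); }
        };
        String result = handOff(fake, "100", () -> "reopened on client B",
                Duration.ofSeconds(1), Duration.ofMillis(10));
        System.out.println(result);
    }
}
```

Throwing on timeout rather than reopening anyway is a deliberate choice in this sketch: reopening while data is still in flight is exactly what triggers the forced invalidation described above.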

jdcaperon commented 8 months ago

Makes sense. In my case moving channels between clients is quite difficult and requires a bit more overhead, but one day I will attempt to support it!

sfc-gh-lsembera commented 8 months ago

@jdcaperon Can we close this issue?

snowflakedb / snowflake-ingest-java

Prevent invalidation of channels on client that are not themselves invalid #678