snowflakedb / snowflake-ingest-java

Java SDK for the Snowflake Ingest Service -
http://www.snowflake.net
Apache License 2.0

Does not validate by stripping `\u0000` from json #870

Open jdcaperon opened 1 month ago

jdcaperon commented 1 month ago

The client does not validate VARIANT values by stripping the null character (`\u0000`). As a result, it is possible to submit values that cannot be parsed by Snowflake. See support case 00872717.

https://github.com/snowflakedb/snowflake-ingest-java/blob/a9ed16df1e308c1d1cc07b9c3bdcd573b455d904/src/main/java/net/snowflake/ingest/streaming/internal/DataValidationUtil.java#L89-L124

This is pretty bad, because it means tables can get into a state where they are unqueryable whenever the damaged VARIANT column is selected.

jdcaperon commented 1 month ago

Potentially we could have a system similar to the one described in https://community.snowflake.com/s/article/How-to-load-Data-into-snowflake-removing-multiple-invalid-UTF-8-Characters-without-doing-a-seperate-transformation-step

sfc-gh-xhuang commented 1 week ago

@jdcaperon We have this filed on our backlog to resolve. Have you stripped the values from your input as a workaround in the meantime?

jdcaperon commented 1 week ago

@sfc-gh-xhuang Yes, we are doing our own prevention for this issue in the meantime.
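
For readers who need the same kind of workaround, here is a minimal sketch of a client-side filter. It is not the reporter's code and not part of the SDK: `NullCharSanitizer`, `stripNullChars`, and `sanitize` are hypothetical names. It assumes the row values are already parsed Java objects (`String`, `Map`, `List`) and that dropping the null character is acceptable for your data. If a VARIANT is passed as raw JSON text, the character may appear in its escaped form inside that text, so the text would need to be parsed before a filter like this has any effect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of snowflake-ingest-java. */
public final class NullCharSanitizer {
  private NullCharSanitizer() {}

  /** Returns the string with every U+0000 character removed. */
  public static String stripNullChars(String s) {
    // Skip the copy when there is nothing to remove.
    return s.indexOf('\u0000') < 0 ? s : s.replace("\u0000", "");
  }

  /**
   * Recursively copies a parsed value, stripping U+0000 from every string,
   * including map keys. Other types (numbers, booleans, null) pass through.
   */
  public static Object sanitize(Object value) {
    if (value instanceof String) {
      return stripNullChars((String) value);
    }
    if (value instanceof Map) {
      Map<Object, Object> out = new LinkedHashMap<>();
      for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
        out.put(sanitize(e.getKey()), sanitize(e.getValue()));
      }
      return out;
    }
    if (value instanceof List) {
      List<Object> out = new ArrayList<>();
      for (Object item : (List<?>) value) {
        out.add(sanitize(item));
      }
      return out;
    }
    return value;
  }
}
```

Calling `NullCharSanitizer.sanitize(row)` on each row map before handing it to the ingest client would be one way to apply it. Note that two keys differing only by a null character would collapse into one after sanitizing.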