snowflakedb / spark-snowflake

Snowflake Data Source for Apache Spark.
http://www.snowflake.net
Apache License 2.0

Support microsecond timestamp precision with copy unload #491

Open arthurli1126 opened 1 year ago

arthurli1126 commented 1 year ago

Hi folks, due to high NAT gateway costs we have to use COPY UNLOAD when reading from Snowflake, but COPY UNLOAD currently only supports millisecond-level timestamp precision, not microseconds. I can work on a PR to add it, but I'm wondering if you have any concerns about this.

sfc-gh-mrui commented 1 year ago

@arthurli1126 Could you please try SC 2.6.0 or a newer version? COPY UNLOAD is mainly used in SC 2.5.x and prior versions.
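For context, newer connector versions default to a different read path, and COPY UNLOAD can still be forced with a connector option. A minimal read sketch (connection values are placeholders, and the `use_copy_unload` option name is taken from the connector's documented options):

```scala
// Sketch only: sfOptions values are placeholders, not working credentials.
val sfOptions = Map(
  "sfURL"       -> "account.snowflakecomputing.com",
  "sfUser"      -> "user",
  "sfPassword"  -> "password",
  "sfDatabase"  -> "db",
  "sfSchema"    -> "public",
  "sfWarehouse" -> "wh",
  // Force the legacy COPY UNLOAD read path (the one with the precision issue):
  "use_copy_unload" -> "true"
)

val df = spark.read
  .format("net.snowflake.spark.snowflake")
  .options(sfOptions)
  .option("dbtable", "my_table")
  .load()
```

With `use_copy_unload` left at its default on SC 2.6.0+, the connector avoids the text-based unload path entirely, which sidesteps the timestamp-parsing issue described below.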

arthurli1126 commented 1 year ago

@sfc-gh-mrui thanks for your reply. The problem is with the SimpleDateFormat (https://github.com/snowflakedb/spark-snowflake/blob/master/src/main/scala/net/snowflake/spark/snowflake/Conversions.scala#L66) used to parse timestamps during COPY UNLOAD. The format only supports millisecond precision and produces a wrong timestamp when the input carries microseconds. For instance, for the string 2023-03-01 07:54:56.191173 it treats 191173 as milliseconds, so it adds 191000 / 1000 / 60 ≈ 3 min 11 s to the timestamp and puts the remaining 173 into the milliseconds field, yielding 2023-03-01 07:58:07.173000.
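The misparse above is easy to reproduce. A minimal sketch, assuming the pattern at that line is the usual `yyyy-MM-dd HH:mm:ss.SSS`: `SSS` is a millisecond count, not a fractional-seconds field, and SimpleDateFormat's default lenient mode silently normalizes the overflow into minutes and seconds.

```scala
import java.text.SimpleDateFormat
import java.util.TimeZone

// "SSS" parses a millisecond COUNT, not a fraction of a second.
val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
fmt.setTimeZone(TimeZone.getTimeZone("UTC"))

// "191173" is read as 191173 ms (~3 min 11.173 s), not 191173 microseconds,
// and lenient parsing rolls the overflow into the minutes/seconds fields.
val parsed = fmt.parse("2023-03-01 07:54:56.191173")
println(fmt.format(parsed)) // 2023-03-01 07:58:07.173
```

The same string parsed as a true fractional field (e.g. via `java.time`) would keep the wall-clock time 07:54:56 and all six fractional digits.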

arthurli1126 commented 1 year ago

Created a draft PR for this: https://github.com/snowflakedb/spark-snowflake/pull/492

Please let me know if I've misunderstood anything or if you have any concerns about the approach.
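For illustration, one way to parse such strings without losing microseconds (a sketch of the general technique, not necessarily what the draft PR does) is to build a `java.time` formatter whose fractional-seconds part accepts a variable number of digits:

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField

// Accept 0 to 9 fractional digits; the fraction is interpreted as a
// fraction of a second, so ".191173" becomes 191173000 nanoseconds.
val fmt = new DateTimeFormatterBuilder()
  .appendPattern("yyyy-MM-dd HH:mm:ss")
  .appendFraction(ChronoField.NANO_OF_SECOND, 0, 9, true)
  .toFormatter

val ldt = LocalDateTime.parse("2023-03-01 07:54:56.191173", fmt)
val ts  = Timestamp.valueOf(ldt) // java.sql.Timestamp keeps nanosecond precision
println(ts.getNanos) // 191173000
```

Because `appendFraction` treats the digits as a fraction of the field's range, the seconds and minutes stay untouched and the full microsecond value survives the round trip into `java.sql.Timestamp`.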