outrauk / dataiku-plugin-snowflake-hdfs

DSS plugin for fast loading between Snowflake and HDFS
https://github.com/outrauk/dataiku-plugin-snowflake-hdfs
MIT License
0 stars 0 forks source link

Snowflake to HDFS syncs integers as decimals #20

Closed mklaber closed 4 years ago

mklaber commented 4 years ago

If a table in Snowflake that stores integers has a type of NUMBER, it gets output to Parquet as decimal.

Snowflake tables with int or bigint columns get stored in SF a NUMBER, which then outputs to Parquet as int65 (DECIMAL(18,0)) and read back in to DSS as decimal.

message schema {
  optional binary misc (UTF8);
  optional binary event_type (UTF8);
  optional binary source (UTF8);
  optional int64 uprn (DECIMAL(18,0));
  optional int64 event_date (TIMESTAMP_MILLIS);
  optional int64 virtual_load_date (TIMESTAMP_MILLIS);
}

Fix: explicitly cast as bigint:

 COPY INTO '@PUBLIC.STAGE/event_hdfs/part'
 FROM (
     SELECT "misc", "event_type", "source", "uprn":bigint AS "uprn", "event_date", "virtual_load_date"
     FROM "DSSMANAGED"."EVENT_DATA_events_sf_mk"
 )
 FILE_FORMAT = (TYPE = PARQUET)
 OVERWRITE = TRUE
 HEADER = TRUE;

Also in AOBS-564