snowplow / snowplow-rdb-loader

Stores Snowplow enriched events in Redshift, Snowflake and Databricks
Other
31 stars 16 forks source link

Change `domain_sessionid` to be a varchar instead of a `char` #1324

Open rlh1994 opened 9 months ago

rlh1994 commented 9 months ago

Currently for redshift and snowflake the domain_sessionid column is loaded as a char(128) when most other columns are a varchar(128). In snowflake this doesn't actually matter as snowflake does not pad the string (https://docs.snowflake.com/en/sql-reference/data-types-text#char-character-nchar) but in redshift this padding does occur, which can lead to confusion from users and edge case issues with data models, it also has a slightly higher storage cost when compared to a varchar.

Given no other columns are a char I don't foresee a reason that this needs to remain as one.

This is everywhere I can see the definition made: https://github.com/snowplow/snowplow-rdb-loader/blob/05dd5ab11360d52196605d83d724af16b57a95f3/modules/redshift-loader/src/main/scala/com/snowplowanalytics/snowplow/loader/redshift/RedshiftEventsTable.scala#L134

https://github.com/snowplow/snowplow-rdb-loader/blob/05dd5ab11360d52196605d83d724af16b57a95f3/modules/snowflake-loader/src/main/scala/com/snowplowanalytics/snowplow/loader/snowflake/SnowflakeEventsTable.scala#L134

https://github.com/snowplow/snowplow-rdb-loader/blob/05dd5ab11360d52196605d83d724af16b57a95f3/modules/snowflake-loader/src/main/resources/atomic-def.sql#L159

https://github.com/snowplow/snowplow-rdb-loader/blob/05dd5ab11360d52196605d83d724af16b57a95f3/modules/snowflake-loader/src/main/scala/com/snowplowanalytics/snowplow/loader/snowflake/ast/AtomicDef.scala#L193