Currently we ingest a number of relatively loosely typed events using this target, intending to coalesce the types appropriately in the data warehouse layer, e.g. using `coalesce-fields` from DBT (https://github.com/fishtown-analytics/stitch-utils).
This works when using target-redshift, or an ETL service built on it such as Stitch. It doesn't work with this target, however, because the number of columns grows without bound if a column's type keeps changing.
For example, the `raw` tables in our data warehouse that are loaded using this target currently contain a series of versioned copies of the same column, because the source data flip-flops between types. We're aware it would be beneficial to fix the data at the source; however, our aim is to coalesce these columns downstream rather than in our loading process, and fixing the source wouldn't prevent the same problem arising in new events in the future.
We'd benefit here from target-redshift's approach of not versioning the columns, and instead just appending the column's datatype as a suffix.
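To illustrate why the suffix approach keeps the schema bounded, here is a minimal sketch comparing the two naming schemes for a field whose type flip-flops. The versioned names (`amount__1`, `amount__2`, …) and the suffix mapping (`__s`, `__i`, …) are hypothetical and not the exact naming used by either target; only the growth behaviour is the point.

```python
from typing import List, Optional

# Hypothetical mapping from JSON Schema types to column suffixes (illustrative only).
SUFFIX = {"string": "s", "integer": "i", "number": "f", "boolean": "b"}


def versioned_columns(name: str, type_history: List[str]) -> List[str]:
    """Hypothetical versioning scheme: every type change creates a new column."""
    columns: List[str] = []
    current: Optional[str] = None
    for t in type_history:
        if t != current:
            # First column keeps the bare name; later ones get a version number.
            columns.append(name if not columns else f"{name}__{len(columns)}")
            current = t
    return columns


def suffixed_columns(name: str, type_history: List[str]) -> List[str]:
    """Suffix scheme: one column per distinct type, reused when that type recurs."""
    columns: List[str] = []
    for t in type_history:
        col = f"{name}__{SUFFIX[t]}"
        if col not in columns:
            columns.append(col)
    return columns


history = ["string", "integer", "string", "integer", "string"]
print(versioned_columns("amount", history))
# ['amount', 'amount__1', 'amount__2', 'amount__3', 'amount__4']
print(suffixed_columns("amount", history))
# ['amount__s', 'amount__i']
```

With versioning, every flip adds a column; with suffixes, the column count is capped by the number of distinct types, which is what makes a downstream `coalesce-fields` step practical.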
Related to #19, since defining an explicit type (normally varchar or string in these cases) would offer an alternative.