snowplow / snowplow-rdb-loader

Stores Snowplow enriched events in Redshift, Snowflake and Databricks
Other
31 stars 17 forks source link

Common: postgres support for shredded events #47

Closed chuwy closed 3 years ago

chuwy commented 7 years ago

Migrated from https://github.com/snowplow/snowplow/issues/3204

chuwy commented 5 years ago

Some great questions @aldemirenes raised that we need to answer before following up on #155:

  1. Should we follow Redshift batch model for postgres loading? INSERTs are not very slow in Postgres, so we assume it will be streaming only
  2. If this will be streaming only, what sources should it support? Having that we see Snowplow Mini as a main Postgres use case, NSQ can be a good choice. Kafka is another option we need to consider.
  3. We always assumed that Postgres needs to follow Redshift table structure with multiple tables, but in fact Postgres has JSON support, which would make it very Snowflake-like
maximillianus commented 5 years ago

Hi @chuwy ,

I am trying to implement YAUAA enrichment (https://github.com/snowplow/snowplow/wiki/YAUAA-enrichment) in my snowplow pipeline.

I am using PosgreSQL as my storage. Can I load the shredded-types data into my postgresql by following this guide ? Or currently snowplow does not have capability to load shredded-types to postgresql and hence this issue is opened?

chuwy commented 5 years ago

Hi @maximillianus. No, unfortunately, shredded types in Postgres are not supported yet. This is what this ticket is about to bring.

maximillianus commented 5 years ago

Hey @chuwy thanks for a very fast response!

Is there any suggested workaround if we want to parse the JSONs in the shredded-types into table? I am trying to avoid to do it manually.

chuwy commented 5 years ago

@maximillianus no, unfortunately, none that I'm aware of. There's some work has been done on shredding contexts and self-describing events into TSV, but still there's a lot of things remaining to be done (table creation and migration) and I don't think that one without Scala knowledge can even make it compile.

You might want to ask our community: http://discourse.snowplowanalytics.com/, maybe someone have found a workaround (such as JSONB columns), but I'm not aware of such workarounds.

chuwy commented 3 years ago

Postgres is handled by dedicated Loader: https://github.com/snowplow-incubator/snowplow-postgres-loader