A fully incremental model that transforms raw web and mobile event data generated by the Snowplow JavaScript and mobile trackers into a series of derived tables of varying levels of aggregation.
I think it is worth taking a quick look at the current progress and getting some feedback before I continue with the changes to sessions and users. The main unification logic already lives in base and views, so any changes there would have to be corrected further up as well.
I have added a single dummy row of mobile data to the integration tests so that we can see whether it actually runs as expected. The integration tests currently only check that the models compile, and only run up to derived.views. For Snowflake/BigQuery/Databricks I checked that the mobile row is returned with the extra context fields, but this needs further investigation as I progress, although the original target is to support only Snowflake in the first release. For Postgres/Redshift the base macro needed some tweaking to avoid the duplication that came from using the same session context for both the user and session identifiers (update: it works now).
What I mainly want to agree on is the unification logic and which contexts to use. From a discussion with Matus it turned out that we have started moving towards removing the hardcoded atomic fields. This will be a gradual process, and some users can already use other contexts (e.g. the browser context) to replace certain fields. I think if we try to make the fields more 'conceptual', then when this happens we can just add a coalesce and keep the integrity. However, to avoid tables that are 150+ columns wide, the context fields are planned to be optional, based on what the user selects, which is where it may get tricky to strike the right balance of unification. I also updated the lists of fields to extract to a more recent schema version; I think this still needs double checking.
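To make the coalesce idea concrete, here is a minimal sketch of how a 'conceptual' field could be built from an optional context field with the atomic field as a fallback. It is only an illustration and not the package's macro: the column names `useragent` and `browser_user_agent`, the `events` table, the `enabled_contexts` argument, and the choice to prefer the context value over the atomic one are all assumptions, and SQLite stands in for the warehouse.

```python
import sqlite3


def build_select(enabled_contexts):
    """Build a SELECT exposing one conceptual `user_agent` column.

    Hypothetical rule: if the browser context is enabled, prefer its
    value and fall back to the atomic field; otherwise use the atomic
    field on its own so no extra column is required.
    """
    sources = []
    if "browser" in enabled_contexts:
        sources.append("browser_user_agent")  # assumed context column
    sources.append("useragent")  # assumed atomic column, always the fallback
    if len(sources) == 1:
        expr = sources[0]
    else:
        expr = "coalesce(" + ", ".join(sources) + ")"
    return f"select event_id, {expr} as user_agent from events order by event_id"


def run(enabled_contexts):
    """Run the generated query against three illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table events (event_id text, useragent text, browser_user_agent text)"
    )
    conn.executemany(
        "insert into events values (?, ?, ?)",
        [
            ("e1", "atomic-ua", None),          # only the atomic field is set
            ("e2", None, "context-ua"),         # only the context field is set
            ("e3", "atomic-ua", "context-ua"),  # both are set
        ],
    )
    rows = conn.execute(build_select(enabled_contexts)).fetchall()
    conn.close()
    return rows
```

Calling `run(["browser"])` unifies both sources into one column, while `run([])` reads only the atomic field, which is the trade-off between unification and table width described above.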
YAML files, docs, intro etc. are incomplete as of now, so please ignore them (as well as the optional modules).
To do (things to consider later in a future PR, before or after the release depending on time):
Thanks for all the comments so far! I am now actively working on sessions (and discovering more bugs along the way). I will keep this PR open for a little while so that you can see the replies, but will close/merge it soon.