Open chuwy opened 4 years ago
Migrated from https://github.com/snowplow/snowplow/issues/4244 (comments are auto-generated)
I've created a spreadsheet, proposing what new contexts and events should look like: https://docs.google.com/spreadsheets/d/1UaXrH92IvRWyXNU8wUQ-oxvEI9kJxoxbIcbRjna7RAI/edit#gid=0
@chuwy do you have an enrichments config for the full atomic schema?
Hi @BioQwer, which config are you referring to? FYI this issue is still on our roadmap, but it has not been prioritized yet.
I work with the Open Source version. I have many empty values in the atomic columns.
In order to refactor atomic events we need to extract all non-generic information from the fat table into dedicated contexts and preserve only common properties. As a first step, we can keep those properties both in the atomic event (as we do now, so as not to break data models) and in their dedicated tables/columns (to start writing new data models).
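The first step described above (keep the atomic columns, and also write the same values into dedicated contexts) could be sketched roughly as follows. This is only an illustration: `COLUMN_TO_CONTEXT`, `dual_write`, and the context labels are hypothetical names, and only a handful of columns are shown.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from atomic column to a dedicated context label.
# The labels are placeholders for illustration, not real schema identifiers.
COLUMN_TO_CONTEXT = {
    "geo_country": "maxmind",
    "geo_city": "maxmind",
    "mkt_medium": "marketing_campaign",
    "mkt_source": "marketing_campaign",
}


def dual_write(atomic_event: dict[str, Any]) -> tuple[dict[str, Any], dict[str, dict[str, Any]]]:
    """Return a copy of the atomic event plus the same values grouped per context."""
    contexts: dict[str, dict[str, Any]] = {}
    for column, context in COLUMN_TO_CONTEXT.items():
        value = atomic_event.get(column)
        if value is not None:  # skip columns that are empty in the fat table
            contexts.setdefault(context, {})[column] = value
    # The atomic event is copied unchanged so existing data models keep working.
    return dict(atomic_event), contexts


event = {"event_id": "e1", "geo_country": "DE", "geo_city": None, "mkt_medium": "cpc"}
atomic, contexts = dual_write(event)
```

In this sketch the atomic event still carries `geo_country` and `mkt_medium`, while `contexts` holds the same values under their context labels, so old and new data models can coexist.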
I tried to summarize which contexts and event-specific properties can be extracted out of Event:
app_id
platform
etl_tstamp
collector_tstamp
dvce_created_tstamp
event
event_id
txn_id
name_tracker
v_tracker
v_collector
v_etl
user_id
user_ipaddress
user_fingerprint
domain_userid
domain_sessionidx
network_userid
geo_country - MaxMind context
geo_region - MaxMind context
geo_city - MaxMind context
geo_zipcode - MaxMind context
geo_latitude - MaxMind context
geo_longitude - MaxMind context
geo_region_name - MaxMind context
ip_isp - MaxMind context
ip_organization - MaxMind context
ip_domain - MaxMind context
ip_netspeed - MaxMind context
page_url - Web page context (source of truth)
page_title - Web page context (source of truth)
page_referrer - Referrer context (source of truth)
page_urlscheme - Web page context
page_urlhost - Web page context
page_urlport - Web page context
page_urlpath - Web page context
page_urlquery - Web page context
page_urlfragment - Web page context
refr_urlscheme - Referrer context
refr_urlhost - Referrer context
refr_urlport - Referrer context
refr_urlpath - Referrer context
refr_urlquery - Referrer context
refr_urlfragment - Referrer context
refr_medium - Referrer context
refr_source - Referrer context
refr_term - Referrer context
mkt_medium - Marketing campaign context
mkt_source - Marketing campaign context
mkt_term - Marketing campaign context
mkt_content - Marketing campaign context
mkt_campaign - Marketing campaign context
contexts
se_category - Struct event self-describing event
se_action - Struct event self-describing event
se_label - Struct event self-describing event
se_property - Struct event self-describing event
se_value - Struct event self-describing event
unstruct_event
tr_orderid - Ecommerce transaction self-describing event
tr_affiliation - Ecommerce transaction self-describing event
tr_total - Ecommerce transaction self-describing event
tr_tax - Ecommerce transaction self-describing event
tr_shipping - Ecommerce transaction self-describing event
tr_city - Ecommerce transaction self-describing event
tr_state - Ecommerce transaction self-describing event
tr_country - Ecommerce transaction self-describing event
ti_orderid - Ecommerce transaction item context
ti_sku - Ecommerce transaction item context
ti_name - Ecommerce transaction item context
ti_category - Ecommerce transaction item context
ti_price - Ecommerce transaction item context
ti_quantity - Ecommerce transaction item context
pp_xoffset_min - Page ping self-describing event
pp_xoffset_max - Page ping self-describing event
pp_yoffset_min - Page ping self-describing event
pp_yoffset_max - Page ping self-describing event
useragent - Browser context (but populated from different places)
br_name - Browser context (but populated from different places) (ua-utils)
br_family - Browser context (but populated from different places) (ua-utils)
br_version - Browser context (but populated from different places) (ua-utils)
br_type - Browser context (but populated from different places) (ua-utils)
br_renderengine - Browser context (but populated from different places) (ua-utils)
br_lang - Browser context (but populated from different places)
br_features_pdf - Browser context (but populated from different places)
br_features_flash - Browser context (but populated from different places)
br_features_java - Browser context (but populated from different places)
br_features_director - Browser context (but populated from different places)
br_features_quicktime - Browser context (but populated from different places)
br_features_realplayer - Browser context (but populated from different places)
br_features_windowsmedia - Browser context (but populated from different places)
br_features_gears - Browser context (but populated from different places)
br_features_silverlight - Browser context (but populated from different places)
br_cookies - Browser context (but populated from different places)
br_colordepth - Browser context (but populated from different places)
br_viewwidth - Browser context (but populated from different places)
br_viewheight - Browser context (but populated from different places)
os_name - Browser context (but populated from different places) (ua-utils)
os_family - Browser context (but populated from different places) (ua-utils)
os_manufacturer - Browser context (but populated from different places)
os_timezone - Browser context (but populated from different places)
dvce_type - Browser context (but populated from different places) (ua-utils)
dvce_ismobile - Browser context (but populated from different places) (ua-utils)
dvce_screenwidth - Browser context (but populated from different places)
dvce_screenheight - Browser context (but populated from different places)
doc_charset - Web page (or document) context
doc_width - Web page (or document) context
doc_height - Web page (or document) context
tr_currency - Ecommerce transaction self-describing event
tr_total_base - Ecommerce transaction self-describing event
tr_tax_base - Ecommerce transaction self-describing event
tr_shipping_base - Ecommerce transaction self-describing event
ti_currency - Ecommerce transaction item context
ti_price_base - Ecommerce transaction item context
base_currency - Ecommerce transaction self-describing event
geo_timezone - MaxMind context
mkt_clickid - Marketing campaign context
mkt_network - Marketing campaign context
etl_tags
dvce_sent_tstamp
refr_domain_userid - Referrer context
refr_dvce_tstamp - Referrer context
derived_contexts
domain_sessionid
derived_tstamp
event_vendor
event_name
event_format
event_version
event_fingerprint - This should remain in canonical event
true_tstamp
Their grouping is not very semantic, but should be based mostly on the info source, e.g. although browser/device info is semantically the same information, some of the properties are passed through the tracker protocol and some are derived through user-agent enrichment.
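To illustrate grouping by info source rather than by meaning, here is a small sketch that partitions a few browser/device columns by where their values come from. The "ua-utils" entries follow the `(ua-utils)` markers in the list above; treating the unmarked columns as tracker-provided is an assumption made for this example, and `BROWSER_COLUMN_SOURCE` and `group_by_source` are hypothetical names.

```python
from collections import defaultdict

# Source per column. "ua-utils" follows the markers in the list above;
# "tracker" is an assumed label for values sent through the tracker protocol.
BROWSER_COLUMN_SOURCE = {
    "br_name": "ua-utils",
    "br_family": "ua-utils",
    "br_lang": "tracker",
    "br_cookies": "tracker",
    "os_name": "ua-utils",
    "os_timezone": "tracker",
    "dvce_ismobile": "ua-utils",
    "dvce_screenwidth": "tracker",
}


def group_by_source(column_source):
    """Group column names by the source that populates them."""
    grouped = defaultdict(list)
    for column, source in column_source.items():
        grouped[source].append(column)
    # Sort for a stable, readable result.
    return {source: sorted(columns) for source, columns in grouped.items()}


grouped = group_by_source(BROWSER_COLUMN_SOURCE)
```

Under this split, `os_name` and `os_timezone` end up in different groups even though both describe the operating system, which is the trade-off the paragraph above points at.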
Contexts
Self-describing events
Common properties
This leaves us with 31 core properties that can be set for almost all events/pipelines. Maybe some of them (user/device identification) can/should be moved into dedicated contexts.
event_id - event identification
app_id - event identification
event - eventually will be discarded in favor of vendor/name/version
txn_id - event identification
event_vendor - event identification
event_name - event identification
event_format - event identification
event_version - event identification
event_fingerprint - event identification
platform - probably should be moved as well
dvce_created_tstamp - timestamps
dvce_sent_tstamp - timestamps
collector_tstamp - timestamps
etl_tstamp - timestamps
derived_tstamp - timestamps
true_tstamp - timestamps
user_id - user/device identification
user_ipaddress - user/device identification
user_fingerprint - user/device identification
domain_userid - user/device identification
domain_sessionidx - user/device identification
domain_sessionid - user/device identification
network_userid - user/device identification
name_tracker - pipeline/aux
v_tracker - pipeline/aux
v_collector - pipeline/aux
v_etl - pipeline/aux
etl_tags - pipeline/aux
unstruct_event - payload
contexts - payload
derived_contexts - payload
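As a quick sanity check of the list above, the core properties can be written down per category and counted. This is a sketch only: `CORE_PROPERTIES` and `split_core` are hypothetical names, and the "to be revisited" group is an assumed label for `event` and `platform`, which carry notes instead of a category.

```python
# Core properties grouped by the categories used in the list above.
CORE_PROPERTIES = {
    "event identification": [
        "event_id", "app_id", "txn_id", "event_vendor", "event_name",
        "event_format", "event_version", "event_fingerprint",
    ],
    # Assumed label: these two have notes rather than a category.
    "to be revisited": ["event", "platform"],
    "timestamps": [
        "dvce_created_tstamp", "dvce_sent_tstamp", "collector_tstamp",
        "etl_tstamp", "derived_tstamp", "true_tstamp",
    ],
    "user/device identification": [
        "user_id", "user_ipaddress", "user_fingerprint", "domain_userid",
        "domain_sessionidx", "domain_sessionid", "network_userid",
    ],
    "pipeline/aux": ["name_tracker", "v_tracker", "v_collector", "v_etl", "etl_tags"],
    "payload": ["unstruct_event", "contexts", "derived_contexts"],
}

CORE_NAMES = {name for names in CORE_PROPERTIES.values() for name in names}


def split_core(event):
    """Split a flat event into core properties and everything else."""
    core = {k: v for k, v in event.items() if k in CORE_NAMES}
    rest = {k: v for k, v in event.items() if k not in CORE_NAMES}
    return core, rest


core, rest = split_core({"event_id": "e1", "geo_country": "DE"})
```

Here `len(CORE_NAMES)` comes out at 31, matching the count stated above, and `rest` holds whatever would move into contexts or self-describing events.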