snowplow / snowplow-python-analytics-sdk

Python SDK for working with Snowplow enriched events in Spark, AWS Lambda et al.

Add state to run manifests #29

Open chuwy opened 7 years ago

chuwy commented 7 years ago

Currently we have only the RunId key in the run manifest table (e.g. enriched-archive/run=2017-05-01-12-00-00). It would be useful to add a State key with STARTED and COMPLETED values, so we can see which folders failed.

alexanderdean commented 7 years ago

SUCCEEDED not COMPLETED for clarity...
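To make the State proposal concrete, here is a minimal in-memory sketch, assuming a manifest that maps each RunId (archive folder) to a single state. `RunState`, `manifest`, `mark_started`, `mark_succeeded` and `unfinished_runs` are hypothetical names for illustration only, not part of the Analytics SDK, and the second run folder is an invented example.

```python
from enum import Enum
from typing import Dict, List


class RunState(Enum):
    # States proposed in this thread; SUCCEEDED replaces COMPLETED
    STARTED = "STARTED"
    SUCCEEDED = "SUCCEEDED"


# Hypothetical in-memory stand-in for the run manifest table:
# RunId (archive folder) -> current State
manifest: Dict[str, RunState] = {}


def mark_started(run_id: str) -> None:
    manifest[run_id] = RunState.STARTED


def mark_succeeded(run_id: str) -> None:
    # Only a run that was previously marked as started can succeed
    if manifest.get(run_id) is not RunState.STARTED:
        raise ValueError(f"{run_id} was never marked as started")
    manifest[run_id] = RunState.SUCCEEDED


def unfinished_runs() -> List[str]:
    # Started but not (yet) succeeded: either still running or failed
    return sorted(r for r, s in manifest.items() if s is RunState.STARTED)


mark_started("enriched-archive/run=2017-05-01-12-00-00")
mark_started("enriched-archive/run=2017-05-01-13-00-00")  # hypothetical second run
mark_succeeded("enriched-archive/run=2017-05-01-12-00-00")
print(unfinished_runs())  # ['enriched-archive/run=2017-05-01-13-00-00']
```

Note that a single State value cannot distinguish a run that is still in progress from one that crashed; it only records the last transition.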

chuwy commented 7 years ago

I actually think that instead of a State key there should be a separate Started key holding a date and a Succeeded key holding a date or null. This way we also get information on when folders were processed.

Edit: Succeeded

alexanderdean commented 7 years ago

StartedAt, SucceededAt - I think that works...
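The timestamp-based variant can be sketched the same way, assuming each manifest item stores a StartedAt datetime and a SucceededAt datetime that stays null (`None`) until the run succeeds. `RunManifest`, `RunManifestItem`, `start`, `succeed`, `processing_time` and `unfinished` are hypothetical names, not the SDK's API; the real manifest is a table, not a Python dict.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


@dataclass
class RunManifestItem:
    run_id: str
    started_at: datetime
    # Stays None (null) until the run finishes successfully
    succeeded_at: Optional[datetime] = None


class RunManifest:
    """Hypothetical in-memory run manifest keyed by RunId."""

    def __init__(self) -> None:
        self._items: Dict[str, RunManifestItem] = {}

    def start(self, run_id: str, at: Optional[datetime] = None) -> None:
        self._items[run_id] = RunManifestItem(run_id, at or datetime.now(timezone.utc))

    def succeed(self, run_id: str, at: Optional[datetime] = None) -> None:
        item = self._items[run_id]  # KeyError if the run was never started
        item.succeeded_at = at or datetime.now(timezone.utc)

    def processing_time(self, run_id: str) -> Optional[float]:
        # Seconds between StartedAt and SucceededAt, or None if not succeeded
        item = self._items[run_id]
        if item.succeeded_at is None:
            return None
        return (item.succeeded_at - item.started_at).total_seconds()

    def unfinished(self) -> List[str]:
        # The old State is derivable: STARTED <=> SucceededAt is null
        return sorted(r for r, i in self._items.items() if i.succeeded_at is None)


m = RunManifest()
t0 = datetime(2017, 5, 1, 12, 5, tzinfo=timezone.utc)
m.start("enriched-archive/run=2017-05-01-12-00-00", at=t0)
m.succeed("enriched-archive/run=2017-05-01-12-00-00", at=t0 + timedelta(minutes=30))
print(m.processing_time("enriched-archive/run=2017-05-01-12-00-00"))  # 1800.0
```

The design choice is that the state no longer needs its own key: "started but not succeeded" is simply a null SucceededAt, while the two timestamps add the timing information mentioned above.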

chuwy commented 7 years ago

@alexanderdean do you think this feature could go beyond the Analytics SDK? For example, maintaining a LoadedAt state from RDB Loader or an AddedAt state from EmrEtlRunner (or whichever component will be responsible at that point).

alexanderdean commented 7 years ago

I am not sure if this answers your question @chuwy, but I 100% envisage the Scala Analytics SDK being used in e.g. the RDB Loader, and thus yes using this state tracking...