snowplow / snowplow-python-analytics-sdk

Python SDK for working with Snowplow enriched events in Spark, AWS Lambda et al.

Add option to list glacierefied folders #30

Open chuwy opened 7 years ago

chuwy commented 7 years ago

In the list_runids function we explicitly skip run ids that are archived on AWS Glacier, which means they will never appear in the run manifest. I believe this is the wrong solution, because a customer can restore a particular folder without intending to reprocess it (say, with a PySpark job).

I propose to:

This should let the run manifests feature take full control over data processing.

TODO: Think about what to do with other storage classes. Currently we only list folders whose storage class is STANDARD.
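
To make the idea concrete, here is a rough sketch of what an opt-in listing could look like. It is not the SDK's list_runids: the names list_run_ids_from_objects, include_archived and archived_classes are hypothetical, and it works on plain dicts shaped like the "Contents" entries of an S3 listing ("Key" and "StorageClass") rather than calling S3. It also assumes that run folders are named run=... and that a folder counts as archived if any object in it has an archived storage class.

```python
from typing import Dict, FrozenSet, Iterable, List, Mapping, Optional

# Assumption: only GLACIER is treated as "archived" by default; other
# classes (e.g. STANDARD_IA) are listed like STANDARD.
DEFAULT_ARCHIVED_CLASSES: FrozenSet[str] = frozenset({"GLACIER"})


def run_folder(key: str) -> Optional[str]:
    """Return the 'run=...' folder prefix of an object key, or None."""
    parts = key.split("/")
    # Skip the last segment: it is the file name (or empty for a folder marker).
    for i, part in enumerate(parts[:-1]):
        if part.startswith("run="):
            return "/".join(parts[: i + 1]) + "/"
    return None


def list_run_ids_from_objects(
    objects: Iterable[Mapping[str, str]],
    include_archived: bool = False,
    archived_classes: FrozenSet[str] = DEFAULT_ARCHIVED_CLASSES,
) -> List[str]:
    """List run folders, optionally including archived ones (hypothetical helper)."""
    archived_by_folder: Dict[str, bool] = {}
    for obj in objects:
        folder = run_folder(obj["Key"])
        if folder is None:
            continue
        is_archived = obj.get("StorageClass", "STANDARD") in archived_classes
        # A folder is flagged as archived as soon as one of its objects is.
        archived_by_folder[folder] = archived_by_folder.get(folder, False) or is_archived
    return sorted(
        folder
        for folder, archived in archived_by_folder.items()
        if include_archived or not archived
    )
```

With include_archived=True the caller sees every run folder and can decide what to record in the manifest for the archived ones, instead of having them silently dropped at listing time.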

alexanderdean commented 7 years ago

I think this makes sense. I would probably call the state IgnoredAt, rather than CancelledAt?

chuwy commented 7 years ago

Agree about IgnoredAt.