snowplow / snowplow-rdb-loader

Stores Snowplow enriched events in Redshift, Snowflake and Databricks

RDB Loader: make consistency check folder-based #74

Open chuwy opened 6 years ago

chuwy commented 6 years ago

Currently we check consistency by comparing the list of files between checks, but this is probably too strict. In the end we load data using a pattern such as s3://shredded/good/com.acme/shredded-context/jsonschema/1-0-0/part-*, which means we don't care whether particular files are available; we only need to know that the folders still exist.
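A minimal sketch of what a folder-based check could look like, assuming each check yields a flat list of S3 keys. The function names and the `part-0000N` file names are hypothetical and used only for illustration; this is not the loader's actual implementation.

```python
from posixpath import dirname


def folders(keys):
    """Return the set of parent folders for a listing of S3 keys."""
    return {dirname(key) for key in keys}


def is_consistent_by_folder(first_listing, second_listing):
    """Folder-based check: two listings agree if they contain the same folders."""
    return folders(first_listing) == folders(second_listing)


# Hypothetical listings from two consecutive checks.
prefix = "s3://shredded/good/com.acme/shredded-context/jsonschema/1-0-0"
first = [f"{prefix}/part-00000", f"{prefix}/part-00001"]
second = [f"{prefix}/part-00000", f"{prefix}/part-00001", f"{prefix}/part-00002"]

# The file lists differ, but the folder is the same in both listings.
print(is_consistent_by_folder(first, second))
# If the folder vanishes entirely, the check fails.
print(is_consistent_by_folder(first, []))
```

Because only parent folders are compared, a listing that gains or loses individual part files still passes as long as every folder appears in both checks.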

Implement in conjunction with https://github.com/snowplow/snowplow-rdb-loader/issues/68

alexanderdean commented 6 years ago
[3:50 PM] Alexander Dean: On
[3:50 PM] Alexander Dean: https://github.com/snowplow/snowplow-rdb-loader/issues/74
[3:51 PM] Alexander Dean: I am not against this, but I read this as reducing the accuracy of the consistency check
[3:51 PM] Alexander Dean: whereas the description implies it makes no difference
[3:52 PM] Anton Parkhomenko: Hm, probably you're right. We really care about files for accuracy
[3:52 PM] Alexander Dean: Let's push it back
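
The accuracy concern raised in this exchange can be illustrated with a small sketch. It assumes a listing is a flat list of S3 keys; the helper names and the `part-0000N` file names are hypothetical. When one file is missing from the second listing, a file-level comparison reports a mismatch while a folder-level comparison does not.

```python
def files_match(first_listing, second_listing):
    """File-level check: every key must appear in both listings."""
    return set(first_listing) == set(second_listing)


def folders_match(first_listing, second_listing):
    """Folder-level check: only the parent folders are compared."""
    def parents(keys):
        return {key.rsplit("/", 1)[0] for key in keys}
    return parents(first_listing) == parents(second_listing)


# Hypothetical scenario: part-00001 is not visible in the second listing.
base = "s3://shredded/good/com.acme/shredded-context/jsonschema/1-0-0"
listing_a = [f"{base}/part-00000", f"{base}/part-00001"]
listing_b = [f"{base}/part-00000"]

print(files_match(listing_a, listing_b))    # the missing file is detected
print(folders_match(listing_a, listing_b))  # the missing file goes unnoticed
```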