snowplow / snowbridge

For replicating streams across clouds, accounts and regions

Provide better safety for configuration #128

Open colmsnowplow opened 2 years ago

colmsnowplow commented 2 years ago

Some of our transformations seem easy to misconfigure in ways that we can't really validate. E.g., if you make a typo in an event_name value, the enriched filters will simply treat all of those events as not matched.

We also can't log the values due to user data privacy concerns in the intended deployment model...

It would be great to think of some creative solutions for this. One thing that I think could work well (and would be very useful for scripting transformations) is to provide a harness to test your config against certain values locally, without processing all the way through. Like a 'transformation config test harness'... But I'm not tied to that solution, I'd love to hear ideas on the topic.
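A minimal sketch of what such a harness could look like, assuming events have already been parsed into a field-to-value map and that a filter keeps an event only on an exact match. `Event`, `EqualsFilter`, `HarnessCase` and `RunHarness` are hypothetical names for illustration, not Snowbridge's actual API. The user supplies a few sample events with the outcome they expect, and the harness reports mismatches locally, so no production values ever need to be logged.

```go
package main

import "fmt"

// Event is a hypothetical, already-parsed enriched event: field name -> value.
type Event map[string]string

// EqualsFilter is a hypothetical stand-in for a configured enriched filter:
// an event is kept only when Field exactly equals Value.
type EqualsFilter struct {
	Field string
	Value string
}

// Keep reports whether the filter would keep the event.
func (f EqualsFilter) Keep(e Event) bool {
	return e[f.Field] == f.Value
}

// HarnessCase pairs a user-supplied sample event with the expected outcome.
type HarnessCase struct {
	Name     string
	Event    Event
	WantKeep bool
}

// RunHarness evaluates the filter against every case locally, without
// reading from or writing to any real stream, and returns a description of
// each case whose outcome differs from the expectation.
func RunHarness(f EqualsFilter, cases []HarnessCase) []string {
	var failures []string
	for _, c := range cases {
		if got := f.Keep(c.Event); got != c.WantKeep {
			failures = append(failures,
				fmt.Sprintf("%s: want keep=%t, got keep=%t", c.Name, c.WantKeep, got))
		}
	}
	return failures
}
```

With a typo'd value such as `pagee_view`, a case like `{Name: "page view", Event: Event{"event_name": "page_view"}, WantKeep: true}` fails immediately with `page view: want keep=true, got keep=false`, surfacing the mistake before anything is deployed.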

(relevant discussion: https://github.com/snowplow-devops/stream-replicator/pull/125#discussion_r886556282)

jbeemster commented 2 years ago

How about a way to observe whether the filters do anything? So if you set event_name = x but, after a configured time, the filter has never been activated, log a warning?

colmsnowplow commented 2 years ago

I guess the case I'm worried about here is a filter configured with event_name = pagee_view. The filter will be active, but the typo will cause it to filter out all legitimate page views, and this will be invisible until someone notices the problem later.

jbeemster commented 2 years ago

I mean, at some level the user has to be responsible for configuration input, right? You also cannot control what a user calls the event (they could make one called pagee_view that actually works).

Could you maybe add periodic logging of what the filters did with their inputs, so a user can more easily figure out that they messed up?
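A minimal sketch of periodic logging that stays within the privacy constraint, assuming each filter decision is reported as a simple kept/dropped boolean. `FilterStats`, `Record`, `Flush` and `LogPeriodically` are hypothetical names. Only aggregate counts are logged, never field values. Requires Go 1.19+ for `atomic.Int64`.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"sync/atomic"
	"time"
)

// FilterStats is a hypothetical per-filter counter. It records only how many
// events were kept or dropped, so the log line carries no user data.
type FilterStats struct {
	Name    string
	kept    atomic.Int64
	dropped atomic.Int64
}

// Record is called once per event with the filter's decision.
func (s *FilterStats) Record(kept bool) {
	if kept {
		s.kept.Add(1)
	} else {
		s.dropped.Add(1)
	}
}

// Flush returns a summary of the counts since the previous flush and resets
// them. The two counters are reset independently, so an event recorded
// mid-flush may land in the next interval; that is fine for a diagnostic log.
func (s *FilterStats) Flush() string {
	k := s.kept.Swap(0)
	d := s.dropped.Swap(0)
	return fmt.Sprintf("filter %q: kept=%d dropped=%d", s.Name, k, d)
}

// LogPeriodically writes one summary line per interval until ctx is cancelled.
func LogPeriodically(ctx context.Context, s *FilterStats, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			log.Println(s.Flush())
		}
	}
}
```

A run of lines showing `kept=0` alongside a large `dropped` count would be a strong hint that a keep filter's value is misspelled.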

colmsnowplow commented 2 years ago

Yeah, that's fair; I could be over-egging it. For custom scripting transformations I think providing tests would be nice anyway. But yes, for this, periodic logs are probably good enough.