snowplow / enrich

Snowplow Enrichment jobs and library
https://snowplowanalytics.com

Domain_sessionid UUID validation #808

Open · althael opened this issue 1 year ago

althael commented 1 year ago

Hello, I have a question about the domain_sessionid UUID validation in the Enrich code. It confused us, because the field is described as a text value in both the documentation and the JSON schema itself.

JSON schema: https://github.com/snowplow/iglu-central/blob/master/schemas/com.snowplowanalytics.snowplow/atomic/jsonschema/1-0-0
[Screenshot 2023-07-27 at 4:04:27 PM]

Docs: https://docs.snowplow.io/docs/understanding-your-pipeline/canonical-event/
[Screenshot 2023-07-27 at 4:01:31 PM]

For our specific use case, we needed to pass a value in a format other than UUID, and we ran into the following enrichment failure:

{"schema":"iglu:com.snowplowanalytics.snowplow.badrows/enrichment_failures/jsonschema/2-0-0","data":{"processor":{"artifact":"snowplow-stream-enrich","version":"3.7.0"},"failure":{"timestamp":"2023-07-27T08:39:55.550680Z","messages":[{"enrichment":null,"message":{"field":"sid","value":"1234567891234","expectation":"not a valid UUID"}}]}

Code reference: https://github.com/snowplow/enrich/blame/f882a7d324e655c5b67b1e687c470becc3bf6dd6/modules/common/src/main/scala/com.snowplowanalytics.snowplow.enrich/common/enrichments/Transform.scala#L61
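The kind of check applied to `sid` on the linked line can be approximated as follows. This is a minimal Python sketch, not Enrich's actual Scala code. The helper names `validate_sid` and `check_sid` are hypothetical, and we assume that only the canonical hyphenated 8-4-4-4-12 form is accepted. It shows why a value such as `1234567891234` is rejected with the same `not a valid UUID` expectation that appears in the bad row.

```python
import uuid
from typing import Optional, Union


def validate_sid(value: str) -> Optional[str]:
    """Return the lower-cased value if it is a canonical UUID string, else None.

    Hypothetical helper: an approximation of a strict UUID check,
    not the implementation used by Enrich.
    """
    try:
        parsed = uuid.UUID(value)
    except (ValueError, TypeError):
        # e.g. "1234567891234" has too few hex digits to be a UUID
        return None
    # uuid.UUID also accepts "{...}", "urn:uuid:..." and 32 hex digits without
    # hyphens, so additionally require the canonical hyphenated form.
    if str(parsed) != value.lower():
        return None
    return value.lower()


def check_sid(value: str) -> Union[str, dict]:
    """Return the normalized sid, or a failure message shaped like the bad row's."""
    normalized = validate_sid(value)
    if normalized is None:
        return {"field": "sid", "value": value, "expectation": "not a valid UUID"}
    return normalized
```

For example, `check_sid("1234567891234")` returns the failure dictionary, while an upper-case canonical UUID is accepted and lower-cased.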

If this validation is intended, it would help to reflect it in the documentation, so that readers know a UUID format is enforced for this specific field.

Thanks

miike commented 1 year ago

Hi @althael, this field is intended to be a UUID, and it is effectively readable but not writable, so I'll make sure the documentation gets updated. If you want to override values in the tracking protocol, the best approach is to add the value to an entity attached to the event itself.
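The entity approach can be sketched as follows, assuming the event is sent with the tracker protocol's `co` parameter. That parameter carries custom entities as plain JSON wrapped in the `contexts` self-describing schema. The entity schema `iglu:com.acme/custom_session/jsonschema/1-0-0`, its `sessionId` property, and the helper `build_entities` are hypothetical. You would need to define and publish such a schema in your own Iglu registry before Enrich could validate it. In practice a tracker SDK usually builds this payload for you.

```python
import json

# Hypothetical entity schema for the custom identifier; it must exist in your
# own Iglu registry for the entity to pass validation.
CUSTOM_SCHEMA = "iglu:com.acme/custom_session/jsonschema/1-0-0"
# Self-describing wrapper used for the list of entities attached to an event.
CONTEXTS_SCHEMA = "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0"


def build_entities(custom_session_id: str) -> str:
    """Return a compact JSON string for the `co` parameter carrying the custom id."""
    entity = {"schema": CUSTOM_SCHEMA, "data": {"sessionId": custom_session_id}}
    wrapper = {"schema": CONTEXTS_SCHEMA, "data": [entity]}
    return json.dumps(wrapper, separators=(",", ":"))
```

With this approach, the non-UUID value travels in its own entity, and `sid` is left to the tracker, which generates a valid UUID.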