snowplow / iglu-central

Contains all JSON Schemas, Avros and Thrifts for Iglu Central
http://iglucentral.com
Apache License 2.0

Release R146 #1320

Closed igneel64 closed 1 year ago

igneel64 commented 1 year ago

Fixes validation errors in Sendgrid webhooks with the following changes:

Because of the changes in the maximum value of asm_group_id, these schemas need to update the major version to 3-0-0.

istreeter commented 1 year ago

Looks good! Happy to merge this as-is.

But I'm curious -- why did you pick 45 and 320 as the max string lengths? I looked up some IPv6 addresses and it seems 39 characters is the maximum. Is there a longer IP address type I don't know about, or are you allowing extra characters for a bit of safety?

You mentioned that Sendgrid occasionally sends unknown in the IP address field. Did you consider changing the Sendgrid adaptor so it removes unknown and replaces it with null? In other words, clean the data earlier in the pipeline, instead of later in the warehouse.
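
To make the "clean it earlier" suggestion concrete, here is a minimal sketch of that kind of normalisation step. It is not the actual Sendgrid adaptor: the function names, the `ip` field name, and the idea that `unknown` is the only placeholder value are all assumptions for illustration.

```python
from typing import Any, Dict, Optional

# Assumption: "unknown" is the only placeholder value seen in the IP field.
PLACEHOLDER_IPS = {"unknown"}


def clean_ip(value: Optional[str]) -> Optional[str]:
    """Return None for missing, empty, or placeholder IP values (hypothetical helper)."""
    if value is None:
        return None
    stripped = value.strip()
    if stripped == "" or stripped.lower() in PLACEHOLDER_IPS:
        return None
    return stripped


def clean_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with a placeholder "ip" replaced by None.

    The "ip" key is an assumed field name for this sketch.
    """
    cleaned = dict(event)
    if "ip" in cleaned:
        cleaned["ip"] = clean_ip(cleaned["ip"])
    return cleaned


if __name__ == "__main__":
    print(clean_event({"email": "user@example.com", "ip": "unknown"}))
    print(clean_event({"email": "user@example.com", "ip": "192.0.2.1"}))
```

With a step like this the schema would only ever see a real address or null, at the cost of discarding whatever Sendgrid means by the placeholder.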

miike commented 1 year ago

@istreeter The max length of an IPv6 address is 39 but it's 45 for an IPv4-mapped IPv6 address (the final 32 bits are written as a dotted-decimal IPv4 address on the end, which takes more characters). I believe 320 is from one of the email RFCs.

Good point about unknown and null. I'm not sure if Sendgrid means something different here by unknown as it doesn't seem to be well documented.

istreeter commented 1 year ago

You might be right, but the examples I've seen of IPv4-mapped IPv6 addresses are still 39 characters. I played around a bit with this tool: https://iplocation.io/ipv4-to-ipv6

miike commented 1 year ago

Yeah - this is confusing because from a storage point of view it's typically 39 characters but then different tools have different display conventions (in essence to make the IPv4 address more human readable).

So a displayed IPv4-mapped address 0000:0000:0000:0000:0000:ffff:255.255.255.255 (45 characters) is equivalent to the expanded IPv6 0000:0000:0000:0000:0000:ffff:ffff:ffff (39 characters).
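
The two lengths can be checked with Python's standard `ipaddress` module. This is only a quick sketch to confirm the arithmetic from the comment above; the variable names are illustrative.

```python
import ipaddress

# Mixed notation: six hex groups followed by a dotted-decimal IPv4 address.
mixed = "0000:0000:0000:0000:0000:ffff:255.255.255.255"

# Parse it and render the same 128-bit value as eight fully expanded hex groups.
addr = ipaddress.IPv6Address(mixed)
expanded = addr.exploded

print(len(mixed))        # 45
print(expanded)          # 0000:0000:0000:0000:0000:ffff:ffff:ffff
print(len(expanded))     # 39
print(addr.ipv4_mapped)  # 255.255.255.255
```

Both strings describe the same address, so a 45-character limit only matters if the sender emits the mixed notation.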