Status: Open. rowolff opened this issue 1 month ago.
This is most likely caused by the move from akka-http to http4s between 2.x and 3.x; http4s is far stricter than akka-http, which is relatively lax most of the time.
Unfortunately, although the behaviour is undesired, I think it is likely correct (it is unusual that it works when the ordering changes, though): the backslash in your cookie value is forbidden by the standard, RFC 6265, which requires every cookie octet to match:
cookie-octet = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
; US-ASCII characters excluding CTLs,
; whitespace DQUOTE, comma, semicolon,
; and backslash
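To make the grammar above concrete, here is a minimal Python sketch that checks a cookie value against the cookie-octet ranges and reports the first offending character. It only illustrates the RFC 6265 rule; it is not how http4s or the Collector validate cookies, and the function names are hypothetical. It also shows that the double quotes in the JSON value (DQUOTE, %x22, which is outside every range) break the grammar as well, not just the backslash.

```python
# Allowed cookie-octet ranges from RFC 6265, section 4.1.1 (inclusive code points).
COOKIE_OCTET_RANGES = [
    (0x21, 0x21),
    (0x23, 0x2B),
    (0x2D, 0x3A),
    (0x3C, 0x5B),
    (0x5D, 0x7E),
]


def is_cookie_octet(ch: str) -> bool:
    """Return True if the single character ch is a valid RFC 6265 cookie-octet."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in COOKIE_OCTET_RANGES)


def first_invalid_char(value: str):
    """Return (index, char) of the first character breaking the cookie-value
    grammar, or None if the value is valid.

    cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE ),
    so one pair of surrounding double quotes is allowed.
    """
    start, end = 0, len(value)
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        start, end = 1, len(value) - 1
    for i in range(start, end):
        if not is_cookie_octet(value[i]):
            return i, value[i]
    return None


if __name__ == "__main__":
    samples = {
        "wanted_cookie": "crucial_value",
        "AS_JSON": '{\\"Key\\":\\"Value\\"}',  # the value as sent: {\"Key\":\"Value\"}
    }
    for name, value in samples.items():
        print(name, first_invalid_char(value))
```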
As a result, the recommendation (for cross-browser compatibility) is to Base64-encode any value in which disallowed characters may occur, as RFC 6265 itself advises:
To maximize compatibility with user agents, servers that wish to
store arbitrary data in a cookie-value SHOULD encode that data, for
example, using Base64 [RFC4648].
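As a sketch of that advice, assuming you control the code that sets the cookie, the JSON can be serialized and Base64-encoded before it is written, then decoded by whoever reads it. The helper names below are hypothetical. Every character of the standard Base64 alphabet (letters, digits, '+', '/', '=') falls inside the cookie-octet ranges, so the encoded value is RFC 6265 compliant.

```python
import base64
import json


def encode_cookie_value(data) -> str:
    """Serialize data to compact JSON and Base64-encode it (RFC 4648 alphabet)."""
    raw = json.dumps(data, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_cookie_value(value: str):
    """Reverse encode_cookie_value: Base64-decode, then parse the JSON."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


if __name__ == "__main__":
    encoded = encode_cookie_value({"Key": "Value"})
    # The encoded value contains no backslashes or double quotes.
    print(f"Cookie: AS_JSON={encoded}; wanted_cookie=crucial_value")
    print(decode_cookie_value(encoded))
```

Standard Base64 is used rather than the URL-safe variant because its '+', '/' and '=' are all permitted cookie octets; either alphabet would work here.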
Hi @miike, thank you for the quick response and the awesome detective work. I'll check with my team whether we can do anything about it, and if so what. Some JSON strings come from third-party tools whose formatting we don't control, so it might take some time to resolve.
No worries. I can see how third-party cookies could be problematic and difficult to modify (or to get encoded correctly).
There may be some good news: this is by no means the first time people have run into this issue with http4s, and as a result there is a PR that adds a "RelaxedCookies" mode, whose tests seem to include some JSON. I haven't tested this, as I'm assuming the issue lies in the Collector rather than in Enrich; that seems a reasonable bet given that the same version of Enrich behaves differently with Collector 2.9.1 and 3.2.0.
I've raised this with the engineering team so they can take a closer look and see what we might be able to do. Thank you for flagging this one!
Thanks a lot for providing all the details and the scripts, @rowolff!
I compared the outputs of Collector 3.2.0 and 2.10.0; here are the results:
Collector 3.2.0:
Input: Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Output: Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Input: Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Output: Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Collector 2.10.0:
Input: Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":\"Value\"};
Output: Cookie: wanted_cookie=crucial_value
Input: Cookie: AS_JSON={\"Key\":\"Value\"}; wanted_cookie=crucial_value;
Output: Cookie: wanted_cookie=crucial_value
Collector 2.x was removing the JSON from the cookie, probably because it did not conform to the RFC. We'll discuss internally whether to restore this behavior in 3.x or to update Enrich to handle the JSON, and we'll let you know!
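The Collector 2.x output above (where the JSON cookie is dropped and wanted_cookie survives in either order) can be approximated with the following Python sketch. It assumes, as the comment suggests but does not confirm, that 2.x simply discarded any cookie whose value breaks the RFC 6265 cookie-value grammar. sanitize_cookie_header is a hypothetical name; this is not the Collector's actual akka-http based code.

```python
import re

# One RFC 6265 cookie-octet (section 4.1.1) as a regex character class.
_OCTET = r"[\x21\x23-\x2B\x2D-\x3A\x3C-\x5B\x5D-\x7E]"
# cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
_COOKIE_VALUE = re.compile(rf'{_OCTET}*|"{_OCTET}*"')


def sanitize_cookie_header(header_value: str) -> str:
    """Keep only name=value pairs whose value matches the RFC 6265 grammar.

    Hypothetical illustration of the observed 2.x output, not Collector code.
    """
    kept = []
    for part in header_value.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue  # skip empty trailing segments and bare names
        name, value = part.split("=", 1)
        if _COOKIE_VALUE.fullmatch(value):
            kept.append(f"{name}={value}")
    return "; ".join(kept)


if __name__ == "__main__":
    for header in (
        'wanted_cookie=crucial_value; AS_JSON={\\"Key\\":\\"Value\\"};',
        'AS_JSON={\\"Key\\":\\"Value\\"}; wanted_cookie=crucial_value;',
    ):
        print(sanitize_cookie_header(header))
```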
Project: Stream Enrich
Version: 5.0.0
Expected behavior:
Given the cookie
AS_JSON={\"Key\":"Value"}; wanted_cookie=crucial_value;
where wanted_cookie is configured in the cookie extractor enrichment, the value of wanted_cookie should be extracted.
Actual behavior:
The extraction only works if there's no stringified JSON in front of the wanted cookie:
Cookie: wanted_cookie=crucial_value; AS_JSON={\"Key\":"Value"}; (extracted)
Cookie: AS_JSON={\"Key\":"Value"}; wanted_cookie=crucial_value; (not extracted)
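The expected behavior amounts to an order-independent lookup that inspects each name=value pair on its own, so a malformed neighbour is simply ignored. The Python sketch below uses a hypothetical helper, extract_cookie, written only for illustration; it is not part of Enrich, whose actual parsing is not shown in this issue.

```python
from typing import Optional


def extract_cookie(header_value: str, wanted: str) -> Optional[str]:
    """Return the value of the cookie named `wanted`, or None if absent.

    Each ';'-separated pair is examined independently, so a malformed pair
    (such as a stringified JSON value) before or after `wanted` does not
    stop the lookup. Hypothetical helper, not Enrich code.
    """
    for part in header_value.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name == wanted:
            return value
    return None


if __name__ == "__main__":
    headers = [
        'wanted_cookie=crucial_value; AS_JSON={\\"Key\\":"Value"};',
        'AS_JSON={\\"Key\\":"Value"}; wanted_cookie=crucial_value;',
    ]
    for h in headers:
        print(extract_cookie(h, "wanted_cookie"))
```

The naive split on ';' is a deliberate simplification: it would misbehave if a value itself contained a semicolon, which RFC 6265 forbids anyway.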
Steps to reproduce:
Example: I reproduced this with Snowplow Micro in this repository: https://github.com/rowolff/snowplow-micro-debugging/
Additional info:
We noticed the bug while upgrading our components: we had been running Collector 2.9.1 with Enrich 5.0.0 for a while, and after moving to Collector 3.2.0 with Enrich 5.0.0 we suddenly noticed the issue. Hope this helps.