Thanks @gberche-orange for the report! I'll check it tomorrow and will either fix it or get back to you with a solution.
Small note: it looks like the `jq` query was catching items that didn't have `postData`. Adding `| select(.postData != null)` will filter out the empty values.
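For instance, a minimal sketch of that filter in context (assuming the capture is saved as `test.har`, as in the commands below):

```bash
# List the URLs of captured requests that actually carry a body;
# entries without postData are dropped by the added select().
cat test.har | jq '[ .[].entries[].request | select(.postData != null) | .url ]'
```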
The actual issue is most likely related to a Chromium bug, but I haven't yet identified the exact issue in their bug tracker. To verify that the bug exists in Chrome, though:
I took some sample calls that finished in the browser with `{"status": 1}`, but the generated cURL command ended with:
```
HTTP/2 400
content-type: application/json
date: Tue, 23 May 2023 14:17:44 GMT
access-control-allow-origin: https://cloud.testkube.io
access-control-allow-credentials: true
access-control-allow-methods: GET, POST, OPTIONS
access-control-allow-headers: X-Requested-With,Content-Type
x-content-type-options: nosniff
referrer-policy: same-origin
x-cache: Error from cloudfront
via: 1.1 604f8ac78ed3ba5235c1a14794f2ac64.cloudfront.net (CloudFront)
x-amz-cf-pop: FRA56-P5
x-amz-cf-id: SvBW9Cu_iU1BnpqdR7FfGixrqFbAIlKlPSBUUD1_7eB4FztDhzzHTg==

{"type": "validation_error", "code": "invalid_payload", "detail": "Malformed request data: Failed to decompress data. Not a gzipped file (b'\\x1f\\xc2')", "attr": null}
```
Looks like the bug in Chromium appeared somewhere in April/May, as I was still able to successfully gunzip the payload on 4th May.
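As a side note on that error message: a gzip stream starts with the magic bytes `1f 8b`, and the `b'\x1f\xc2'` prefix reported by the server is consistent with the second byte having been re-encoded as UTF-8 somewhere along the way, since code point U+008B encodes to the two bytes `c2 8b`:

```bash
# 0x1f is plain ASCII and survives UTF-8 encoding unchanged; 0x8b does not.
printf '\x8b' | iconv -f iso8859-1 -t utf-8 | xxd
# 00000000: c28b                                     ..
```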
> The actual issue is most likely related to a Chromium bug
I also reproduce the problem with Firefox 102.11 ESR.
Note that gzip produces binary bytes which may or may not happen to map to valid Unicode characters, which makes the problem look like it reproduces randomly. I saw it working once on May 19th, and then failed to get it to work despite countless attempts.
@rangoo94 what's your analysis w.r.t. the suggested root cause of an invalid content-type?
Thanks @gberche-orange! I finally found the problem based on your comment.
The problem is that `jq` outputs a UTF-8 string. To make it work, you have to use e.g. `iconv` to convert it back:
```bash
cat test.har | jq -r '[ .[].entries[].request | select(.url | contains("posthog.com") and contains("gzip")) | select(.postData.text != null) | .postData.text ][0]' | iconv -f utf-8 -t iso8859-1 | gunzip
```
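Why ISO-8859-1 specifically: it maps the code points U+0000–U+00FF one-to-one back onto single bytes, so it exactly undoes the UTF-8 expansion that mangled the gzip stream. A quick round-trip check (a sketch using the gzip magic bytes):

```bash
# Encode raw bytes as UTF-8 and convert back; the original bytes survive intact.
printf '\x1f\x8b' | iconv -f iso8859-1 -t utf-8 | iconv -f utf-8 -t iso8859-1 | xxd
# 00000000: 1f8b                                     ..
```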
There are two kinds of PostHog calls though: ones gzipped with `compression=gzip-js`, and raw base64-encoded ones. If you want to read them all at once, you may run this command:
```bash
cat test.har \
  | jq -r '[
      .[].entries[].request | select(.url | contains("posthog.com"))
      | select(.postData.text != null)
      | if .url | contains("gzip-js")
        then "echo " + (.postData.text | @base64) + " | base64 -d | iconv -f utf-8 -t iso8859-1 | gunzip"
        else "echo " + [.postData.params | select(.[].name == "data") | .[].value][0] + " | base64 -d"
        end
    ] | join("\n")' \
  | bash
```
It will print all of the payloads in plain text, line by line.
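If you want to see which of the two kinds a given capture contains before decoding, a small variant of the same query can tally them (a sketch, with the same `test.har` assumption):

```bash
# Count the captured PostHog calls per encoding, e.g. {"base64": 1, "gzip-js": 3}.
cat test.har | jq '[ .[].entries[].request
    | select(.url | contains("posthog.com"))
    | if .url | contains("gzip-js") then "gzip-js" else "base64" end
  ] | group_by(.) | map({ (.[0]): length }) | add'
```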
Alternatively, to manipulate them back as a JSON array of payloads:
```bash
cat test.har \
  | jq -r '[
      .[].entries[].request | select(.url | contains("posthog.com"))
      | select(.postData.text != null)
      | if .url | contains("gzip-js")
        then "echo " + (.postData.text | @base64) + " | base64 -d | iconv -f utf-8 -t iso8859-1 | gunzip"
        else "echo " + [.postData.params | select(.[].name == "data") | .[].value][0] + " | base64 -d"
        end
    ] | join("\necho ,\n") | "echo [\n" + . + "\necho ]"' \
  | bash \
  | jq # or e.g. > payloads.json
```
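Once captured that way (e.g. saved with `> payloads.json` as in the comment above), the array can be sliced further with plain `jq`; for instance, to list the event names (a sketch; the `// "n/a"` fallback covers payloads that carry no `event` field):

```bash
# Summarise which PostHog events were sent, one entry per payload.
jq '[ .[].event // "n/a" ]' payloads.json
```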
@gberche-orange, does it help with your issue? 🙂
Thanks a lot @rangoo94 for your hard work on this issue and for crafting this query! I hope to get some availability to test it early next week.
Regarding the character encoding of the HAR, did you get the chance to test changing the content-type header in the request, and to confirm that browsers then encode the request body in the HAR directly as base64? This would make the dashboard more compliant with the specifications, and it would make inspecting the HAR much simpler: just decode the base64 before uncompressing the gzip binary bytes.
Thanks @rangoo94! Your current script indeed helps me inspect the data posted to PostHog; here is a copy below:
```json
[
  {
    "token": "phc_[...]",
    "distinct_id": "1886bb621384be-037023c622a1698-c575422-1fa400-1886bb62139731",
    "groups": {}
  },
  {
    "event": "$opt_in",
    "properties": {
      "$os": "Windows",
      "$os_version": "10.0",
      "$browser": "Firefox",
      "$device_type": "Desktop",
      "$pathname": "/tests",
      "$browser_version": 102,
      "$browser_language": "en-US",
      "$screen_height": 1080,
      "$screen_width": 1920,
      "$viewport_height": 303,
      "$viewport_width": 1908,
      "$lib": "web",
      "$lib_version": "1.57.1",
      "$insert_id": "cmz28jhbbl50yxyc",
      "$time": 1685434278.218,
      "distinct_id": "1886bb621384be-037023c622a1698-c575422-1fa400-1886bb62139731",
      "$device_id": "1886bb621384be-037023c622a1698-c575422-1fa400-1886bb62139731",
      "token": "phc_[...]",
      "$session_id": "1886bb62140d17-0d4d31d06e464c-c575422-1fa400-1886bb62141969",
      "$window_id": "1886bb621423b7-0b8cdcea1724158-c575422-1fa400-1886bb621471b1",
      "$pageview_id": "1886bb62148250-062b91ee55a1c98-c575422-1fa400-1886bb621497aa"
    },
    "timestamp": "2023-05-30T08:11:18.218Z"
  }
]
```
I no longer see `gzip-js` or `Content-Type: text/plain` content being sent. Should I assume the posthog js lib was updated in the meantime, to a new version loaded from the internet? I'm still using testkube helm chart version 1.11.220. Did you report an upstream issue to posthog?
Thanks again for your help!
Hi @gberche-orange, I'm happy that it helped! I haven't reported it to PostHog yet; for now, I only checked their existing issues, and nobody has reported any problems with it.
We didn't update posthog-js lately, but posthog-js has some logic to decide on the compression strategy (hence the `gzip-js` vs. raw base64 split above).
There is a `disable_compression` option too, but it may lead to unnecessarily bigger transfers for users, so it would be better to avoid it.
I'm closing this ticket, as I think we can't do anything more about it unless PostHog decides to change its implementation.
**Describe the bug**
This is a follow-up of https://github.com/kubeshop/testkube/issues/3609
Further trying to display the content of the posthog.com requests issued in version 1.11.220, I'm unable to decode the gzip compression of the requests sent: gzip complains about an unexpected end of file or an invalid format.
This reproduces both with requests saved as cURL (bash) in Firefox and with HAR files saved directly from the browser to the filesystem on Ubuntu 20.04.
Both the HAR and the cURL command strings captured by the browser contain binary characters.
I suspect this comes from the HTTP request to app.posthog.com now having a `Content-Type: text/plain` header, which tells the browser to treat the posted data as UTF-8 text, whereas it should rather be `application/gzip`.
https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Common_types specifies that the plain text content type should not carry binary data.
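One way to check that suspicion directly from a capture (a sketch, assuming a HAR saved as `test.har`):

```bash
# Print the Content-Type header of each captured posthog.com request.
cat test.har | jq -r '.[].entries[].request
    | select(.url | contains("posthog.com"))
    | .headers[] | select(.name | ascii_downcase == "content-type") | .value'
```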
Not being able to inspect the data sent to a 3rd party is likely preventing users from accepting anonymized telemetry.
If the proper `application/gzip` or `application/octet-stream` content type was sent with the request to app.posthog.com, then the cURL and HAR exports would likely use the proper binary character encoding, and would therefore allow gzip decompression and data inspection (see https://w3c.github.io/web-performance/specs/HAR/Overview.html).
The HAR would then include base64 data that would be decoded before piping into gunzip.
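For illustration, the simpler flow could then look like this (purely hypothetical until the content type changes; it assumes the browser would store the body base64-encoded in `.postData.text`):

```bash
# Decode a base64-encoded gzip body straight from the HAR; no iconv step needed.
cat test.har \
  | jq -r '[ .[].entries[].request
      | select(.url | contains("posthog.com"))
      | .postData.text ][0]' \
  | base64 -d | gunzip
```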
Note the use of gztool to further debug/diagnose the gzip data: https://unix.stackexchange.com/a/543086/381792
Valid gzip archive tool output:
**To Reproduce**
- Capture app.posthog.com requests as cURL (bash) and try to decode them with gunzip
- Capture app.posthog.com requests as HAR and try to decode them with gunzip
**Expected behavior**
As a testkube user, in order to trust the anonymized data sent during telemetry, I need to be able to inspect this data in clear text.