microsoft / windows-container-tools

Collection of tools to improve the Windows Containers experience
MIT License
236 stars, 66 forks

Fix SanitizeJson when string contains escaped quotes (or nested json) #173

Open bnu0 opened 1 month ago

bnu0 commented 1 month ago

SanitizeJson is currently broken for strings that contain nested quoted strings (or nested JSON). The code checks whether a character is already escaped and, if it is, does not escape it again. As a result, nested strings are not encoded correctly and break a JSON lexer in a subsequent log pipeline.

This PR fixes the encoding to be unconditional, specifically:

A unit test is added that fails against the repo as-is and passes with this PR.

Example

Assume the JSON {"foo":"bar","nested_quotes":"this string \"contains\" quotes"} is logged from an application as a single line of text.

The current implementation of SanitizeJson returns:

"{\"foo\":\"bar\",\"nested_quotes\":\"this string \\"contains\\" quotes\"}"

GitHub's syntax highlighting shows the issue above clearly: each nested quote is missing its own escaping backslash (the existing backslash was escaped, but the quote itself was not) and therefore accidentally terminates the enclosing string.

After this change:

"{\"foo\":\"bar\",\"nested_quotes\":\"this string \\\"contains\\\" quotes\"}"

👌❤️
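
For illustration, the unconditional encoding described above can be sketched roughly as follows. This is not the code from the PR or the repository's SanitizeJson: the function name `EscapeJsonUnconditionally` is hypothetical, it works on `std::string` rather than whatever string type the repo uses, and it assumes the enclosing quotes are added by the same function so that the result matches the example output.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not the repository's SanitizeJson): encodes a raw
// log line as a JSON string literal. Every quote and backslash is escaped,
// with no check for whether the input "already looks escaped".
std::string EscapeJsonUnconditionally(const std::string& input)
{
    std::string out;
    out.reserve(input.size() + 2);
    out.push_back('"');  // assumption: the enclosing quotes are added here

    for (unsigned char c : input) {
        switch (c) {
        case '"':  out += "\\\""; break;  // "  becomes \"
        case '\\': out += "\\\\"; break;  // \  becomes \\ (always)
        case '\b': out += "\\b";  break;
        case '\f': out += "\\f";  break;
        case '\n': out += "\\n";  break;
        case '\r': out += "\\r";  break;
        case '\t': out += "\\t";  break;
        default:
            if (c < 0x20) {
                // Remaining control characters use the \uXXXX form.
                char buf[7];
                std::snprintf(buf, sizeof(buf), "\\u%04x",
                              static_cast<unsigned>(c));
                out += buf;
            } else {
                out.push_back(static_cast<char>(c));
            }
        }
    }

    out.push_back('"');
    return out;
}
```

Because the input sequence `\"` is treated as two independent characters, it becomes `\\` followed by `\"`, which is the `\\\"` shown in the "After this change" output. A decoder that reverses the encoding then recovers the original line byte for byte.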

bnu0 commented 1 month ago

@microsoft-github-policy-service agree

profnandaa commented 1 month ago

@bnu0 -- thanks Benjamin for the PR. We've been considering using a library like BoostJSON for our parsing instead of our custom approach. I fear that this code could go away after the migration. I hope it's okay with you to put this PR on hold for a while as we finish the deliberation.