Open bnu0 opened 1 month ago
@microsoft-github-policy-service agree
@bnu0 -- thanks, Benjamin, for the PR. We've been considering using a library like BoostJSON for our parsing instead of our custom approach. I fear that this code could go away after the migration. I hope it's okay with you to put this PR on hold for a while as we finish the deliberation.
`SanitizeJson` is currently broken for strings containing nested quoted strings (or nested JSON). The code attempts to check if characters are already escaped, and not escape them again, which means that the nested strings are not properly decoded and break a JSON lexer in a subsequent log pipeline.

This PR fixes the encoding to be unconditional, specifically: `<cr>` will become `\r`, `<lf>` will become `\n`, `"` will become `\"`, and `\` will become `\\`.

A unit test is added which fails against the repo as-is, and is also fixed in this PR.
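
To illustrate the unconditional rule, here is a minimal Python sketch. It is not the repository's `SanitizeJson`: `escape_unconditionally` and the `log` envelope key are hypothetical names chosen for the example, and only the four characters listed above are handled. Each input character is mapped exactly once, so there is no "already escaped" check to get wrong.

```python
import json

# Hypothetical mapping for illustration only: the four replacements named in
# the PR description. Other control characters are deliberately not covered.
_ESCAPES = {
    "\\": "\\\\",  # backslash        -> \\
    '"': '\\"',    # double quote     -> \"
    "\r": "\\r",   # carriage return  -> \r
    "\n": "\\n",   # line feed        -> \n
}


def escape_unconditionally(text: str) -> str:
    """Escape every occurrence, regardless of what precedes it."""
    return "".join(_ESCAPES.get(ch, ch) for ch in text)


# A log line that is itself JSON and contains nested, already-escaped quotes.
line = r'{"foo":"bar","nested_quotes":"this string \"contains\" quotes"}'

# Embed the line as a string value in an outer record.
# The "log" key is an assumption made for this sketch.
outer = '{"log":"' + escape_unconditionally(line) + '"}'

# The outer record parses, and the embedded value decodes back to the
# original line, which in turn parses as JSON.
decoded = json.loads(outer)["log"]
assert decoded == line
assert json.loads(decoded)["nested_quotes"] == 'this string "contains" quotes'
```

Because the escaping never inspects neighbouring characters, an input `\"` becomes `\\\"`, which a standard JSON decoder turns back into `\"`.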
Example
Assume the JSON `{"foo":"bar","nested_quotes":"this string \"contains\" quotes"}` is logged from an application as a single line of text.

The current implementation of `SanitizeJson` returns:

GitHub's syntax highlighting shows the issue above clearly: the nested quotes are missing an extra `\\` and therefore accidentally terminate the enclosing string.

After this change: