Feature request:
Escaping the forward slash (or solidus, U+002F) should be optional, not required.
Background:
My organization is moving towards logging events directly in JSON. We have found or created several tools that work better when the data is already in a structured form that standard libraries can load without custom parsing rules.
We prefer to use common tools, like grep, to search the logs and extract events that meet our search criteria for later analysis and troubleshooting. Some of our logs contain file paths (Unix style as well as Windows) as well as HTTP content types (e.g. application/x-www-form-data, application/json, etc.).
Problem:
json-c defaults to escaping the forward slash during serialization. This makes our logs less intuitive to search, since the analyst/engineer must remember to escape their queries even though the data is presented to them unescaped. This is problematic because the analyst/engineer is often pivoting from other data they have seen, and should be able to copy-paste the search term to grep for. (For example, "grep application/x-www-form-data app.log" makes more sense than "grep application\/x-www-form-data app.log".)
Reasoning:
Reading the JSON specification and other discussions on this subject leads me to conclude that my request is the best direction forward.
Escaping the forward slash, according to the JSON spec, is optional. From ECMA-404 (http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf) (the specification referenced from http://www.json.org) Section 9, "All characters may be placed within the quotation marks except for the characters that must be escaped: quotation mark (U+0022), reverse solidus (U+005C), and the control characters U+0000 to U+001F." The forward slash (called here the solidus) is U+002F, outside the range of the control characters, and thus exempt from this list of must-escape characters. Further evidence is found further down the section, where the spec provides an example using the forward slash. The fourth case is clearly the forward slash without the escape.
The above point is further corroborated by IETF's RFC 4627 (http://www.ietf.org/rfc/rfc4627.txt), which specifies the application/json Media Type for JavaScript Object Notation (JSON). It uses the same definition of which characters must be escaped. It also provides a grammar for constructing JSON strings, which contains the rule "unescaped = %x20-21 / %x23-5B / %x5D-10FFFF", which includes the forward slash (%x2F).
The purpose of allowing the forward slash to be escaped at all is so that JSON can be embedded within HTML script tags without having to alter it (see the answer to this question on StackOverflow http://stackoverflow.com/questions/1580647/json-why-are-forward-slashes-escaped/1580664#1580664). This is a specific use case. Systems writing JSON within this use case should support and use the escaping, but other systems need not alter their strings to conform to a requirement that comes from this limited environment.
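The grammar rule quoted above can be sketched as a small C predicate. This is only an illustration of the RFC 4627 "unescaped" ranges; the function name json_may_appear_unescaped is hypothetical and is not part of json-c, libfastjson, or liblognorm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a json-c API): returns true if the Unicode code
 * point may appear literally inside a JSON string, following the RFC 4627
 * rule: unescaped = %x20-21 / %x23-5B / %x5D-10FFFF */
bool json_may_appear_unescaped(uint32_t cp)
{
    return (cp >= 0x20 && cp <= 0x21) ||   /* space and '!'                 */
           (cp >= 0x23 && cp <= 0x5B) ||   /* skips '"' (0x22)              */
           (cp >= 0x5D && cp <= 0x10FFFF); /* skips '\' (0x5C)              */
}
```

Called with 0x2F (the forward slash) the predicate returns true, while 0x22, 0x5C, and any control character below 0x20 return false, which matches the must-escape list in ECMA-404.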
Conclusion:
Always escaping the forward slash, just to support the limited case where the JSON might one day appear within an HTML document's script tags, has undesired consequences. It adds an unnecessary character to the data, increasing its storage footprint (which adds up quickly over many events of similar structure). It prevents intuitive searching of stored JSON log events for common character sequences (Unix file paths, HTTP content types, etc.). It is better that this escaping be optional, so that it can be turned on if the JSON will or may end up within an HTML document as created, or turned off if it will not, to save space and to be consistent in escaping with other languages such as C.
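To make the requested behaviour concrete, here is a minimal sketch of a string escaper where slash escaping is a caller-controlled option. It is an assumption-laden illustration, not the json-c or libfastjson implementation: the function json_escape_string and its escape_slash parameter are hypothetical, control characters are always written as \u00XX for brevity, and bytes at or above 0x80 are passed through as-is on the assumption that the input is valid UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a json-c API). Writes `in` to `out` as the body
 * of a JSON string, without the surrounding quotes. '/' is escaped only
 * when escape_slash is true. Returns the number of bytes written (excluding
 * the terminating NUL), or -1 if `out` is too small. */
int json_escape_string(const char *in, char *out, size_t out_size,
                       bool escape_slash)
{
    size_t n = 0;

    if (out_size == 0)
        return -1;

    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        char buf[7];
        int len;

        if (*p == '"' || *p == '\\' || (*p == '/' && escape_slash)) {
            /* two-character escape: backslash followed by the character */
            buf[0] = '\\';
            buf[1] = (char)*p;
            len = 2;
        } else if (*p < 0x20) {
            /* control characters must always be escaped */
            len = snprintf(buf, sizeof buf, "\\u%04x", (unsigned)*p);
        } else {
            /* everything else, including '/', may be written literally */
            buf[0] = (char)*p;
            len = 1;
        }

        /* keep one byte free for the terminating NUL */
        if (n + (size_t)len >= out_size)
            return -1;
        memcpy(out + n, buf, (size_t)len);
        n += (size_t)len;
    }

    out[n] = '\0';
    return (int)n;
}
```

With escape_slash set to false, the input application/json is written unchanged and can be found with a plain grep; with escape_slash set to true, the output contains the backslash-slash sequence that is only needed when the JSON is embedded in an HTML script element.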
liblognorm:
json-c does provide an option to not escape the forward slash: the flag JSON_C_TO_STRING_NOSLASHESCAPE has to be added to the flags parameter passed to json_object_to_json_string_ext(). liblognorm does not use this function (it uses the wrapper json_object_to_json_string(), which does not set the JSON_C_TO_STRING_NOSLASHESCAPE flag), so all forward slashes are always escaped. This option should be exposed to users of lognormalizer. According to the blog post http://blog.gerhards.net/2015/12/rsyslog-and-liblognorm-will-switch-to.html, libfastjson will replace json-c as the underlying library. libfastjson removes this option entirely. It would be good if that library offered the option as well, or defaulted to escaping off.