jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/

ER: add --strict command-line option for strict parsing of JSON input #2643

Open pkoppstein opened 1 year ago

pkoppstein commented 1 year ago

Proposal

Add a --strict command-line option to ensure strict parsing of JSON input in accordance with the JSON syntax specification.

Terminology:

In the following, a distinction is made between:

1) jq's regular JSON parser
2) jq's streaming JSON parser (corresponding to the --stream command-line option)
3) jq's expression evaluator

Scope:

This proposal should not be construed as disallowing input that includes a stream of JSON entities.
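To make the distinction concrete, here is a minimal Python sketch of a reader that is strict about each value yet still accepts a stream of JSON entities. It is not jq's implementation. The helper names `parse_json_stream` and `_reject_constant` are hypothetical, and Python's standard `json` module stands in for a strict parser (with its non-standard `NaN`/`Infinity` extension switched off).

```python
import json


def _reject_constant(name):
    # Python's json module accepts NaN, Infinity and -Infinity by default;
    # a strict reader has to refuse them explicitly.
    raise ValueError(f"non-JSON constant: {name}")


def parse_json_stream(text):
    """Parse zero or more whitespace-separated JSON values from text.

    Hypothetical helper: each value is checked strictly, but the input as a
    whole may contain several top-level values, as jq input commonly does.
    """
    decoder = json.JSONDecoder(parse_constant=_reject_constant)
    values = []
    pos = 0
    end = len(text)
    while True:
        # Skip the whitespace JSON allows between values.
        while pos < end and text[pos] in " \t\r\n":
            pos += 1
        if pos == end:
            return values
        # raw_decode parses one value starting at pos and reports where it ended.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)


print(parse_json_stream('{"a": 1}\n[2, 3]\n"x"'))
```

Inputs such as `[1.,2.]`, `+1` or `NaN` raise `ValueError` in this sketch (`json.JSONDecodeError` is a subclass of it), while a sequence of well-formed values is returned as a list.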

This proposal envisions changes to jq's regular JSON parser (1). It is neutral about any impact on the streaming parser (2) and the expression evaluator (3), though the proposal can of course be amended.

Background:

Over the years, several bug reports and proposals pertaining to jq's permissiveness, especially with respect to parsing JSON numbers, have been made. The purpose of this ER is to consolidate most of these into a single "Issue" that reflects the view that:

(1) "A JSON parser MAY accept non-JSON forms or extensions." (https://tools.ietf.org/html/rfc7159)

(2) considerations of backward-compatibility in this case justify the addition of a command-line option.

It is hoped that this Issue will supersede most of the related issues so that they can be closed.

Examples:

000      # jaq's JSON parser interprets 000 as 3 consecutive 0s; jaq's expression parser disallows it
+1       # disallowed by jaq
.2e3     # disallowed by jaq
[1.,2.]  # disallowed by jaq's JSON parser but allowed by jaq's expression parser
1e0      # allowed by jaq's JSON and expression parsers
infinite # disallowed by jaq's JSON parser but prints as null
nan      # disallowed by jaq's JSON parser but prints as null
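For reference, the number tokens above can be checked against the JSON number grammar (`[ minus ] int [ frac ] [ exp ]` in RFC 8259) with a short Python sketch. The names `JSON_NUMBER` and `is_strict_json_number` are hypothetical. The check applies to a single token, so the array example is represented here by its element `1.`.

```python
import re

# JSON number grammar: optional minus, an integer part with no leading zeros,
# an optional fraction with at least one digit, an optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def is_strict_json_number(token: str) -> bool:
    """Return True if token, taken as a whole, is a JSON number."""
    return JSON_NUMBER.fullmatch(token) is not None


for token in ["000", "+1", ".2e3", "1.", "1e0", "infinite", "nan"]:
    print(f"{token!r}: {is_strict_json_number(token)}")
```

Of the tokens listed, only `1e0` satisfies the grammar. The others fail on a leading zero, a leading plus sign, a missing integer part, a missing fraction digit, or because they are not numbers at all.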

Related issues:

#1264
#1389
#1404
#2414
#1571 (comments)
#1637 (already closed)

Issues related to "relaxed JSON" (--lenient):

#1607
#1599
#1571 (comments)
#2014

See also:

#1544 (raw tabs - already closed)

nicowilliams commented 1 year ago

Strictness for numbers will always be a very tough thing to do. There's already too much freedom in the specs (RFC 8259, ECMA-404) as it is. Some implementations will only handle small integers, others will only handle IEEE 754 doubles, etc. It's a mess. Defining strictness for numbers seems like a lost cause.
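As an illustration of the representation problem (not of jq's behavior), the Python sketch below reads the same syntactically valid JSON number three ways. The variable names are arbitrary, and `parse_int=float` is used only to mimic an implementation that stores every number as an IEEE 754 double.

```python
import json
from decimal import Decimal

text = "9007199254740993"  # 2**53 + 1: valid JSON syntax, not exactly representable as a double

as_double = json.loads(text, parse_int=float)  # mimic a double-only implementation
as_int = json.loads(text)                      # Python keeps arbitrary-precision integers

# A fraction with more digits than a double can hold.
frac_text = "1.0000000000000000001"
frac_double = json.loads(frac_text)
frac_decimal = json.loads(frac_text, parse_float=Decimal)

print(as_double, as_int)
print(frac_double, frac_decimal)
```

All of these inputs pass a syntax check, yet the double-based readings differ from the text that was supplied. Syntactic strictness and numeric fidelity are separate questions.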

pkoppstein commented 1 year ago

@nicowilliams wrote:

Defining strictness for numbers seems like a lost cause.

I believe what many people have in mind is strictness w.r.t. the JSON syntax (i.e., the railroad diagrams). I think this pretty much corresponds to strictness w.r.t. jsonlint.com.

Anyway, this ER is intended to let users request that jq reject input that jsonlint would not accept, where the input is supposed to be JSON.