jqlang / jq

Command-line JSON processor
https://jqlang.github.io/jq/
Other
30.08k stars 1.56k forks source link

bug: preservation of literal numbers #2704

Open pkoppstein opened 1 year ago

pkoppstein commented 1 year ago

The literal representation of certain negative numbers is not being properly preserved.

Consider:

./jq --version
jq-1.6-268-gc99981c-dirty

./jq -nM '1E1000'
1E+1000

./jq -nM '-1E1000'
-1.7976931348623157e+308 # !!!

./jq -nM '100000000000000000000000000000002'
100000000000000000000000000000002

./jq -nM '-100000000000000000000000000000002'
-1e+32 # !!!

Platform: Mac OS

leonid-s-usov commented 1 year ago

I believe the issue is that it parses the unary - as a math operation, which collapse the number to a double

nicowilliams commented 1 year ago

I believe the issue is that it parses the unary - as a math operation, which collapse the number to a double

Correct:

([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)? {
   yylval->literal = jv_parse_sized(yytext, yyleng); return LITERAL;
}

If this was a JSON text input on stdin it would not have this problem, and indeed, it doesn't in master:

: ; echo -100000000000000000000000000000002 | jq .
-100000000000000000000000000000002
nicowilliams commented 1 year ago

I'm not sure how to fix this... I guess we could have _negate return a new number with the - added or removed from its literal form if it has one and if it's not zero. Ideally we could fix this in the lexer, but I'm not sure that that's possible.

itchyny commented 1 year ago

When -1e1000 and - 1e1000 yield different results, isn't it confusing? Lexering numbers including the unary operator will cause such situation.

pkoppstein commented 1 year ago

isn't it confusing?

Not much more than 0 + e1000 being different from e1000.

The point is that jq 1.7 will not have infinite-precision integer arithmetic, so while we're in this netherworld, we need to remain quite clear that any arithmetic operation on a number will trigger conversion to IEEE 754 numbers.
It's not ideal, but it has the advantage of being quite clear and simple.

If the release of a successor to jq 1.6 were not so overdue, there would no doubt be time to come up with something that might be marginally less confusing, but for now, introducing complexities in the name of avoiding confusion just risks increasing it!

itchyny commented 1 year ago

Right, so I don't think this is a bug. Operators convert numbers to floating-numbers regardless of unary or binary. Precision preservation of negation is a part of the feature request for arbitrary-precision integer calculation.

pkoppstein commented 1 year ago

@itchyny - The point is that whereas "-1" is a JSON number; "- 1" is not. (It's not even JSON, as jq . and gojq . both attest.)

For jq 1.7, we want JSON numeric literals to be treated specially. Ergo, it's a bug (for jq 1.7).

itchyny commented 1 year ago

Hmm, okay. Let's discuss about how to fix this issue. When -100 is a single literal, 0-100 is divided into two tokens by the lexer; LITERAL 0 and LITERAL -100. We may need some flag to distinguish the context, but I'm not sure this is possible. I also want to point that fixing this issue may change the behavior of [-1 as $x | 1,$x] (for this case, I think the current parser behaves very unintuitively).

nicowilliams commented 1 year ago

@itchyny convinces me that fixing this in the lexer is a bad idea. We could fix this in _negate instead. Or we could document that negative numbers in jq code are not literal-preserving (we could say that literal-preserving behavior was more than anything about JSON, not jq code).

pkoppstein commented 1 year ago

We could fix this ...

Since I opened this issue, I should add that I think it's relatively unimportant, a view that is perhaps supported by the fact that this Issue was not created earlier. That in turn is probably because the preservation of literal numbers is mostly important for non-negative ones, and especially non-negative integers.

However, if jq 1.7 does ship with this anomaly, I think it should be documented as something that will probably change, so I've included a sentence to that effect in https://github.com/jqlang/jq/pull/2731

nicowilliams commented 1 year ago

We are free to change anything we want regarding number representation in JSON emitted by jq as long as we remain compliant with RFC 8259. If we want to replace decNumber with something different, we can. JSON gives us a great deal of latitude in how we represent numbers in JSON emitted by jq -- we must retain that freedom. It is implied that we can switch internal representations, and with that we can have subtle and not-so-subtle changes in precision and in representation in JSON output.

I understand that IEEE 754 doubles have annoyed people a great deal. Switching to decNumber is a best-effort attempt to please them, but it is not a commitment to please all of them, or even any of them, as the issue is not jq-specific but industry-wide and we the jq maintainers do not and will not have answers regarding number representation that will please all. Even now for many operations jq will fall back on IEEE 754 doubles, and I expect that will annoy some users, and we will almost certainly not do much about that.

GCrispino commented 9 months ago

Hey all 👋🏼

I don't think I've understood exactly what's the issue here, but when rendering the number 5322917205011726429, jq yields 5322917205011726000:

❯ echo 5322917205011726429 | jq
5322917205011726000

I wonder then: would this problem share the same cause as the one originally described in the present issue?

wader commented 9 months ago

@GCrispino what jq version is this? please try 1.7 or master which should preserve in your case. Not that this issue is about preserving negative numbers in jq queries which is different from preserving numbers in JSON input

GCrispino commented 9 months ago

@wader I was using 1.6. I updated to 1.7 and it worked fine. Sorry for the distraction and thank you for the quick reply!

leonid-s-usov commented 9 months ago

@pkoppstein

fixing this issue may change the behavior of [-1 as $x | 1,$x]

OMG that example 🤦🏻 it took me a few minutes to wrap my head around what was happening

» jq -n '[-1 as $x | 1,$x]'
[
  -1,
  -1
]
» jq -n '-1 as $x | [1,$x]'
jq: error (at <unknown>): array ([1,1]) cannot be negated
» jq -n '[-(1,1)]'
[
  -1,
  -1
]

Do we really have to preserve this?


I think I agree with @pkoppstein that this looks like a defect. A minus in front of a number should be treated as part of the number as long as it follows the rules of how negative numbers can be represented.

I haven't yet looked at the lexer/parser side of things, but I presume it should be possible to distinguish between a minus that's part of a number and a minus that represents a binary operation. As for the negation operator (unary minus), I do hope we can find a way to address that reasonably, which is where we get back to the example above and I'd consider that example to be a bug rather than a feature worth preserving