Closed RalfJung closed 2 months ago
This also made me question my sanity some time ago, a lint would have been very helpful. The code in question was
assert_eq!(1_i32, -1_i32.rem_euclid(12)) // should be (-i_i32)
I would personally be very surprised if -x.method()
and -1.method()
parsed differently, and it feels proper to me that -(x.method())
would be the way it's parsed.
This matches similar precedence rules in maths, like how -2x is -(2x).
Another good test is if adding 0 to the beginning would change the result: 0-x.method()
shouldn't parse as (0-x).method()
, so why should -x.method()
be different?
In math I would insist on parentheses since -2^x looks ambiguous.
Another good test is if adding 0 to the beginning would change the result:
That's not at all how think about negative integer literals. -2
is just the mathematical integer negative-two, and having -2.method()
split up the literal is totally unexpected.
It's great that your intuition aligns with the current parser, but that doesn't mean that everybody's intuition aligns with the parser. I literally thought the gamma_ln
function had a bug (see the comment linked in the OP); it didn't even occur to me that the Rust parser could dis-associate the -
from my integer literal -- that possibility was so outlandish it didn't enter my mind. So clearly this is ambiguous, and the right thing to do is add parentheses.
I too would be very surprised if it was not parsed the way it currently is. If -5.method()
were parsed as (-5).method()
, then how would we expect - 5.method()
to be parsed? You either end up having very unintuitive behavior, where - 5.method()
is parsed as (-5).method()
, or you end up in a situation where whitespace affects the way an expression is parsed.
To my knowledge, every language with both unary minus and a .
operator does exactly what Rust currently does here.
I also don't think it's fair to say that the compiler is disassociating the -
from the literal. The lexer has always treated -
as a separate token from whatever follows it.
It doesn't matter what the lexer does internally. For many humans reading and writing code, "-5" is an atom that cannot be split meaningfully. If the lexer doesn't work like that then it is up to us to minimize the fallout and confusion stemming from that decision. This is true even if the alternative parsing would be just as surprising in other contexts. Thay just means we should steer the user towards code that is unambiguous.
Even ignoring what the lexer does, my point is that for many other humans, especially those who work in mathematics or computing, -5
is a sequence of two atoms that obey standard operator precedence and just happen not to have a space between them. For those of us in that group, anything other than the current behavior would be confusing. I don't necessarily disagree that it can create confusion, though I do think that changing the parser would have been a step too far.
As a side note, rustfmt
currently reformats let _ = - 5.method()
to let _ = -5.method()
. Should this be another angle by which we can reduce confusion?
The way it's currently being parsed feels very natural and correct to me as well. It not only matches similar rules in math as pointed out earlier but is also how it works in every language that allows an equivalent syntax. Adding extra parentheses in situations like this would just feel like visual noise to me, personally.
Note that there are cases where a negative literal is treated as a fused literal, in particular in cases with the literal
macro matcher. In these cases, it was weird that 1.0
would be accepted but not -1.0
, which makes sense from a syntax perspective but not from a user perspective.
I don't really think that any of the proposed suggestions for negation really work well, though. Adding an extra space looks like a typo, and adding extra parentheses also makes rewriting code more obnoxious since it's a non-local change (adding a negative sign requires adding a closing parenthesis at the end of the expression).
Even though the precedence rules are well-motivated, parentheses exist regardless and Rust even has lints that encourage using them even in unambiguous situations, like converting x && y || z
to (x && y) || z
.
So, I do think that at minimum there could be an allow-by-default lint suggesting one or more of these methods, and/or style rules for rustfmt
.
Even ignoring what the lexer does, my point is that for many other humans, especially those who work in mathematics or computing, -5 is a sequence of two atoms that obey standard operator precedence and just happen not to have a space between them.
As a data point, I work in computing. Programming languages are literally my job.
I guess we'd need a study to see how many "many" is. :shrug:
though I do think that changing the parser would have been a step too far.
It's not realistic to change the parser so we agree on the outcome here, if not on the reasons.
It not only matches similar rules in math
I'm not aware of math having well-defined operator precedence for cases like this. Do you have a source for this claim?
Even though the precedence rules are well-motivated, parentheses exist regardless and Rust even has lints that encourage using them even in unambiguous situations, like converting x && y || z to (x && y) || z.
Yeah, that's why I was thinking a lint would make sense here. FWIW those cases are ambiguous if you can't remember whether "and" or "or" binds stronger. And in languages like bash, they actually bind equally strong (and are left-associative), so this is inconsistent across languages.
I don't think it is realistic for everyone to remember all the precedence rules, so I agree with the compiler that the parentheses should be added here to avoid ambiguity.
for many other humans, especially those who work in mathematics or computing, -5 is a sequence of two atoms
I doubt that many mathematicians would consider "-5" anything but a single unit, the negative integer that is the additive inverse of 5, just like they wouldn't consider "½" a division but the rational number that's the multiplicative inverse of 2.
Note that there are cases where a negative literal is treated as a fused literal, in particular in cases with the literal macro matcher.
Interestingly, $lit:literal
will even match unfused literals like - 1
. The parser follows standard operator precedence, and only then raises negated literal expressions to be literals themselves.
As someone who can't imagine this expression being parsed any other way, I'm curious how everyone else views whitespace. My assumption has always been that I can add arbitrary whitespace around any operator and get the same results. So if you find the way this expression is parsed confusing, how would you mentally parse the following expressions?
-1.method()
- 1.method()
-x.method()
- x.method()
It not only matches similar rules in math
I'm not aware of math having well-defined operator precedence for cases like this. Do you have a source for this claim?
Even though this wasn't directed at me, I figured I'd say that my primary argument is based upon the exponent rules (which always bind exponents before unary minus in mathematics), and the fact that function calls are generally put at the highest precedence above even that. Note that these rules are variable, for example, some programs like Excel will bind the unary minus before exponents, but I consider this unnatural since it goes against standard mathematics. In primary school, we were explicitly taught that expressions like -2x were a potentially confusing case under these rules, although if you consider unary minus the same as any other arithmetic operator, it makes sense that it would have lower precedence.
Function calls having the highest precedence has also been the standard since C, and I would imagine that methods would have similarly high precedence. Since unary minus doesn't have to apply to just numeric literals, I would always imagine it to be parsed separately, since as shown, the precedence of -1.method()
even mattering would depend on -
being lexed as a separate operator.
From the peanut gallery: there's an additional complication for minimum signed integer values:
-128i8
is correctly interpreted as -128.128i8
, which would be ill-formed because 128i8
is out of range. Intuitively, this suggests that -128i8
is interpreted as a single literal.-128i8.pow(1)
is parsed as -((128i8).pow(1))
, resulting in an out-of-range error.-
operator has confusing precedence -- until the method call is added at the end, the -
evidently isn't acting as a unary operator at all.This is not a result of applying the unary minus operator to the literal
128i8
, which would be ill-formed because128i8
is out of range. Intuitively, this suggests that-128i8
is interpreted as a single literal.
The parser parses everything according to standard precedence rules, treating -
as unary minus. So this gets parsed as something like Unary(Neg, Literal("128i8"))
. The compiler then treats this entire unit as a literal for purposes of overflow checks and such. It's not that -128_i8
is interpreted as a single literal; it's that Unary(Neg, Literal(_))
is treated as a sort of "honorary literal" after parsing. This is why let x = - 128i8;
doesn't result in an error either.
However, a method call like
-128i8.pow(1)
is parsed as-((128i8).pow(1))
, resulting in an out-of-range error.
How does your brain parse - 128i8.pow(1)
?
It seems bizarre for a literal to become out of range as a result of adding a method call.
You've added another operator (.
) with a higher associativity, which changes what the -
is being applied to. As an analogy, 2 + 2
is 4
, but 2 + 2 * 3
is not 12
. Adding the * 3
changes what 2 +
is being applied to.
However, a method call like
-128i8.pow(1)
is parsed as-((128i8).pow(1))
, resulting in an out-of-range error.
While this makes sense, it's certainly surprising and I never thought of this case. I definitely agree that situations like this should have a warn-by-default lint at minimum, and maybe there can be a specific list of methods where this particular parsing rule has "weird" results, like this one.
maybe there can be a specific list of methods where this particular parsing rule has "weird" results, like this one.
To clarify, the method being called doesn't matter here. It's a compile-time error because the compiler can't create an 128_i8
to call the method on. -128_i8.to_string()
results in the same error.
My assumption has always been that I can add arbitrary whitespace around any operator and get the same results. So if you find the way this expression is parsed confusing, how would you mentally parse the following expressions?
My intuition parses -5.method()
different from - 5.method()
. This makes sense since -5
is the atom "integer: negative five", while - 5
is unary minus applied to 5
. Those two terms evaluate to the same result but they are not the same term. (Yes I do realize that's not how the Rust parser works.)
I'd say that my primary argument is based upon the exponent rules (which always bind exponents before unary minus in mathematics),
Yeah you mentioned that above. I wasn't aware of this rule, but Wikipedia confirms. Wikipedia also lists some exceptions where the rule does not apply, though.
Anyway, while we can go on discussing which version is "more intuitive", that's not quite the point of this issue. The point of this issue is that the current parsing is ambiguous and surprising to some people. There's two outcomes here:
I prefer outcome 2, but I don't have the data to prove that it's ambiguous for "sufficiently many" people. In this issue we seem to have a fairly even split, but of course this is far from representative.
Function calls having the highest precedence has also been the standard since C,
C doesn't have method calls so the issue can't even arise there? But I take it this means Java and C++ parse this like Rust. That's a good argument to keep the parser as-is, but it's not an argument against a lint IMO.
C actually does have method calls since you can put function pointers in structs, although since function application has a higher precedence than field access, you need to add parentheses anyway. Although -(x.y)()
would still be parsed as -((x.y)())
.
I should clarify my position here: I'm fully in support of adding rules to both rustfmt and rust, it's mostly whether they'd be enabled by default that is my particular concern.
I should clarify my position here: I'm fully in support of adding rules to both rustfmt and rust, it's mostly whether they'd be enabled by default that is my particular concern.
Thanks for clarifying. I do think the lint should be warn-by-default, since otherwise it won't actually help people that run into this without realizing the possibility of running into this (like my past self).
How languages handle exponentiation is a close analogy. If a language were to treat -2
as a single atom, then -2**2
would equal 4
.
However, in Fortran (1957-), Perl, Python, and Ruby, -2**2 == -4
.
In Haskell, -2^2 == -4
(and -2**2 == -4.0
).
And in Rust, assert_eq!(-2i8.pow(2), -4)
.
Linting may seem somewhat opinionated given the precedent and history.
We should also think about symmetry with the other unary operator, !
. Should these lint?
#[test]
fn test() {
assert_eq!(127, !2u8.pow(7));
assert_eq!(false, !2u8.pow(7).is_power_of_two());
}
(I would not expect linting here. But I would expect consistency with -
in any case.)
Mathematically, adding a negative sign is taught as being exactly equivalent to a multiplication by negative one. That's at least one justification for why $-2^2 = -4$ in common notation.
Linting may seem somewhat opinionated given the precedent and history.
And how may people programming in these languages know this rule? In a Python code review I would definitely insist on adding parentheses to -2**2
.
Should these lint?
No. These are clearly not literals.
But I would expect consistency with - in any case.
!
and -
are very different though. -
exists as part of the syntax for literals like -5
, unlike !
. Even the rustc parser recognizes this difference in its treatment for -128u8
.
As @chfogelman described, the rustc
parser treats -128i8
in an ordinary manner, but later stages of the compiler then treat it specially for overflow checks. Note that -(128i8)
also works.
I definitely support warn-by-default. Parsing is invisible to the user, so if you don't know that operator precedence is the issue, it'll take a very long time to figure out.
I don't think !
should be linted on. Unlike -
, prefix !
doesn't have baggage from mathematics or everyday life.
Note that
-(128i8)
also works.
Yep, and so does (((((-(((((128i8))))))))))
. The AST doesn't carry any information about parentheses.
As @chfogelman described, the
rustc
parser treats-128i8
in an ordinary manner, but later stages of the compiler then treat it specially for overflow checks. Note that-(128i8)
also works.
It's not just overflow checks. It's part of turning the tokens into an AST. -128
becomes a constant with value -128, !128
becomes the unary operator !
applied to the constant 128. So there's a fundamental difference between -
and !
baked pretty deep into the rust parser (or a later stage during AST/HIR/MIR construction).
As @chfogelman described, the
rustc
parser treats-128i8
in an ordinary manner, but later stages of the compiler then treat it specially for overflow checks. Note that-(128i8)
also works.It's not just overflow checks. It's part of turning the tokens into an AST.
-128
becomes a constant with value -128,!128
becomes the unary operator!
applied to the constant 128. So there's a fundamental difference between-
and!
baked pretty deep into the rust parser (or a later stage during AST/HIR/MIR construction).
At the level of AST generation, the literal is always 128i8
, regardless of whether the input is -128i8
or !128i8
. The AST for these is exactly the same except that Neg
is replaced by Not
.
Here's one place where there is special handling in the compiler: compiler/rustc_lint/src/types.rs:414
fn lint_int_literal<'tcx>(
...
) {
...
if (negative && v > max + 1) || (!negative && v > max) {
...
}
}
Looking at:
fn main() {
dbg!(!127i8);
dbg!(-128i8);
}
In the MIR, these both end up as const i8::MIN
in release mode. In the HIR, these are both still just !127i8
or -128i8
.
@traviscross -128i8
starts being treated as a fused literal during MIR lowering:
Basically, nothing in the entire compiler treats !127
as a literal, because it doesn't have to. -
needs special treatment because it's used to form i*::MIN
literals, but !
never forms part of a literal, not in the AST, not in the HIR, not in everyday life.
I can get behind linting -1.method()
because of this special treatment of -
, but since !
doesn't get any special treatment, I don't see the point in linting on !1.method()
.
@chfogelman: Interesting. Thanks for pointing to where that happens during MIR lowering.
Agreed !
doesn't get any special treatment because mathematically it doesn't need it. The question is whether that special treatment, motivated by the desire to make one number per type work in a convenient way, is enough to justify different treatment for the two otherwise similar unary operators. It's an interesting question to consider.
I think that no matter what here, the solution shouldn't be motivated by what the parser actually does here, and instead by what people expect the parser should do. The reason why I personally would rather not format -1.method()
differently is because I expect it to work a certain way and what it does matches my expectations.
However, my expectations are not everyone else's expectations, and that's reason to look into this. Explaining what the parser does here ultimately doesn't make a difference because we're not talking about the current implementation, but how we expect the parser to act regardless of implementation.
fwiw, Ruby parses it the way that's expected in the original issue (different from Rust):
-2.abs
#=> 4
x = 2
-x.abs
#=> -4
This came up here but I am pretty sure I ran into this before: Rust parses
-5.5f32.method()
as-(5.5f32.method())
. I don't know if that is the parsing result that anyone expects, I certainly would never expect it-5.5f32
is a float literals, and that is what I want to apply the method to, I never even considered that the compiler might interpret this as a unary minus applied to the method result.Assuming that we cannot change the parser, I think we should spend some effort defending against the possible confusion here. For instance:
-5.5f32.method()
(suggesting to write-(5.5f32.method())
if that's really what one meant)