Operator precedence when combining unary and infix operators

ponylang / ponyc

Pony is an open-source, actor-model, capabilities-secure, high performance programming language

http://www.ponylang.io

BSD 2-Clause "Simplified" License

5.69k stars 409 forks source link

Operator precedence when combining unary and infix operators #1973

Closed dipinhora closed 5 years ago

dipinhora commented 7 years ago

In the following code example:

actor Main
    new create(env:Env) =>
      let b = I32(1)
      if not b == 2 then
        env.out.print("not 2")
      else
        env.out.print("2")
      end

The output is 2 even though the value of b isn't 2. This is because the not operator is applied prior to the == operator.

Similarly, in the following code example:

primitive A

actor Main
    new create(env:Env) =>
      let a = A
      if not a is A then
        env.out.print("not A")
      else
        env.out.print("A")
      end

The compiler produces the following error:

Error:
main.pony:6:10: couldn't find 'op_not' in 'A'
      if not a is A then

This is because the not operator is applied prior to the is operator.

In both of these cases, the programmer has to remember which operator is applied first and which is applied second.

It would be ideal if the compiler would throw an error with the following error message as it already does in other similar operator precedence scenarios:

Operator precedence is not supported. Parentheses required.

jemc commented 7 years ago

This needs discussion. To me it seems fairly straightforward that unary operators bind more closely to their operand than binary operators, but I understand that not everyone will agree.

So, we need to decide if there is consensus that unary operators whose operand is a binary or unary operator should not be allowed (without parentheses). Right now I am a weak "yes" on leaving this the way it is, but am interested to have a discussion about it.

SeanTAllen commented 7 years ago

I find the behavior that @dipinhora documented to be quite surprising. I think this is a case where subtle bugs can be introduced. I'd like to see this remedied.

dipinhora commented 7 years ago

@jemc The stated goal behind requiring parenthesis for infix operators (take from the tutorial; emphasis added) applies equally well for the examples I presented:

The problem with this is that the programmer has to remember the order and people aren't very good at things like that. Most people will remember to do multiplication before addition, but what about left bit shifting versus bitwise and? Sometimes people misremember (or guess wrong) and that leads to bugs. Worse, those bugs are often very hard to spot.

Pony takes a different approach and outlaws infix precedence. Any expression where more than one infix operator is used must use parentheses to remove the ambiguity. If you fail to do this the compiler will complain.

I do agree that there is likely some discussion needed about which scenarios the requirement for parenthesis should be enforced in. I also think there is a distinction between unary operators like - which are bound directly to an operand and not which is not bound to an operand. It is very clear to me that -x == 1 should apply the - to x and not to x == 1 since the - is right before the x without a space. But the same isn't true for not in not x == 1 because of the space between the not and the x. I find it equally plausible that the not should apply to either the x or the x == 1 and which one I believe is correct depends on my ability to remember that not is a unary operator even though there is a space between it and its operand.

jemc commented 7 years ago

@dipinhora - There's nothing to prevent you from having a space between the - and the x, though. What would be your desired behaviour in that case?

dipinhora commented 7 years ago

@jemc Hmmmm... I did not realize that having a space there was possible/allowed. I probably should have tested prior to commenting. 8*/

Regardless, to me, the addition of the space makes the intent ambiguous just like with the not. Was the intent of - x + 1, -(x + 1) or was it (-x) + 1.

As a side point, being able to have an arbitrary amount of spaces/newlines/comments between the unary - (and other unary operators) and the operand seems very dangerous. I can imagine a situation where an errant - (as a typo) could conceivable cause havoc. Simple example:

actor Main
  new create(e: Env) =>
     e.out.print(z(1).string())

  fun z(x: I32): I32 =>
    // This is a very important comment
    // masking the fact that I have an accidental
    -// `-` which will screw up the
    // results of this calculation

    x

jemc commented 7 years ago

Note that if we did what you are asking for here with -, Pony would become whitespace sensitive (albeit in a very small way).

Last I knew @sylvanc was opposed to adding whitespace-sensitive rules to the grammar, so I'd be interested to hear his take on this discussion.

EDIT: To be fair (and pedantic), Pony is already whitespace sensitive, in that it is sensitive to newline characters - the presence or absence of a newline character can change the meaning of a series of tokens. Still, this would be the first grammar rule that needs to be sensitive to space characters.

rurban commented 7 years ago

I'm with @jemc. The current implementation seems to be correct, even if I agree that it's unfortunate that we don't have a ! for not. Then it would be clear to everyone, that not binds stronger than ==.

The fix would be to write if b != 2 then or use parens, but adding a warning does more harm than good

SeanTAllen commented 7 years ago

I'm not in favor of a warning. I'm in favor of an error. The same as we do with other precedence issues. I disagreee that it does more harm than good. If that was the case, then I think we should ditch the other precedence errors as well.

sylvanc commented 7 years ago

Pony uses a Smalltalk-like precedence order:

Method calls bind most tightly, whether . or ~ or .>
Unary operators bind immediately after method calls
Binary operators bind last

Smalltalk rules are slightly different:

Unary messages bind most tightly (but such messages come after the operand, not before)
Binary messages bind immediately after unary messages
Keyword messages bind last

Additionally, Pony requires binary operators to supply a precedence through parentheses. Smalltalk does not, and simply evaluates them left to right, regardless, so in Smalltalk 3 + 4 * 5 equals 35, rather than 23. In Pony, 3 + 4 * 5 is a compile-time error.

In Pony, the unary operators are: not, -, -~, addressof, and digestof. A significant amount of the complication of the Pony grammar is distinguishing between - as a unary or a binary operator.

All of the above is of course common knowledge :)

This means that not a == b binds as (not a) == b, in the same way that -a == b binds as (-a) == b. If the binding were changed to not (a == b) we would also get -(a == b). If we required parentheses for not (whether or not we changed the binding) we would get (not a) == b and we would also require (-a) == b. If we replace not with ! then we would still have the distinction between !a == b, (!a) == b, and !(a == b).

Any change we contemplate needs to be helpful for all the possible expression combinations. And they can get pretty complex. I certainly agree that not surprising the programmer is important, but as yet there doesn't seem to be a binding (or parenthesising) proposal that is clearly less surprising for unary operators than what we currently have. I am of course very much open to being convinced :)

Well, except on "whitespace changes binding", which I'm not really open to being convinced on. It's a bad idea :)

dipinhora commented 7 years ago

@sylvanc Thanks for the overview. It was definitely not common knowledge for me.

Re: Operator precedence, it seems to me that the current rules on when to require parenthesis are incomplete (thus allowing for the concern raised in this issue). According to the tutorial the current rules for when to require parenthesis are:

Any expression where more than one infix operator is used must use parentheses to remove the ambiguity.

If this rule were expanded to be:

Any expression where more than one infix or unary operator is used must use parentheses to remove the ambiguity.

That would cause the similar parenthesis related errors to be thrown only when multiple operators are combined to ensure that there is no ambiguity similar to the infix case.

not a == b would start to generate a compiler error about precedence and ambiguity since the compiler can't know if the intent of the programmer was to apply the not before or after the ==. This would also cause -a == b to throw a similar ambiguity error and require parenthesis. While not ideal, this seems like an acceptable tradeoff given the overall benefit provided.

For example, it is currently possible (while likely nonsensical) to do something like:

-not-digestof env

With the suggested change, this would throw an error regarding parenthesis (although, there is no ambiguity around precedence here). However, this would lead to more readable code if someone were combining multiple unary operators (which is debatable as to whether it is something the compiler should be enforcing or not).

If we do not want the compiler to throw an error on this example, we can instead make the rule:

Any expression where more than one infix operator is used or if infix and unary operators are combined must use parentheses to remove the ambiguity.

This would allow for:

-not-digestof env

but not:

not b == 2

Re: "whitespace changes binding", I'm not suggesting it's a good idea in general, just pointing out that the current grammar has a potential risk in relation to unary operators. It does seem that the risk is only relevant if there are comments allowed in-between the operator and the operand. My above example becomes apparent/benign if the comments aren't allowed since they can't "hide" the operator:

actor Main
  new create(e: Env) =>
     e.out.print(z(1).string())

  fun z(x: I32): I32 =>

    -

    x

The only change that might be worth considering would be to disallow comments between unary operators and their operand.

rurban commented 7 years ago

@dipinhora There's no ambiguity. There's a simple rule for operator precedence which you have to learn. Bothering correct code with disambiguity warnings is more harmful.

In perl we solved this newbie problem with use warnings and use diagnostics, which optionally adds some parser warnings. ponyc --verbose 1 could do that.

dipinhora commented 7 years ago

@rurban Based on your comment that There's no ambiguity. There's a simple rule for operator precedence which you have to learn., combining multiple infix operators shouldn't cause a compiler error either.

If learning/memorizing operator precedence is only a newbie problem, why does the Pony language design explicitly account for operator precedence ambiguity to prevent bugs related to it?

SeanTAllen commented 7 years ago

@rurban I'm a cre team member. I write Pony every day. More than most people have. I found this surprising. "Newbie problem" is a term I would like to see banished. Please try to use something less condescending.

SeanTAllen commented 7 years ago

Banished isn't the right word but I'm sending this from an emergency room so, sorry. I do think newbie problem is condescending.

jemc commented 7 years ago

@dipinhora - your point about consistency is compelling, but I think it's not quite appropriate to say "we need to treat unary operators exactly the same as binary operators", because I think it glosses over some important differences between binary and unary operators that help to explain why the current system was chosen.

For unary operators, relying on the "simple rule for operator precedence" that makes for a simple LL1 grammar means that the operator closest to the term happens first, with "outer" operators being applied to the result.

For binary operators, we have to deal with terms on both sides, so the "simple rule for operator precedence" has to be left-to-right (or right-to-left), which leads to cases that are totally backwards for people used to dealing with basic arithmetic predence, such as the 3 + 4 * 5 == 35 case that @sylvanc mentioned earlier.

The founders of Pony considered 3 + 4 * 5 == 35 to be unacceptable language experience, but still wanted to benefit from the simple parsing rules in the parser (and generate the error about parentheses being required later in the compiler). So we end up with a system where the parser is simple, and all binary operators are evaluated from left to right, but if the operators are mixed without parenthesis, we get an error message.

I don't think that this makes for a closed door about the question of doing unary operators differently from how they're done now, but I think it does explain why we are where we are today, and why it's not necessarily an inconsistent choice from a language design decision.

I'm with @sylvanc in that the current system "makes sense" within that paradigm, in that it's not a case where the current behaviour is clearly wrong and the solution is obvious. So if we want to make a change, I would like to see a detailed proposal that takes into account all of the concerns involved here, including that of keeping the grammar/parser simple (which one of the major reasons why Pony doesn't support varying levels of binary operator precedence today).

Even if we don't make a change, this is obviously a teaching issue. Having done a lot of work on parsers and grammars in the past, the rules for precedence seem quite simple and elegant to me, but we have to recognize that not everyone comes from a PL background, and we have to have an effective way of teaching the rules of the syntax. If we have veteran Pony programmers that are not aware of these rules, we're failing at that task.

I also want to echo @SeanTAllen's point that we have to be careful to avoid being condescending. It may not always be intentional, but we should all try to be mindful of how we speak, what we are trying to communicate, and how it may be interpreted.

dipinhora commented 7 years ago

@jemc I don't believe I indicated that we need to treat unary operators exactly the same as binary operators. I presented a couple of options for alternative rules (one of which treats them both the same and the other which doesn't) for where parenthesis might be required by the compiler to remove ambiguity for consideration. Also, I think that the history of why the system exists as it does, while important, shouldn't really be a major concern as to determining whether to change the design or not. The decision to change or not should be based on the pros/cons of the different alternatives (including implementation/maintenance overhead [which is related to a large extent on the history of how we got where we are] along with UI/usability and all the other stuff that goes into a cost/benefit analysis of such a change).

I think it does explain why we are where we are today, and why it's not necessarily an inconsistent choice from a language design decision.

I think you seem to be misunderstanding my statement about the current rules on when to require parenthesis are incomplete (quite possibly because my phrasing was more abrasive than intended; if so, I apologize as that wasn't my intent). It has nothing to do with the language design from the perspective of the LL1 grammar or the implementation of the parser or any other similarly technical detail.

I see the language rules and the compiler errors as the UI. If the UI (as related to operator precedence) happens to be the way it is due to the technical implementation (your explanation about the LL1 grammar), then that's fine if that is how it is explained to the user. However, that is not how the operator precedence UI for pony is explained to users. It is explained as purposely being this way to reduce ambiguity (Any expression where more than one infix operator is used must use parentheses to remove the ambiguity.). If what is written in the tutorial is the actual intent behind the way the grammar work, it stands to reason that the goal of removing ambiguity should be applied in other cases where ambiguity exists which is where my statement (and this ticket) came from.

I would like to see a detailed proposal that takes into account all of the concerns involved here, including that of keeping the grammar/parser simple (which one of the major reasons why Pony doesn't support varying levels of binary operator precedence today).

This seems to be at odds with the explanation in the tutorial of trying to remove ambiguity. Either removing ambiguity is the goal (and the design is purposeful and would remain the same regardless of the complexity of the grammar/parser) or it isn't (and the design is a consequence of keeping the grammar/parser simple). Maybe there's a third alternative where it can be both but I'm not able to see it. I'd appreciate an explanation or a pointer in the right direction if that's the case.

I would like to see a detailed proposal that takes into account all of the concerns involved here, including that of keeping the grammar/parser simple (which one of the major reasons why Pony doesn't support varying levels of binary operator precedence today).

I don't disagree with your request but I am not a language designer. I do not know the complexities of the grammar and the parser nor how the rule changes I suggested might impact the grammar and parser. Nor do I have time to learn enough to write up a proposal taking everything into account. In other words, I would be very happy if others agree with my suggestion and Pony is changed to remove the ambiguity (mainly because I have fallen victim to it many times because my brain thinks that if there's a space then it's not clear what the binding is supposed to be and always thinks it's different than what actually happens) but if nothing changes, I'll find other ways of coping and move on.

jemc commented 7 years ago

@dipinhora To be clear, I'm not saying there's no problem here, or that the current situation is ideal - just that it makes sense in the context of how the decision was made (context which is part of the history, but which definitely carries over into the present as well).

This seems to be at odds with the explanation in the tutorial of trying to remove ambiguity. Either removing ambiguity is the goal (and the design is purposeful and would remain the same regardless of the complexity of the grammar/parser) or it isn't (and the design is a consequence of keeping the grammar/parser simple). Maybe there's a third alternative where it can be both but I'm not able to see it. I'd appreciate an explanation or a pointer in the right direction if that's the case.

Well, to be clear, I'm not one of the founders, and we don't have detailed documentation of how/when this decision was made, so I'm only speculating based on things I've picked up from conversations, from reading the compiler, etc. So I can give my educated guess about the reasoning, but I'd need @sylvanc to confirm that my explanation matches the original reasoning.

The third alternative that I'd put forward is that the explanation in the tutorial itself is incomplete and ambiguous (ironically enough :stuck_out_tongue_closed_eyes:). I think both aspects are at play (grammar simplicity as well as removing ambiguity), but that the ambiguity the tutorial talks about removing is not quite the same kind of ambiguity you're raising here.

That is, the ambiguity being removed is that arising from having to remember that different kinds of binary operators being treated with different precedence (+ and *, from the earlier example). The conventional approach of giving them different precedence creates a long list of rules about precedence (see this example) that anyone using the language has to remember. All told, it's not uncommon for this list of rules to have 8-10 levels or more of depth, and that context has to be carried in the mind of the developer at all times to be able to resolve ambiguities. The Pony parser rejects this paradigm and says that all binary operators have the same precedence, which removes the ambiguity for binary operators. However, recognizing that 3 + 4 * 5 == 35 is a surprising result, the Pony compiler will refuse to let you mix binary operators without parentheses.

Unary operators do not have the same kind of precedence issue that binary operators have - they only ever have one operand, and also there are no established conventions for some unary operators being "stronger" than others, as we have with binary operators (* is "stronger" than + by common convention). As a result, there is no special action needed within unary operators.

Your concern comes about when we want to consider operators as a whole, including both unary and binary operators as being part of the same set. And in that context, yes, Pony has an implicit rule, but it's only one rule - unary operators bind stronger than binary operators, in every case.

In fact, I can't off the top of my head think of any language that I've used which does not conform to that rule (unary binds stronger than binary). This is why I'm having a hard time seeing the current behaviour as surprising - it seems to be a fairly universal decision in PL design. If you can provide counterexamples, I'd be happy to look at them, of course.

For whatever it's worth, in the Nim-lang precedence table that I linked to (which I chose mostly because they have very clear official documentation about precedence, which many languages lack), I think it's worth noting that they specified "Unary operators always bind stronger than any binary operator" as the very first sentence in their section about precedence, and that information is not included in their table of precedence levels. This seems to me to underscore the idea that unary vs binary precedence is not the same concept as level by level precedence within the binary operator set (which is what their table is dedicated to).

dipinhora commented 7 years ago

@jemc Thanks for taking the time for the detailed responses. I agree with most everything you've stated.

It seems I misunderstood (or read too much into) the goals of removing ambiguity as stated in the tutorial in reference to operator precedence. That combined with my general dislike of inconsistency in UIs (and ambiguity in code resulting in bugs due to my idiocy) brought about this ticket and the ensuing discussion. Hopefully it wasn't too much of a distraction.

As a side note, the Nim precedence table makes my head hurt. I've looked at it a few times over the past 30 mins and I still have almost no clue how they work (especially since Nim allows user-definable operators). Maybe I'm in the minority, but I'd much rather have the compiler force me to have extra parentheses so that code is more generally understandable and maintainable than to try and memorize all the precedence rules and hope I never make a mistake or get confused.

@SeanTAllen, if you're okay with things as they are, I'll close this ticket.

jemc commented 7 years ago

@dipinhora Definitely hasn't been too much of a distraction - I think it's an important conversation to have. I appreciate you bringing this up, and I also appreciate everything else you've contributed to this community.

dipinhora commented 7 years ago

@jemc Thanks for the kind words. I appreciate your patience with me in both this and other issues/PRs where I've droned on and on about things I know almost nothing about. 8*)

sylvanc commented 7 years ago

@dipinhora I think this has been an excellent discussion, even though I think we will leave unary and binary binding the way it is. The very idea of "intuitive" is of course relative, as it is all learned experience!

Thanks very much for bringing it up.

@jemc pointed out on the sync call that before we close this, we should update the tutorial to make this clearer and less ambiguous.

jemc commented 5 years ago

Opened https://github.com/ponylang/pony-tutorial/issues/324 to track this work, since it is indicated in sylvan's comment above that this would only be a change to the tutorial.