hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[SUGGESTION] If twos complement negation is prefix, then ones complement negation should also be prefix. #1260

Closed lfnoise closed 3 weeks ago

lfnoise commented 3 weeks ago

Why is twos complement negation a prefix operator, but ones complement negation is postfix? I understand the rationale for & and * being postfix and reading left to right as those operators typically change the type, and ++ and -- are mutating operations. But it seems inconsistent to make one form of negation be prefix and the other be postfix. I think it is more consistent for ~ to also be prefix, as it is a mathematical operation like minus, and does not typically change the type or mutate a value.

tsoj commented 3 weeks ago

Yes, that is one of my biggest pet peeves currently.

Some time ago I wrote a small chess move generator where there are a lot of bit operations needed, and the postfix ~ is always weird to read, because in "brain space" the action of bit-negation is conceptually close to the action of normal negation.

Additionally to being not a very attention-grabbing symbol, this makes it easy to skip over a bit negation when reading code.

For example:

pawn_left_attack: (pawns: Bitboard, color: Color) -> Bitboard = (pawns & first_file~).color_up(color).shift_left();

targets := p(pawn, us).shift_up() & (p.occupancy() | enemy_home_rank)~;

(taken from here)
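For comparison, here is a minimal sketch of the first expression in today's C++ syntax, with ~ as a prefix operator. The square numbering (bit 0 is a1, bit 63 is h8), the file mask value, and the white-only helper `pawn_left_attack_white` are assumptions made for illustration; they are not taken from the linked move generator.

```cpp
#include <cstdint>

using Bitboard = std::uint64_t;

// Assumed layout: bit 0 is a1, bit 7 is h1, bit 63 is h8.
constexpr Bitboard first_file = 0x0101010101010101ULL; // every square on the a-file

// Hypothetical white-only variant: drop pawns that already stand on the
// a-file (they would wrap around the board edge), then move the remaining
// pawns one rank up and one file to the left.
constexpr Bitboard pawn_left_attack_white(Bitboard pawns) {
    return (pawns & ~first_file) << 7;
}

// A pawn on b2 (bit 9) attacks a3 (bit 16).
static_assert(pawn_left_attack_white(1ULL << 9) == (1ULL << 16), "b2 attacks a3");
// A pawn on a2 (bit 8) has no capture to its left.
static_assert(pawn_left_attack_white(1ULL << 8) == 0, "a2 has no left attack");
```

In the Cpp2 form quoted above, the same mask is written `first_file~`, so the complement appears after the operand rather than before it.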

gregmarr commented 3 weeks ago

The primary reason, as I recall from much earlier discussions, in keeping the unary operators as prefix is not changing how things in the domain of mathematics are written, as that's beyond the purview of the language. Having to write -1 as 1- just wouldn't make sense. Also, it changes the meaning of equations. 1-+2 in mathematics is 1 - (+2), which is -1. Having this suddenly mean (-1) + 2, which is 1, in Cpp2 would just be wrong and the source of countless bugs, which is something that Cpp2 is trying to avoid.
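The arithmetic in this paragraph can be checked in today's C++, where unary + and - are prefix. This is only a sketch: C++ has no postfix minus, so the all-postfix reading is spelled out by hand in the hypothetical constant `postfix_reading`.

```cpp
// Today's reading: '+' is a unary prefix on 2, and '-' is the binary operator.
static_assert(1-+2 == 1 - (+2), "prefix grouping");
static_assert(1-+2 == -1, "mathematical result");

// A hypothetical all-postfix reading would group as (1-) + 2, i.e. (-1) + 2.
constexpr int postfix_reading = (-1) + 2;
static_assert(postfix_reading == 1, "a different value from the same characters");
```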

Now, what about ~ being a prefix operator? Well, ~1 isn't a thing in mathematics, only in software languages, and it's not a binary operator, so there is no ambiguity by having it as postfix. Also, ~1 isn't an integer literal with a value of the bitwise negation of 1, it's an integer literal with the value of 1 that is then bitwise negated. The question then should be should it match all other unary operators which are postfix?

The philosophy there is

Postfix notation lets the code read fluidly left-to-right, in the same order in which the operators will be applied.

Additionally to being not a very attention-grabbing symbol, this makes it easy to skip over a bit negation when reading code.

(pawns &~ first_file)
(pawns & first_file~)

I'm not sure that &~, which is currently possible, stands out more than first_file~. Unary prefix operators also have operator precedence issues with postfix operators such as ~x.print(), which is something that you can do now with UFCS. There is no precedence issue if everything is postfix: x~.print() vs x.print()~. (Assuming that print() is a function that takes a numeric argument and returns a numeric value.)

@hsutter You might want to update https://github.com/hsutter/cppfront/wiki/Design-note%3A-Postfix-operators to note that "no whitespace before postfix unary operators that are also binary operators" is no longer required.

jcanizales commented 3 weeks ago

Also, ~1 isn't an integer literal with a value of the bitwise negation of 1, it's an integer literal with the value of 1 that is then bitwise negated.

This surprises me, if I'm understanding correctly what you say. Every single compiler will evaluate the constant expression at compile time and emit a literal which is the bitwise negation of 1.

I'm not sure that &~ which is currently possible stands out more than first_file~

If ~ is prefix, no one will write &~ first_file, they'll write & ~first_file. If ~ is postfix, writing first_file~ is compulsory.

To me, the strong arguments for making ~ postfix are operator precedence, and keeping the "math notation exception" simple (i.e. if it's written like that in primary school math, it gets a pass, everything else is postfix).

gregmarr commented 3 weeks ago

This surprises me, if I'm understanding correctly what you say. Every single compiler will evaluate the constant expression at compile time and emit a literal which is the bitwise negation of 1.

I'm referring entirely in terms of grammar and language productions here. Yes, optimizers will definitely convert that to a constant, and maybe even the compiler itself, but as far as the language itself is concerned, it's separate.
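As an illustration of the grammar point in today's C++ (not Cpp2), the sketch below shows that ~1 is an ordinary expression whose operand is the literal 1, and that the same is true of unary minus. The name `not_one` is introduced here for the example. The value checks assume a two's complement representation, which C++20 guarantees, and the last check only applies where int is 32 bits wide.

```cpp
#include <limits>
#include <type_traits>

// Unary ~ applied to the int literal 1; the result is computed at compile time.
constexpr auto not_one = ~1;

static_assert(std::is_same<decltype(~1), int>::value, "the operand and result are int");
static_assert(not_one == -2, "all bits of 1 flipped, read as two's complement");
static_assert(~1u == std::numeric_limits<unsigned>::max() - 1u, "unsigned variant");

// The literal is typed before the minus is applied: with a 32-bit int,
// 2147483648 does not fit in int, so -2147483648 is not an int either.
static_assert(sizeof(int) != 4 || !std::is_same<decltype(-2147483648), int>::value,
              "unary minus is applied to an already-typed literal");
```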

When you get to ~x then it is absolutely an operation on a variable, but then again, so is -x, so that's no help.

If ~ is prefix, no one will write &~ first_file, they'll write & ~first_file. If ~ is postfix, writing first_file~ is compulsory.

Very likely that no one will intentionally, but it's possible, and &~first_file is also possible, and there are definitely people that don't like "extra" whitespace, though they are probably a lot less common now that memory for the source code itself isn't as big of a concern. So it's a potential concern, though I agree far less likely.

To me, the strong arguments for making ~ postfix are operator precedence, and keeping the "math notation exception" simple (i.e. if it's written like that in primary school math, it gets a pass, everything else is postfix).

I like that phrasing.

tsoj commented 3 weeks ago

To me, the strong arguments for making ~ postfix are operator precedence, and keeping the "math notation exception" simple (i.e. if it's written like that in primary school math, it gets a pass, everything else is postfix).

I like that phrasing.

I agree that this sounds like a good argument. The only issue I have is that people read code from left to right.

It's like saying "You passed with flying colors, not". While you read the sentence/expression you expect the complete opposite.

I think in theory this applies to all postfix operators, though in practice it mostly is about ~, since ++ doesn't change the value of the expression, and *, &, and $ are likely to be used in contexts where these are already expected, and they mostly don't change the high-level meaning of an expression significantly.

gregmarr commented 3 weeks ago

I agree that this sounds like a good argument. The only issue I have is that people read code from left to right.

That's exactly why things are left to right.

Current C++ is NOT read left to right all the time. See the spiral rule. Even Cpp2 won't be in the most natural form when read left to right all the time, see the section below, but it's a lot better.

For example, ~x.getFilterObject().getBitMask(), where does the ~ apply? You need to know operator precedence to know that it's not applied left to right and that the ~ doesn't apply until the very end.

With ~ as postfix, it is immediately obvious: x.getFilterObject().getBitMask()~ that it is applied left to right, just as it is read. When reading this as plain language rather than programming language, you'd probably say this, starting with a slight rearrangement of the words on the first call: "Get x's filter object, then get its bitmask, and finally compute the bitwise inverse."
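The precedence behavior described here can be confirmed in today's C++ with a small sketch. The types `Filter` and `Source` and the mask value are hypothetical stand-ins for whatever x, getFilterObject() and getBitMask() would be in real code.

```cpp
#include <cstdint>

// Hypothetical types, introduced only to make the expression compile.
struct Filter {
    std::uint32_t mask;
    constexpr std::uint32_t getBitMask() const { return mask; }
};

struct Source {
    Filter filter;
    constexpr Filter getFilterObject() const { return filter; }
};

constexpr Source x{Filter{0x0000FFFFu}};

// Postfix member access and calls bind tighter than prefix ~, so the
// complement is applied last, to the result of the whole call chain.
static_assert(~x.getFilterObject().getBitMask() == ~(x.getFilterObject().getBitMask()),
              "prefix ~ applies to the end of the chain");
static_assert(~x.getFilterObject().getBitMask() == 0xFFFF0000u,
              "complement of the returned mask");
```

Note that `(~x).getFilterObject()` would not even compile here, because `Source` defines no operator~; the only valid reading is the one where ~ runs last.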

It's like saying "You passed with flying colors, not". While you read the sentence/expression you expect the complete opposite.

! is also on that exception list for that reason.

However, again, this would probably be written as !you.passedWith(flyingColors) which if read strictly left to right would be "if not you passed with flying colors". This is unchanged in Cpp2.

I would normally read !vec.empty() as "vec is not empty", rather than "not vec is empty", and this again runs into operator precedence, but it has been felt that having that "not" at the end is just too unnatural.

It also potentially leaves room for it to have a different meaning as a postfix operator as some have requested, generally involving assertions, such as asserting that something is not null.

lfnoise commented 3 weeks ago

"Well, ~1 isn't a thing in mathematics, only in software languages" I have to disagree here. Asterisk doesn't mean multiplication in mathematics either, only in software languages. We are dealing with representing mathematics in a software language using the characters commonly available on a keyboard. Tilde is a kind of negation. It can be interpreted either as ones complement negation where '-' is twos complement negation, OR as bitwise logical not, where '!' is a logical not producing a single boolean. Both '-' and '!' are prefix, so it is inconsistent for '~' not to be. It is just as much a purely mathematical operator as the other two. Either they should all be prefix or all be postfix.
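The relationship between the two kinds of negation named here can be written down for fixed-width integers. The sketch below uses 32-bit unsigned arithmetic so that every step is well defined; the helper names `ones_complement` and `twos_complement` are introduced for this example only.

```cpp
#include <cstdint>

// Ones' complement negation: flip every bit.
constexpr std::uint32_t ones_complement(std::uint32_t x) { return ~x; }

// Two's complement negation: flip every bit, then add one (wrapping modulo 2^32).
constexpr std::uint32_t twos_complement(std::uint32_t x) { return ~x + 1u; }

// For unsigned operands, 0 - x wraps modulo 2^32 and matches ~x + 1.
static_assert(twos_complement(5u) == 0u - 5u, "-x equals ~x + 1");
static_assert(twos_complement(0u) == 0u, "zero is its own negation");
static_assert(ones_complement(5u) == 0u - 5u - 1u, "~x equals -x - 1");

// The same identity on a signed value, assuming two's complement int.
static_assert(-5 == ~5 + 1, "signed form of the identity");
```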

gregmarr commented 3 weeks ago

I have to disagree here. Asterisk doesn't mean multiplication in mathematics either, only in software languages.

Right, but that's just about the character used to represent multiplication, which exists in general mathematics. I have never seen "bitwise negation" in a mathematics class, as that is a feature of binary representation of integers.

Both '-' and '!' are prefix, so it is inconsistent for '~' not to be. It is just as much a purely mathematical operator as the other two. Either they should all be prefix or all be postfix.

- is either part of the following numeric literal, or is a shortcut for multiplying the following expression by -1 or -1.0. ~ is neither of those. - has both unary and binary forms, so it has different meaning in postfix. ~ is only unary.

Sorry for the multiple edits, was just over in Slack where Ctrl-Enter is "new line" rather than send.

lfnoise commented 3 weeks ago

I have to disagree here. Asterisk doesn't mean multiplication in mathematics either, only in software languages.

Right, but that's just about the character used to represent multiplication, which exists in general mathematics. I have never seen "bitwise negation" in a mathematics class, as that is a feature of binary representation of integers.

Both '-' and '!' are prefix, so it is inconsistent for '~' not to be. It is just as much a purely mathematical operator as the other two. Either they should all be prefix or all be postfix.

- is either part of the following numeric literal, or is a shortcut for multiplying the following expression by -1 or -1.0. ~ is neither of those. - has both unary and binary forms, so it has different meaning in postfix. ~ is only unary.

-, ~, and ! are all kinds of negation. It doesn't make any sense, and is inconsistent, for two kinds of negation to be prefix, but another to be postfix. ! doesn't have a binary form. ! is not used as a part of a literal value. You would never write !true, you'd write false. These points are irrelevant.

~ is often used in C to express the complement of a set. It does have a mathematical meaning.

jcanizales commented 3 weeks ago

There are two different ways, in this thread, to approach the question of which unary operators should be prefix.

One is: Every unary operator should be postfix, and the only exceptions allowed are when it would be extremely jarring to pretty much every programmer otherwise:

The other one is: Every unary operator should be postfix, except all operators that can be interpreted as some sort of "negation" in some context. So the two above, plus bitwise flipping (~).

IIRC the current policy of the language is the former. What you're asking for is a switch to the latter policy. The philosophical arguments about whether ~ is a math operator, and whether it is a negation, are only productive if (and after) @hsutter chooses to make the switch. The argument to convince him, per his rules of the game, is "this would reduce the amount of things people need to learn".

Wrt. ambiguity, I don't think any of these are less ambiguous than the other, when seen in isolation (i.e. notwithstanding some overall generic cpp2 rule about precedence): prefix ~a << b (does ~ apply to a or a << b?) vs postfix a << b~ (does ~ apply to b or to a << b?). And if there's a rule like "unary always before binary", that solves both cases the same.
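For the prefix case, today's C++ already resolves the question with a rule of the "unary before binary" kind: unary ~ binds tighter than <<. The sketch below shows the two candidate groupings giving different results; the function names `unary_first` and `shift_first` are made up for the example, and the postfix case cannot be written in C++ at all.

```cpp
#include <cstdint>

// How C++ parses ~a << b: complement a first, then shift.
constexpr std::uint32_t unary_first(std::uint32_t a, unsigned b) {
    return static_cast<std::uint32_t>(~a << b);
}

// The other conceivable grouping, which needs explicit parentheses.
constexpr std::uint32_t shift_first(std::uint32_t a, unsigned b) {
    return static_cast<std::uint32_t>(~(a << b));
}

static_assert(unary_first(1u, 4) == 0xFFFFFFE0u, "(~1) << 4");
static_assert(shift_first(1u, 4) == 0xFFFFFFEFu, "~(1 << 4)");
static_assert(unary_first(1u, 4) != shift_first(1u, 4), "the grouping matters");
```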

gregmarr commented 3 weeks ago

Possible options:

As @jcanizales says, the question really is which do you prefer. All of these positions are defensible, as there is no perfect solution, and they're all about different trade-offs. There have been points in favor of several different preferences in this thread. I'd say it's largely personal preference.

Herb has currently selected the third as his preference. He has reevaluated his position on decisions several times in light of new data. Perhaps some of those points would be enough to sway him towards another. In any case, it's up to him to decide.

lfnoise commented 3 weeks ago

I would choose option 4, where -, !, and ~ are all forms of negation.

DyXel commented 3 weeks ago

It kind of makes sense for ~ to be postfix in the context of Cpp2 thanks to UFCS, since you could do things like 0b0101'0101.u8()~.wrap_multiply(42) and it will all naturally execute left to right. Same goes for !, but I find that hard to defend: since it only ever returns true/false, doing UFCS with a bool would seem odd.
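To see what the left-to-right chain replaces, here is a sketch of the same computation as nested calls in today's C++. The semantics of `u8` and `wrap_multiply` are assumptions (a narrowing conversion and a multiplication that wraps modulo 256), and `bit_not` is a helper added here so that the complement stays 8 bits wide.

```cpp
#include <cstdint>

// Assumed meaning of u8(): convert to an 8-bit unsigned value.
constexpr std::uint8_t u8(unsigned v) { return static_cast<std::uint8_t>(v); }

// Complement within 8 bits; plain ~v would first promote v to int.
constexpr std::uint8_t bit_not(std::uint8_t v) { return static_cast<std::uint8_t>(~v); }

// Assumed meaning of wrap_multiply(): multiply and keep the low 8 bits.
constexpr std::uint8_t wrap_multiply(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(static_cast<unsigned>(a) * b);
}

// Evaluation order is u8, then bit_not, then wrap_multiply, but the source
// text lists them in the opposite order.
constexpr std::uint8_t result = wrap_multiply(bit_not(u8(0b0101'0101)), 42);

// 0x55 complemented is 0xAA (170); 170 * 42 = 7140, and 7140 mod 256 = 228.
static_assert(result == 228, "wrapped product");
```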

@tsoj your examples aren't very good, since you need to add parentheses to then negate everything, which is the cue that something funky is going on. ~ being "hard to notice" is entirely dependent on the font used, I could argue the same but in the opposite direction for ! with a bad enough font.

All of that being said, I can see ~ being postfix as an annoyance if you do lots of bitwise ops in your day to day and it's suddenly changed; honestly, it's a bit of a hard call.

hsutter commented 3 weeks ago

Thanks for this discussion! As @gregmarr suggested, I've gone and updated the design note to update it about the whitespace change, but I've also put my answers to this thread here by creating a new section for "The exceptions: What about !, -, and +?".

Updated note 👉 Design note: Postfix operators

Re negation: I think it's largely an accident/coincidence on certain data representations that the bitwise operator ~ happens to also do a mathematical negation. Even if it works for some integer types on some platforms, it definitely doesn't work for floating-point representations (whereas - and + do), right? To me, unary ~ isn't related to mathematical notation and so isn't similar in meaning to -; instead it's about bitwise manipulation and definitely highly similar to unary & and | and ^.

Put another way, I see { -, + } as a group of related unary operators that want to be spelled consistently, and { &, |, ^, ~ } as another group of related unary operators that want to be spelled consistently.
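The floating-point point can be checked mechanically in today's C++ with a detection idiom. This is a sketch that assumes C++17 for `std::void_t`; the trait names `has_bitwise_not` and `has_unary_minus` are invented for the example.

```cpp
#include <bitset>
#include <type_traits>
#include <utility>

// Detects whether the expression ~t is well formed for a value of type T.
template <typename T, typename = void>
struct has_bitwise_not : std::false_type {};

template <typename T>
struct has_bitwise_not<T, std::void_t<decltype(~std::declval<T>())>> : std::true_type {};

// Detects whether the expression -t is well formed for a value of type T.
template <typename T, typename = void>
struct has_unary_minus : std::false_type {};

template <typename T>
struct has_unary_minus<T, std::void_t<decltype(-std::declval<T>())>> : std::true_type {};

// Unary minus works on both integers and floating-point values.
static_assert(has_unary_minus<int>::value && has_unary_minus<double>::value, "- is arithmetic");

// Bitwise complement works on integers and on std::bitset, but not on double.
static_assert(has_bitwise_not<int>::value, "~ on int");
static_assert(has_bitwise_not<std::bitset<8>>::value, "~ on std::bitset");
static_assert(!has_bitwise_not<double>::value, "~ is not defined for double");
```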

Thanks for the thoughtful comments, and I hope the updated wiki note is helpful.

tsoj commented 3 weeks ago

I thought a bit about it and I feel like there is definitely a strong equivalence between these operators:

booleans   !   &&   ||
bitsets    ~   &    |

Some languages even have the same keywords for these (e.g. not, and, or in Nim).
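C++ itself offers keyword spellings for both rows (not, and, or for booleans; compl, bitand, bitor for bits), and the two rows obey the same algebraic laws. The sketch below checks De Morgan's law in both forms; the function names are made up for the example, and some compilers need a conformance flag before they accept the alternative tokens.

```cpp
#include <cstdint>

// De Morgan's law on booleans: not (a and b) == (not a) or (not b).
constexpr bool de_morgan_bool(bool a, bool b) {
    return !(a && b) == (!a || !b);
}

// The same law applied bit by bit to an 8-bit set.
constexpr bool de_morgan_bits(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(~(a & b)) == static_cast<std::uint8_t>(~a | ~b);
}

static_assert(de_morgan_bool(true, false) && de_morgan_bool(true, true), "boolean form");
static_assert(de_morgan_bits(0xF0, 0x3C), "bitwise form");

// The alternative tokens are exact synonyms for the symbolic operators.
static_assert((not true) == !true, "not is !");
static_assert((compl 0x0Fu) == ~0x0Fu, "compl is ~");
static_assert((0x0Cu bitand 0x0Au) == (0x0Cu & 0x0Au), "bitand is &");
```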

If I understand correctly from cppref, then bitwise operations are only defined for integer types, and their built-in behavior is defined by their base-2 representation, which as far as I know is fixed since C++20.

In mathematics the ~ symbol is sometimes written for complements.

From a quick search, it also appears like at least one C++ library implements the ~ operator with complement semantics, but I could imagine that there are more given the built-in and mathematical precedent.

I also made a quick poll on Discord among chess engine programmers (who have to use bitwise operators regularly, 64 squares on the board and 64 bits in an integer just fits too well):

How jarring would it be, if the bitwise negation were a postfix operator (hello = bitboard~ instead of hello = ~bitboard)?

I like it very much 4%
Not bad 8%
I don't mind either way 8%
I don't like it 38%
Would be absolutely jarring 42%

26 responses.
I admit that this is a very biased poll, but I think it gives some useful insights anyway.

The one person who voted "I like it very much", said they voted this way assuming that it would be a new language which would "probably also use a postfix boolean not operator".

gregmarr commented 2 weeks ago

@hsutter Typo in wiki update: "Oven beyond" should be "Even beyond".