hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler
Other
5.24k stars 224 forks source link

[BUG] Parsing bug when multiplying a negative number #1068

Open mike919192 opened 1 month ago

mike919192 commented 1 month ago

Describe the bug Multiplying by a negative number is being parsed incorrectly. Putting the negative number in parenthesis is a workaround to parse it correctly.

To Reproduce This cpp2 code:

#include <iostream>
#include <string>

main: () = {

    std::cout << 1 * -1 << name() << "\n";
    std::cout << 1 * (-1) << name() << "\n";
}

Compiles to this cpp code


//=== Cpp2 type declarations ====================================================

#include "cpp2util.h"

//=== Cpp2 type definitions and function declarations ===========================

#include <iostream>
#include <string>

auto main() -> int;

//=== Cpp2 function definitions =================================================

auto main() -> int{

    std::cout << *cpp2::impl::assert_not_null(1) - 1 << name() << "\n";
    std::cout << 1 * (-1) << name() << "\n";
}

https://godbolt.org/z/KMc6sxoPe

Additional context This is present in the head, but not in the release 0.7.0

sookach commented 1 month ago

I'll take a look at this issue.

sookach commented 1 month ago

Sorry for the late response, and thank you for the reporting this bug. Not sure if you could already tell, but essentially we're parsing the first expression as a dereference of '1' minus '1' which is not correct. The reason the next line doesn't have the same mistake is because the parser has a specific rule for disallowing a postfix star to be treated as a dereference if followed by a left parenthesis as below: https://github.com/hsutter/cppfront/blob/d09b052c130d509d747a114dbb6814eb08a804ec/source/parse.h#L5798

The reason it works in version 0.7.0 but not head is because of commit 5663493 which was published shortly after the release.

@hsutter From what I can tell, in order to fix this issue we would need to change the parser behavior (which would break the code that prompted the original commit).

We can't just add:

MINUS LITERAL PLUS LITERAL

to the list of token(s) that can't follow a unary star, because even though it would fix this specific issue, that rule would prevent valid binary expressions involving a dereference, and it would also require

EXCLAIM+ LITERAL

to be added to the list of prohibited successor token(s) which would introduce an unbounded lookahead since any amount of exclaims can follow in each other in sequence.

Curious what you think the best course of action is, since it seems we will need to change the language's behavior.

hsutter commented 1 month ago

Edited to add: I think the following may also address the way the first line parses, because then like the second line it will see interpreting it as a postfix operator doesn't work and backtrack and try multiplication? Need to try it.

Thanks! I'm just in the middle of another bit where I'm not in a compilable state, but I would try this in parse.h:5798 (this is notes for myself, or in case someone wants to try it feel free):

-           //  * and & can't be unary operators if followed by a (, identifier, or literal
+           //  * and & can't be unary operators if followed by a (, identifier, prefix operator, or literal
            if (
                (
                    curr().type() == lexeme::Multiply
                    || curr().type() == lexeme::Ampersand
                    )
                && peek(1)
                && (
                    peek(1)->type() == lexeme::LeftParen
                    || peek(1)->type() == lexeme::Identifier
                    || is_literal(peek(1)->type())
+                   || is_prefix_operator(*peek(1))
                    )
                )
            {
                break;
            }
sookach commented 1 month ago

Thanks @hsutter. So, the change you propose does indeed fix this issue, but as I suspected:

that rule would prevent valid binary expressions involving a dereference

it introduces another issue. Consider the following code:

x := new<int>(1);
std::cout << x* - 1 << "\n";

this now translates to

auto x {cpp2_new<int>(1)}; 
std::cout << cpp2::move(x) * -1 << "\n";

whereas before it would translate correctly.

hsutter commented 1 month ago

Ah, good point. I'll think about this more. Thanks again for the report and the notes!