lcompilers / lpython

Python compiler
https://lpython.org/
Other
1.37k stars 156 forks source link

Blank line within an indented block of code is wrongly parsed on Windows #2725

Open Vipul-Cariappa opened 3 weeks ago

Vipul-Cariappa commented 3 weeks ago

Windows uses a carriage return (\r) and line feed (\n) character to represent a new line (i.e. \r\n is a new line). But in other systems, it is only a single line feed character that is used to represent a new line. A problem arises when there is a black line in between an indented block of code. If the blank line is indented with the correct number of spaces or tabs it does not cause any problems, but if it is a completely blank line without any spaces and indentation then the parser understands it as the end of the indented code block even though it is not. This problem only occurs with Windows's new line representation. I am attaching 2 files to test it out. If you open them in vscode or Windows notepad, the editor specifies the new line format of the file in the bottom right corner. If it reads LF, it uses \n to represent a new line; if it reads CRLF, it uses \r\n to represent a new line.

main_unix.py uses only \n to represent a new line and it works correctly, but main_win.py which uses \r\n gives an error.

Contents of file:

# i32 = int # uncomment to make it python compatible

def callme() -> i32:
    i: i32

    for i in range(3):
        print("Hello!!!")

    return i

if __name__ == "__main__":
    callme()

Output:

(lp) C:\Users\vipul\Documents\Workspace\lpython>.\src\bin\lpython.exe --jit .\tmp\main_unix.py
Hello!!!
Hello!!!
Hello!!!

(lp) C:\Users\vipul\Documents\Workspace\lpython>.\src\bin\lpython.exe --jit .\tmp\main_win.py
semantic error: Variable 'i' not declared
 --> .\tmp\main_win.py:6:9
  |
6 |     for i in range(3):
  |         ^

Note: Please report unclear or confusing messages as bugs at
https://github.com/lcompilers/lpython/issues.

main_unix.py.txt main_win.py.txt

nikabot commented 3 weeks ago

I think we need to handle this in tokeniser, see initial handling of CRLF in the tokeniser: 4393893 (#1463)

Vipul-Cariappa commented 3 weeks ago

hisashiburi dana @nikabot (mugiwara)

I have provided a temporary fix at https://github.com/lcompilers/lpython/pull/2706/commits/e65cfe1206d640821bc6f2d492d2f03b416d865e. But yes, a permanent fix should be updating the tokenizer.

nikabot commented 3 weeks ago

Yup! (Crocodile!)