we-like-parsers / cpython

Here we work on integrating pegen into CPython; use branch 'pegen'
https://github.com/gvanrossum/pegen

Last report on test failures #184

Open isidentical opened 2 years ago

isidentical commented 2 years ago

It seems like there are ~80 test failures (from 14 distinct groups/individual tests). I think we should look into which ones we should prioritize and which ones we'll leave behind (e.g., error messages can be the last ones).

| Group / Test name | Number of failures | Classification | Bound issue |
|---|---|---|---|
| test_mismatched_braces (`f'{3:{{>10}'`) | 1 | More expressive (previously forbidden, now allowed) | |
| test_no_backslashes_in_expression_part (`rf'{"\N{LEFT CURLY BRACKET}"}'`) | 1 | More expressive (previously forbidden, now allowed) | |
| test_comments | 1 | Error messages | #157 |
| test_compile_time_concat | 1 | Error messages | |
| test_conversions | 17 | Error messages | |
| test_format_specifier_expressions | 3 | Error messages | |
| test_invalid_syntax_error_message | 1 | Error messages | |
| test_lambda | 1 | Error messages | |
| test_mismatched_braces | 12 | Error messages | |
| test_mismatched_parens | 5 | Error messages | |
| test_missing_expression | 28 | Error messages | |
| test_no_backslashes_in_expression_part | 7 | Error messages | |
| test_unterminated_string | 4 | Error messages | |
| test_syntax_error_for_starred_expressions | 1 | Error messages | |
| test_parens_in_expressions | 4 | Error messages | |
| **TOTAL** | 87 | 85 (error messages) + 2 (over-expressive grammar) | |

pablogsal commented 2 years ago

CC: @lysnikolaou what's your availability to work on the project? (To start organising work)

pablogsal commented 2 years ago

@isidentical Are the numbers of failures just distinct tests failing, or are they independent problems?

pablogsal commented 2 years ago

CC: @superbobry as you mentioned you wanted to help with the project.

isidentical commented 2 years ago

> @isidentical Are the numbers of failures just distinct tests failing, or are they independent problems?

Each row is an independent failure/problem. The numbers just represent how many times a problem repeats in the test suite (i.e., we have ~80 test failures, but a single fix might get rid of 27 failures at once).

lysnikolaou commented 2 years ago

> CC: @lysnikolaou what's your availability to work on the project? (To start organising work)

I'll most probably be able to work ~7-8 hours/week on this.

pablogsal commented 2 years ago

Ok, I propose that we start by focusing on the bugs and the newly allowed behaviour, and then do a push together at the end to fix all the error messages. So let's focus on everything marked as "bug" or "new feature" in the table.

By the way, I was thinking that we should restrict the number of levels an f-string can be nested. That way we don't need to deal with allocating and deallocating memory for the buffer (we can use a statically allocated array of fixed length), and it can probably help with the problems in https://github.com/we-like-parsers/cpython/pull/169.

For https://github.com/we-like-parsers/cpython/pull/169, one approach is to store the information about the last popped item separately, so we never reset it, and add some asserts to make sure they match when we retrieve it.
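
Since the buffer handling discussed here lives in the C tokenizer, here is a minimal sketch of the combined idea, not the actual implementation: a statically allocated, fixed-depth stack of f-string states plus a separately stored "last popped" entry that is never reset and is cross-checked with asserts when retrieved. The `fstring_mode` fields, the `MAX_FSTRING_NESTING` limit, and all helper names are assumptions made up for illustration.

```c
/* Illustrative sketch only -- not the real tokenizer code. */
#include <assert.h>
#include <stdio.h>

#define MAX_FSTRING_NESTING 16          /* arbitrary fixed nesting limit */

typedef struct {
    char quote;                         /* quote character of the f-string */
    int bracket_depth;                  /* brace depth inside the expression part */
} fstring_mode;

typedef struct {
    fstring_mode stack[MAX_FSTRING_NESTING];  /* static storage: no malloc/free */
    int top;                                  /* number of active entries */
    fstring_mode last_popped;                 /* kept around, never reset */
    int has_last_popped;
} fstring_stack;

static int
push_mode(fstring_stack *s, fstring_mode m)
{
    if (s->top >= MAX_FSTRING_NESTING) {
        return -1;                      /* would surface as a "nested too deeply" syntax error */
    }
    s->stack[s->top++] = m;
    return 0;
}

static fstring_mode
pop_mode(fstring_stack *s)
{
    assert(s->top > 0);
    fstring_mode m = s->stack[--s->top];
    s->last_popped = m;                 /* remember it separately ... */
    s->has_last_popped = 1;
    return m;
}

static fstring_mode
retrieve_last_popped(const fstring_stack *s, char expected_quote)
{
    /* ... so a later consumer can assert it got the entry it expects. */
    assert(s->has_last_popped);
    assert(s->last_popped.quote == expected_quote);
    return s->last_popped;
}

int
main(void)
{
    fstring_stack s = {0};
    push_mode(&s, (fstring_mode){.quote = '\'', .bracket_depth = 0});
    pop_mode(&s);
    fstring_mode m = retrieve_last_popped(&s, '\'');
    printf("last popped quote: %c\n", m.quote);
    return 0;
}
```

With a hard limit like this, exceeding the depth becomes an ordinary syntax error rather than a memory-management problem, which is the trade-off proposed above.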

pablogsal commented 2 years ago

For `f'{\n}'` we should probably forbid newlines in the expression part when they are not part of a string.

pablogsal commented 2 years ago

For `rf'{"\N{LEFT CURLY BRACKET}"}'` I am not sure what the problem here is.

isidentical commented 2 years ago

> For `f'{\n}'` we should probably forbid newlines in the expression part when they are not part of a string.

I just checked and seems like we are forbidding it.

$ ./python
Python 3.12.0a0 (heads/fstring-grammar-rebased:cc2d3361de, Jul 16 2022, 11:22:19) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f'{\n}'
  File "<stdin>", line 1
    f'{\n}'
        ^
SyntaxError: unexpected character after line continuation character
>>> f"{\n1+1}"
  File "<stdin>", line 1
    f"{\n1+1}"
        ^
SyntaxError: unexpected character after line continuation character
>>> f"{1+1}"
'2'
>>> f"{1+1\n}"
  File "<stdin>", line 1
    f"{1+1\n}"
           ^
SyntaxError: unexpected character after line continuation character
>>> f"{1+1\n2}"
  File "<stdin>", line 1
    f"{1+1\n2}"
           ^
SyntaxError: unexpected character after line continuation character

I think it is just an error message mismatch.

lysnikolaou commented 1 year ago

Hey, I'm gonna work on these during the sprints. For now, I'm trying to understand everything that's going on in #169 and fix it. After that, I'll probably work on test_no_backslashes_in_expression_part. Everyone ok with this plan?

isidentical commented 1 year ago

@lysnikolaou @pablogsal unfortunately I can't attend the sprints in person this year :( But if there is anything you might need from my side, I'd be happy to help finalize this project šŸš€

pablogsal commented 1 year ago

> test_conversions (`f'{3! s}'`)

This should be fixed now, no?

>>> f'{3! s}'
  File "<stdin>", line 1
    f'{3! s}'
        ^^^
SyntaxError: conversion type must come right after the exclamanation mark

pablogsal commented 1 year ago

@isidentical can you refresh the table? It seems that a bunch of these are fixed now. Maybe do it after https://github.com/we-like-parsers/cpython/pull/199 has landed.

isidentical commented 1 year ago

Yep, that's the plan! But one nice piece of information I can offer is that, with #199, we are fully compatible with the existing parser on real-world code, so it might be safe to assume that the remaining problems are error-message noise.

isidentical commented 1 year ago

Updated the test failure sheet (re-analyzed all the failures); only 2 out of 87 failures (the number of individual tests [or subtests] that are failing) seem to be about a genuine problem (an over-expressive grammar where something that wasn't possible before is now possible).

  1. `f'{3:{{>10}'`: this probably needs to be forbidden (we already forbid `}` inside the format specifier part).
  2. `rf'{"\N{LEFT CURLY BRACKET}"}'`: this can stay, since it makes perfect sense to be able to reference Unicode character names inside a nested string literal, in a completely different context from what the raw f-string prefix covers.