we-like-parsers / cpython

Here we work on integrating pegen into CPython; use branch 'pegen'
https://github.com/gvanrossum/pegen

Avoid accidental clashes between names in grammar and infrastructure #125

Closed gvanrossum closed 4 years ago

gvanrossum commented 4 years ago

(From https://github.com/we-like-parsers/pegen/issues/84.)

Items in an alternative may be named, and the names may be referenced in actions. But there are some "forbidden" names. E.g. don't name an item p, because (when generating C code) every rule parsing function has an argument named p. There are other possible name clashes too: res and mark are always local variables. ~And there are many helper functions, with names like is_memoized or singleton_seq.~ And of course anything that's a C reserved word (e.g. if) cannot be used either. Also there are systematic generated names like *_var, *_type and *_rule.

It's easy to rename p, mark and res in the generated code to start with an underscore (by convention, rule names don't start with _, and maybe we should make this a hard requirement). I'm not sure we need to worry about the others, though we may have to warn about them in the docs.

When generating Python code there are other possible clashes, e.g. self, mark and cut. We can handle these the same way. (There are a few others that seem less important, like ast and of course Python builtins and keywords.)
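To make the clash categories above concrete, here is a minimal sketch of a check the generator could run over the item names of an alternative. The function name `find_name_clashes`, the set names, and the reason strings are hypothetical; the C keyword set is a deliberately incomplete sample, and the other sets contain only the names mentioned in this thread.

```python
import keyword

# Names taken from this thread; illustrative, not exhaustive.
C_GENERATED_LOCALS = {"p", "mark", "res"}
PYTHON_GENERATED_LOCALS = {"self", "mark", "cut"}
# A small, deliberately incomplete sample of C reserved words.
C_KEYWORDS = {"if", "else", "for", "while", "do", "return",
              "int", "char", "void", "struct"}
# Suffixes of systematically generated names.
GENERATED_SUFFIXES = ("_var", "_type", "_rule")


def find_name_clashes(item_names, target="c"):
    """Return (name, reason) pairs for item names that may clash.

    `target` is "c" or "python" and selects which generated locals
    and which keyword list to check against.
    """
    if target == "c":
        generated, is_keyword = C_GENERATED_LOCALS, C_KEYWORDS.__contains__
    else:
        generated, is_keyword = PYTHON_GENERATED_LOCALS, keyword.iskeyword
    clashes = []
    for name in item_names:
        if name in generated:
            clashes.append((name, "generated local"))
        elif is_keyword(name):
            clashes.append((name, "reserved word"))
        elif name.endswith(GENERATED_SUFFIXES):
            clashes.append((name, "generated suffix"))
    return clashes
```

For example, `find_name_clashes(["a", "p", "if", "name_var"])` flags everything except `a`. Whether such a check should raise an error or only warn is left open here, as it is in the discussion.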

pablogsal commented 4 years ago

I was also thinking about specifying the name of the parser variable in the grammar using a meta, so that it does not feel like a "magic variable". p is somewhat important because it may be used explicitly in actions, as in _Py_pegen_do_something(p, ...). So we could specify something like:

@parser_variable p

This way, we can read that in the generator (along with any other names we want to make customizable) and do the check when we find an assignment in the grammar. (We can do this check regardless of the customization, but I think the two complement each other.)
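A rough sketch of how the generator side of this could look, assuming a simplified grammar syntax. The `@parser_variable` meta is only the proposal above, not an existing pegen feature, and the regexes and helper names (`parser_variable`, `clashing_items`) are hypothetical; a real implementation would work on the parsed grammar rather than on raw text.

```python
import re

# Simplified assumption: a meta is a line of the form "@name value".
META_RE = re.compile(r"^@(\w+)[ \t]+(\w+)[ \t]*$", re.MULTILINE)
# Simplified assumption: a named item looks like "name=item".
NAMED_ITEM_RE = re.compile(r"\b([A-Za-z_]\w*)=")
# Actions are assumed to be non-nested "{ ... }" blocks.
ACTION_RE = re.compile(r"\{[^{}]*\}")


def parser_variable(grammar_text, default="p"):
    """Return the name given by @parser_variable, or the default."""
    metas = dict(META_RE.findall(grammar_text))
    return metas.get("parser_variable", default)


def clashing_items(grammar_text):
    """Return the named items that reuse the parser variable's name."""
    var = parser_variable(grammar_text)
    # Drop actions first so code inside them is not read as named items.
    without_actions = ACTION_RE.sub("", grammar_text)
    return [name for name in NAMED_ITEM_RE.findall(without_actions)
            if name == var]
```

With no meta present, the name falls back to `p`, so the check still runs on grammars that do not customize it.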

gvanrossum commented 4 years ago

But macros like CHECK use p implicitly.

pablogsal commented 4 years ago

> But macros like CHECK use p implicitly.

They normally expand to one that takes p as an argument:

#define CHECK(result) CHECK_CALL(p, result)

We could place these #defines in the generated code (with the appropriate parser variable) and only use the variants that take p explicitly in pegen.c and other manually written files. That is, CHECK is defined in parse.c and can only be used in the grammar; elsewhere (for instance in pegen.c) we use CHECK_CALL.
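As a sketch of that idea, the generator could emit the short macros itself, substituting whatever parser variable name is in effect. The function `emit_check_macros` and its parameters are hypothetical; only the CHECK / CHECK_CALL pair comes from this thread, and the `<NAME>_CALL` naming pattern for any other macro is an assumption.

```python
def emit_check_macros(parser_var="p", macros=("CHECK",)):
    """Return C #define lines that bind each short macro to parser_var.

    Each macro NAME is assumed to have an explicit variant NAME_CALL
    that takes the parser variable as its first argument.
    """
    lines = [
        f"#define {macro}(result) {macro}_CALL({parser_var}, result)"
        for macro in macros
    ]
    return "\n".join(lines)
```

Calling `emit_check_macros()` reproduces the `#define` shown above, while `emit_check_macros("_p")` would bind CHECK to a renamed parser variable without touching the manually written files that call CHECK_CALL directly.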

gvanrossum commented 4 years ago

That would work, yes.