Reusing `syn`s with slightly different invariants, or not

elegios commented 3 years ago

MExpr has a somewhat unusual handling of identifiers: there are no disallowed identifiers; any Unicode string is a valid identifier. As such, the TmVar node in the AST places no restrictions on the identifier (the same is true for identifiers in constructors, lambdas, lets, etc.).

When implementing the OCaml subset it is convenient to reuse these nodes, since the semantics are otherwise essentially the same. However, OCaml restricts its identifiers syntactically, which raises the question of what to do if an OCaml AST contains an invalid identifier. Reasonable alternatives include:

Do nothing; we expect the identifiers to be valid. We could also supply a function that fixes all invalid identifiers, but then a user would have to remember to call it whenever it might be needed.
Handle the invalid identifiers as though they were valid. In practice this essentially entails mangling names before generating code/pretty printing, pretending the identifiers are valid when evaluating, etc.
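
To make the second alternative concrete, here is a minimal sketch of what "mangling names before generating code" could look like. It is written in Python purely for illustration and is not the Miking implementation. The names `mangle` and `is_valid_ocaml_ident` are hypothetical, the validity check is a simplified view of OCaml's rules for value names, and the keyword set is deliberately incomplete.

```python
import re

# Assumption: simplified OCaml value-name rule. The first character is a
# lowercase ASCII letter or '_'; the rest are ASCII letters, digits, '_' or "'".
VALID_OCAML_IDENT = re.compile(r"[a-z_][A-Za-z0-9_']*\Z")

# A small, deliberately incomplete sample of OCaml keywords.
OCAML_KEYWORDS = {"let", "in", "match", "with", "fun", "type", "if", "then", "else"}


def is_valid_ocaml_ident(name: str) -> bool:
    """Check a name against the simplified rule above."""
    return (
        bool(VALID_OCAML_IDENT.match(name))
        and name not in OCAML_KEYWORDS
        and name != "_"
    )


def mangle(name: str) -> str:
    """Hypothetical mangling: map any Unicode string to a valid OCaml value name.

    ASCII letters and digits are kept; every other character (including '_')
    is escaped as _<hex code point>_. The fixed "v_" prefix guarantees a
    lowercase first character and keeps the result from being a keyword.
    """
    out = ["v_"]
    for ch in name:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"_{ord(ch):x}_")
    return "".join(out)
```

Because every underscore in the input is itself escaped, the output can be decoded unambiguously, so this particular scheme is intended to be injective. The cost is that already-valid names are also rewritten (`x` becomes `v_x`); a real implementation might prefer to leave those untouched.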

We're currently leaning towards the latter, for two reasons:

We're reusing a language fragment that already has a defined invariant for the identifiers: all are allowed. If we want a new invariant we should create a new constructor; otherwise it's too easy to misuse the existing one.
A user generating OCaml from something else does not need to care about OCaml's identifier restrictions; all names are allowed and handled correctly.

We briefly discussed the particulars of pretty-printing during the meeting, and the options are essentially these:

In all public functions that do pretty-printing, first do a pass that mangles names, then do normal pretty-printing.
Use a different implementation for IdentifierPrettyPrint that mangles names.

The latter requires that the mangling is injective, while the former does not, since the normal pretty-printing will handle making the names unique.
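
The difference can be illustrated with a small sketch (Python, for illustration only; not Miking code). Here `lossy_mangle` is a hypothetical non-injective mangling, and `make_unique` is a stand-in for the uniqueness handling that the normal pretty-printer is said to provide. The sketch assumes distinct source identifiers can be told apart by something other than their printed string; the original strings play that role below. OCaml keywords are ignored for brevity.

```python
import re


def lossy_mangle(name: str) -> str:
    """Hypothetical non-injective mangling: replace each disallowed character with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Add a prefix when the result is empty or does not start with
    # a lowercase ASCII letter or '_'.
    if not re.match(r"[a-z_]", cleaned):
        cleaned = "v" + cleaned
    return cleaned


def make_unique(names):
    """Stand-in for the pretty-printer's uniqueness pass.

    Each distinct source identifier gets a distinct printed name, with a
    numeric suffix appended when the mangled name is already taken.
    """
    printed = {}
    used = set()
    for name in names:
        if name in printed:
            continue
        base = lossy_mangle(name)
        candidate, n = base, 0
        while candidate in used:
            n += 1
            candidate = f"{base}{n}"
        used.add(candidate)
        printed[name] = candidate
    return printed


# "a b" and "a-b" collide after mangling, but the uniqueness pass separates them.
names = make_unique(["a b", "a-b"])
```

With the first option the collision between `a b` and `a-b` is harmless because the later pass renames one of them. If the mangling were instead plugged in directly as the identifier printer, nothing downstream would repair the collision, which is why that option needs an injective mangling.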

dlunde commented 3 years ago

Did we ever agree on an approach for the above? I should probably do the same for the C pretty printer.

elegios commented 3 years ago

I think we landed on "we should keep the same invariants", so in this case we should allow all identifiers in TmVar and handle the target language's restrictions at a later stage (pretty-printing).

miking-lang / miking

Reusing `syn`s with slightly different invariants, or not #221