miking-lang / miking

Miking - the meta viking: a meta-language system for creating embedded languages

Reusing `syn`s with slightly different invariants, or not #221

Open elegios opened 3 years ago

elegios commented 3 years ago

MExpr has a somewhat unusual handling of identifiers: there are no disallowed identifiers; any Unicode string is a valid identifier. As such, the TmVar node in the AST places no restrictions on its identifier (the same is true for identifiers in constructors, lambdas, lets, etc.).

When implementing the OCaml subset, it is convenient to reuse these nodes, since the semantics are otherwise essentially the same. However, OCaml restricts its identifiers syntactically, which raises the question of what to do with an OCaml AST that contains an invalid identifier. Reasonable alternatives include:

  1. Do nothing; we expect the identifiers to be valid. We could also supply a function that fixes all invalid identifiers, but then a user would have to remember to call it whenever it might be needed.
  2. Handle the invalid identifiers as though they were valid. In practice this essentially entails mangling names before generating code/pretty printing, pretending the identifiers are valid when evaluating, etc.
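
To make the first alternative concrete, here is a rough sketch of the kind of validity check it would rely on. It is written in Python for brevity and is not code from the Miking repository: the name `is_valid_ocaml_var`, the ASCII-only pattern, and the deliberately partial keyword list are all assumptions made for illustration.

```python
import re

# Simplified, ASCII-only approximation of OCaml's lowercase identifiers:
# a lowercase letter or underscore, then letters, digits, '_' or "'".
_OCAML_LOWER_IDENT = re.compile(r"[a-z_][A-Za-z0-9_']*")

# Partial keyword list, for illustration only (hypothetical helper data).
_SOME_OCAML_KEYWORDS = frozenset(
    {"let", "in", "fun", "match", "with", "type", "if", "then", "else"}
)


def is_valid_ocaml_var(name: str) -> bool:
    """Return True if `name` could be used as-is as an OCaml variable name."""
    return (
        _OCAML_LOWER_IDENT.fullmatch(name) is not None
        and name != "_"  # a lone underscore is the wildcard, not a variable
        and name not in _SOME_OCAML_KEYWORDS
    )
```

Under alternative 1, every producer of an AST would be responsible for ensuring this holds for each identifier; under alternative 2, a check like this only decides which names need mangling.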

We're currently leaning towards the latter, for two reasons:


We discussed the particulars of pretty-printing some during the meeting and the options are essentially these:

The latter requires that the mangling is injective, while the former does not, since the normal pretty-printing will handle making the names unique.
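
As an illustration of what an injective mangling could look like, here is a minimal sketch in Python. It is not the scheme used in Miking: the `v_` prefix, the `_<hex>_` escape format, and the names `mangle` and `demangle` are assumptions chosen for this example. Injectivity follows from the fact that a literal underscore is always escaped, so each output can be decoded in exactly one way.

```python
_PREFIX = "v_"  # hypothetical prefix; guarantees a lowercase first character


def mangle(name: str) -> str:
    """Map an arbitrary Unicode string to an ASCII identifier, injectively."""
    parts = [_PREFIX]
    for ch in name:
        if ch.isascii() and ch.isalnum():
            parts.append(ch)  # ASCII letters and digits pass through
        else:
            # Everything else, including '_', becomes _<hex code point>_.
            parts.append(f"_{ord(ch):x}_")
    return "".join(parts)


def demangle(mangled: str) -> str:
    """Inverse of `mangle`; its existence is what makes `mangle` injective."""
    if not mangled.startswith(_PREFIX):
        raise ValueError("not a mangled name")
    body = mangled[len(_PREFIX):]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "_":
            # An underscore always opens an escape that ends at the next '_'.
            end = body.index("_", i + 1)
            out.append(chr(int(body[i + 1:end], 16)))
            i = end + 1
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

With this scheme, `mangle("a b")` gives `v_a_20_b`, while the already identifier-like string `a_20_b` maps to `v_a_5f_20_5f_b`, so the two cannot collide. A non-injective mangling (for example, replacing every invalid character with `_`) would be shorter but would have to rely on the pretty-printer to make names unique afterwards.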

dlunde commented 3 years ago

Did we ever agree on an approach for the above? I should probably do the same for the C pretty printer.

elegios commented 3 years ago

I think we landed on "we should keep the same invariants", so in this case, we should allow all identifiers in TmVar and handle the target language's restrictions at a later stage (pretty-printing).