Lexical syntax of names

BinderDavid commented 4 months ago

Previously we just had one single declaration in lang/parser/src/cst/exp.rs.

pub type Ident = String;

This definition was used for variables, constructors, destructors, type constructors, etc. This is no longer tenable if we want to introduce modules and qualified names. We need more structure.

I propose something like the following:

-- Names for variables: "x", "k"
<VarName>  := <lowercase> (<lowercase> | <uppercase>) * 
-- Names for uppercase identifiers: "Cons", "List", "Stream"
<UIdName> := <uppercase> (<lowercase> | <uppercase>)*
-- Names for lowercase identifiers: "ap", "hd", "tl"
<LIdName> :=  <lowercase> (<lowercase> | <uppercase>)*
-- Names for modules:  "control", "std", "function"
<MName> := <lowercase> (<lowercase> | <uppercase>)
-- Names for qualified uppercase identifiers:  "Cons", "std::List", "std::List::Cons"
<QUIdName> := (<MName>"::")*(<UIdName>"::")?<UIdName>
-- Names for qualified lowercase identifiers:  "std::Fun::ap" "Fun::ap"
<QLIdName> :=  (<MName>"::")*(<UIdName>"::")?<LIdName>

I propose that for typeconstructors, constructors and codefinitions we may only use UIdName, and for destructors and and definitions only LIdName. This would also allow to distinguish variables from calls with no arguments in the parser, i.e. x vs Nil.

timsueberkrueb commented 4 months ago

Regarding syntax: Following the Rust conventions, I would advocate for allowing underscores everywhere, and encourage (but maybe not enforce) snake_case over camelCase for definitions and destructors. Regarding implementation: For better error messages, it is probably a good idea to keep the parser as permissive as possible and only distinguish these syntactic categories during lowering/name resolution?

BinderDavid commented 4 months ago

I agree that the parser should be permissive in general, but there are a few things that we should distinguish early in order to make our life easier later on. For example, I don't want the parser to allow qualified names in binding positions of both names and variables. E.g. I would expect the parser to reject:

data foo::Bar { bizz::Fizz( foo::x : Type,...) }
     ^^^^^^^^   ^^^^^^^^^^  ^^^^^^
          (a)      (b)        (c)

Here (a) and (b) are binding positions for names, and (c) is a binding position for a variable. The parser should reject them all. In those positions we only want to allow unqualified identifiers.

The other thing is whether we want to enforce all module names to be lower case and all type names to be upper case. This would allow us to distinguish already in the parser whether a name is qualified by a type vs a module. Compare:

std::result::Result (Here, std and result must be module names.)
std::result::Result::Ok (Here, std and result must be module names, and Result a type name)
std::function::Fun::ap (Here std and function are module names, Fun is a type and ap a destructor.
std::function::compose (Here std and function are module names, compose is a toplevel definition)

timsueberkrueb commented 4 months ago

Yeah we should enforce lowercase module names but it may be easier to allow for multiple error messages in lowering/name resolution than making parse errors recoverable. It would certainly be more fine-grained.

polarity-lang / polarity

Lexical syntax of names #201