qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net

Proposal: change trivia marker to a single character. #61

Closed (qwertie closed this issue 4 years ago)

qwertie commented 5 years ago

Originally, Loyc trees had a convention that a single character, #, marked all "special" identifiers. Under this system, trivia (non-semantic information such as comments) was attached to nodes as attributes, and each trivia attribute was required to have a Name starting with #trivia_, e.g. #trivia_SLComment for a single-line comment.

Later I decided to switch to a different special prefix for operators: an apostrophe. Why an apostrophe? To minimize visual noise; I liked that it was a small character. Also, I didn't want to use a punctuation mark that already connoted a specific operator. I didn't want to slow down the check that distinguishes "normal" and "special" names, so rather than checking whether name[0] == '\'' || name[0] == '#', LNode.IsSpecialName checks whether name[0] <= '\''. Since the apostrophe is ASCII 39 (and # is ASCII 35), any prefix below ASCII 40 is thus reserved for special names.
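To make the range check concrete, here is a minimal Python sketch. The function name is hypothetical, and Loyc's real LNode.IsSpecialName is C# code that may differ in details such as how an empty name is handled. It shows that one comparison against the apostrophe accepts both # and ' prefixes, and that % (ASCII 37) and the space character (ASCII 32) already fall inside the reserved range, while ( (ASCII 40) does not.

```python
# Illustrative sketch only: a Python analogue of the one-comparison test the
# thread attributes to LNode.IsSpecialName (the real code is C# in Loyc).

def is_special_name(name: str) -> bool:
    """True if the name's first character is at or below the apostrophe (ASCII 39)."""
    # Assumption of this sketch: an empty name is treated as not special.
    if not name:
        return False
    # A single comparison covers '#' (35) and "'" (39), and also reserves every
    # other code point below 40, including ' ' (32) and '%' (37).
    return name[0] <= "'"


if __name__ == "__main__":
    for name in ["#trivia_SLComment", "'+", "%SLComment", "Foo", "(x"]:
        print(name, is_special_name(name))
```

Running it prints True for the first three names and False for Foo and (x.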

Now I'm thinking that #trivia_ is a bit clunky; why not define a single-character prefix for trivia? At first I thought the space character would be a good prefix, but then I remembered that #trivia_ does have one virtue: it greps well, so code referring to #trivia_ is easy to find. With that in mind, I think the best prefix is %, because % is a fairly rare character in most code, especially compiler-related code. It is also ASCII 37, so it already falls within the reserved range. What do you think, @jonathanvdc?

The next question, then, is whether to keep an alphanumeric representation of trivia concepts, like %SLComment for "single-line comment", or to go with a compact symbolic representation like %//. In the latter case, if one is parsing a language where comments are written # like this or (* like this *), it would still be recommended to use %// and %/**/ to represent those comments in the Loyc tree.

jonathanvdc commented 5 years ago

Using % for trivia sounds like a great idea! % is more succinct, and it also makes it easy to determine whether an attribute is "essential" or not. That's especially useful for compilers that accept Loyc trees as input. For example, ecsc currently reports an error whenever it encounters a special name it doesn't understand, unless that name starts with #trivia_, in which case the node with the special name is simply ignored. That's quirky at best. Ignoring only nodes whose names start with % seems more elegant somehow, even though it is conceptually the same as the current solution.
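A minimal Python sketch of the rule described above, under loud assumptions: attributes are modeled as plain name strings, and the function names and the set of "understood" names (#public, #static) are hypothetical, not taken from ecsc. Trivia is recognized purely by its leading %, so a compiler can skip it without knowing every trivia kind, while unknown non-trivia special names are still reported.

```python
# Hypothetical sketch of the filtering rule discussed above; this is not ecsc's
# actual code, and attributes are modeled simply as a list of name strings.

def is_special_name(name: str) -> bool:
    # Reserved prefixes: any first character at or below the apostrophe (ASCII 39).
    return bool(name) and name[0] <= "'"


def is_trivia_name(name: str) -> bool:
    # Under the proposal, trivia is any name that starts with '%'.
    return name.startswith("%")


def unsupported_special_names(attr_names, understood):
    """Return the special attribute names a compiler would report as errors.

    Ordinary identifiers are skipped, trivia is ignored, and any other special
    name is an error unless it appears in `understood`.
    """
    errors = []
    for name in attr_names:
        if not is_special_name(name) or is_trivia_name(name):
            continue
        if name not in understood:
            errors.append(name)
    return errors


attrs = ["%SLComment", "#public", "#mystery", "%MLComment", "Serializable"]
print(unsupported_special_names(attrs, understood={"#public", "#static"}))
```

With these sample attributes it prints ['#mystery']: the two % attributes are skipped as trivia, and Serializable is not a special name at all.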

With regard to %MLComment vs %/**/, the latter is obviously more terse, but maybe having six leading non-alphanumeric characters in %/**/("This is a comment.") is a little bit over the top.

qwertie commented 5 years ago

Actually there are 10 leading punctuation marks in the proper form, @`%/**/`("This is a comment"), but hey, in LESv3 it's only 9: `%/**/`("This is a comment").

I guess we'll go with alphanumeric forms then...

qwertie commented 5 years ago

Sorry for not moving on this faster. CodeSymbols.IsTriviaSymbol (and thus IsTrivia(this ILNode)) now recognizes the % prefix in addition to #trivia_. Changing the prefix solution-wide (particularly in CodeSymbols) might break things, so I'll wait until at least v2.7.0 (semantic version 27) to do that.
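During the transition, a trivia check has to accept both spellings. The Python sketch below is a hypothetical analogue of that behavior, not the actual CodeSymbols.IsTriviaSymbol. Its rename helper assumes that the solution-wide change is a plain prefix substitution (for example #trivia_SLComment becoming %SLComment), which the thread suggests but does not spell out for every trivia name.

```python
# Hypothetical sketch of a transitional trivia check that accepts both prefixes,
# plus a helper for the later solution-wide rename. Not Loyc's actual C# code.

LEGACY_PREFIX = "#trivia_"
NEW_PREFIX = "%"


def is_trivia_name(name: str) -> bool:
    """True for names using either the new '%' prefix or the legacy '#trivia_' prefix."""
    return name.startswith(NEW_PREFIX) or name.startswith(LEGACY_PREFIX)


def to_new_trivia_name(name: str) -> str:
    """Map a legacy name like '#trivia_SLComment' to '%SLComment'; leave others unchanged."""
    if name.startswith(LEGACY_PREFIX):
        return NEW_PREFIX + name[len(LEGACY_PREFIX):]
    return name


for name in ["#trivia_SLComment", "%MLComment", "#public"]:
    print(name, is_trivia_name(name), to_new_trivia_name(name))
```

Accepting both prefixes first and renaming later lets existing code keep working until the breaking release.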

qwertie commented 4 years ago

This is implemented on branch breaking-changes-2.7 (2.7.0.0).