qwertie / ecsharp

Home of LoycCore, the LES language of Loyc trees, the Enhanced C# parser, the LeMP macro preprocessor, and the LLLPG parser generator.
http://ecsharp.net

Loyc trees: support custom literals natively #110

Closed: qwertie closed this issue 3 years ago.

qwertie commented 4 years ago

In 2016 I added the SpecialLiteral class (now called CustomLiteral) as part of the new LES3 language, but I realize now that the problem it tries to solve is more fundamental and deserves explicit support in Loyc trees themselves.

When anything is serialized, it naturally becomes either a sequence of characters or a sequence of bytes. Without loss of generality (but perhaps with loss of efficiency), any sequence of bytes can be stored as a string, e.g. using BAIS format, or by treating the bytes as UTF-8 with some provision for invalid UTF-8. (My LES3 code supports round-tripping invalid UTF-8 bytes as invalid UTF-16 code units in the range 0xDC80..0xDCFF.)
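As an illustration of the byte-to-string idea, Python's standard `surrogateescape` error handler applies the same kind of mapping: each undecodable byte 0x80..0xFF becomes a lone surrogate U+DC80..U+DCFF, and encoding maps it back. This is a Python sketch, not the LES3 code. The helper names are hypothetical, and the assumption that LES3 maps each byte to exactly the same code unit is based only on the shared range. (A Python `str` holds code points rather than UTF-16 code units, but it permits lone surrogates, so the analogy holds.)

```python
def bytes_to_text(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, escaping each invalid byte
    0x80..0xFF as a lone surrogate U+DC80..U+DCFF."""
    return data.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    """Hypothetical helper: inverse of bytes_to_text; lone surrogates
    U+DC80..U+DCFF turn back into the original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"ok\xff\xfe\xc3\xa9"           # two invalid bytes, then valid UTF-8 for "é"
text = bytes_to_text(raw)
print([hex(ord(c)) for c in text])    # ['0x6f', '0x6b', '0xdcff', '0xdcfe', '0xe9']
assert text_to_bytes(text) == raw      # lossless round trip
```

Valid UTF-8 sequences decode normally, so only the invalid bytes occupy the reserved range.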

With this in mind, all literals can be thought of as a pair of strings: first a type marker that indicates the data type, and then a string that is some kind of serialized representation of a value. In LES3, a literal with type marker `tm` and data string `C:\Temp` is written `tm"C:\\Temp"`. Ordinary literals like `123`, `"Hello, World!"` and `true` have an implicit type marker, which LES3 defines as `_` for numbers, the empty string for strings, and `bool` for booleans. Thus, we can write these examples equivalently as `_"123"`, ``` ``"Hello, World!" ``` and `bool"true"` respectively.
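A minimal Python sketch of the "pair of strings" view: it converts a few ordinary values into (type marker, text) pairs using the implicit markers above, and prints them in explicit LES3-style form. The function names are hypothetical. The sketch assumes the type marker is either empty or a plain identifier, and it escapes only backslashes and double quotes, so it is not a complete LES3 printer.

```python
def implicit_literal(value) -> tuple[str, str]:
    """Hypothetical helper: split an ordinary value into (type marker, text)
    using the implicit LES3 markers: _ for numbers, "" for strings, bool."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return ("bool", "true" if value else "false")
    if isinstance(value, int):
        return ("_", str(value))
    if isinstance(value, str):
        return ("", value)
    raise TypeError(f"no implicit type marker for {type(value).__name__}")


def write_explicit(type_marker: str, text: str) -> str:
    """Write a literal in explicit marker"text" form. Only backslash and
    double quote are escaped here; a real printer must handle more."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    # An empty type marker is written as the empty backquoted identifier ``
    prefix = type_marker if type_marker else "``"
    return f'{prefix}"{escaped}"'


print(write_explicit("tm", "C:\\Temp"))                     # tm"C:\\Temp"
print(write_explicit(*implicit_literal(123)))               # _"123"
print(write_explicit(*implicit_literal("Hello, World!")))   # ``"Hello, World!"
print(write_explicit(*implicit_literal(True)))              # bool"true"
```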

Loyc trees have traditionally had only a Value (an object, not a string). I propose adding two more properties to every node, for a total of three:

  1. Value: The true value of the literal (any object).
  2. TextValue: A string representation of the literal.
  3. TypeMarker: A global Symbol (singleton string) indicating the literal type.

With three properties, a literal node can keep track of the meaning of a literal (the Value), the original text from which it was parsed, or both.

Some things to consider:

qwertie commented 3 years ago

July 3, 2020: As of version 2.8.0 (semver 28), ILNode and LNode include TextValue and TypeMarker properties. These come from the ISerializedLiteral interface, which together with Value makes up the three properties of ILiteralValue (TextValue, TypeMarker and Value).
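The split between the two interfaces can be mirrored with Python protocols. The Python names below are hypothetical stand-ins for the C# interfaces, and their members are an assumption based only on the description above (the real interfaces may declare more).

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SerializedLiteral(Protocol):
    """Stand-in for ISerializedLiteral: the serialized form only."""
    @property
    def text_value(self) -> str: ...
    @property
    def type_marker(self) -> str: ...


@runtime_checkable
class LiteralValue(SerializedLiteral, Protocol):
    """Stand-in for ILiteralValue: serialized form plus the value."""
    @property
    def value(self) -> Any: ...


@dataclass(frozen=True)
class TextOnly:
    text_value: str
    type_marker: str


@dataclass(frozen=True)
class FullLiteral:
    text_value: str
    type_marker: str
    value: Any


print(isinstance(TextOnly("123", "_"), SerializedLiteral))     # True
print(isinstance(TextOnly("123", "_"), LiteralValue))          # False
print(isinstance(FullLiteral("123", "_", 123), LiteralValue))  # True
```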

Dec 10, 2020: I committed the LiteralHandlerTable and StandardLiteralHandlers classes, which provide a set of standard parsers and printers. There is a standard set of types and type markers, each of which has a standard syntax as described in the documentation of StandardLiteralHandlers. These parsers and printers will be used by LES3 in version 2.9.0 (semver 29), which is not yet ready for release.
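A sketch of the general idea behind a handler table: parsers are looked up by type marker, and printers by the runtime type of the value. It is written in Python with hypothetical names (`HandlerTable`, `add_parser`, `add_printer`, `parse`, `to_text`) and only three handlers. It does not reproduce the actual `LiteralHandlerTable` API or the standard syntaxes documented in `StandardLiteralHandlers`.

```python
from typing import Any, Callable, Dict, Tuple


class HandlerTable:
    """Hypothetical table of literal parsers (keyed by type marker)
    and printers (keyed by the exact Python type of the value)."""

    def __init__(self) -> None:
        self._parsers: Dict[str, Callable[[str], Any]] = {}
        self._printers: Dict[type, Callable[[Any], Tuple[str, str]]] = {}

    def add_parser(self, type_marker: str, parser: Callable[[str], Any]) -> None:
        self._parsers[type_marker] = parser

    def add_printer(self, py_type: type, printer: Callable[[Any], Tuple[str, str]]) -> None:
        self._printers[py_type] = printer

    def parse(self, type_marker: str, text: str) -> Any:
        try:
            parser = self._parsers[type_marker]
        except KeyError:
            raise LookupError(f"no parser for type marker {type_marker!r}") from None
        return parser(text)

    def to_text(self, value: Any) -> Tuple[str, str]:
        # Exact-type lookup, so True uses the bool printer, not the int one
        printer = self._printers.get(type(value))
        if printer is None:
            raise LookupError(f"no printer for {type(value).__name__}")
        return printer(value)


table = HandlerTable()
table.add_parser("_", lambda t: int(t, 0))  # int(t, 0) accepts "123" and "0x10"
table.add_parser("", lambda t: t)
table.add_parser("bool", lambda t: {"true": True, "false": False}[t])
table.add_printer(int, lambda v: ("_", str(v)))
table.add_printer(str, lambda v: ("", v))
table.add_printer(bool, lambda v: ("bool", "true" if v else "false"))

print(table.parse("_", "0x10"))   # 16
print(table.to_text(True))        # ('bool', 'true')
```

Here an unknown type marker raises `LookupError`. A reader that must round-trip unknown literals could instead keep the text and type marker and leave the value unset, which is the situation the three-property design allows for.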