Closed qwertie closed 3 years ago
July 3, 2020: As of version 2.8.0 (semver 28), `ILNode` and `LNode` include `TextValue` and `TypeMarker` properties (the `ISerializedLiteral` interface, which is part of the three-property combo of `ILiteralValue`: `TextValue`, `TypeMarker` and `Value`).

Dec 10, 2020: I committed `LiteralHandlerTable` and `StandardLiteralHandlers` classes with a set of standard parsers and printers. There is a standard set of types and type markers, each of which has a standard syntax as described in the documentation of `StandardLiteralHandlers`. These parsers and printers will be used by LES3 in version 2.9.0 (semver 29), which is not yet ready for release.

In 2016 I added the `SpecialLiteral` class (now called `CustomLiteral`) as part of the new LES3 language, but I realize now that the problem it tries to solve is more fundamental and deserves explicit support in Loyc trees themselves.

When anything is serialized, it will naturally become either a sequence of characters or a sequence of bytes. Without loss of generality (but perhaps with loss of efficiency) it is possible to store any sequence of bytes as a string (e.g. using BAIS format, or by treating the bytes as UTF-8 with some provision for invalid UTF-8; my LES3 code supports round-tripping invalid UTF-8 bytes as invalid UTF-16 code units in the range 0xDC80..0xDCFF).

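To make the byte round-tripping idea concrete, here is a small Python illustration. Python's built-in `surrogateescape` error handler uses the same kind of mapping (each undecodable byte 0x80..0xFF becomes a lone surrogate U+DC80..U+DCFF), so it serves as a stand-in here. This is not the LES3 code, and the helper names `bytes_to_text` and `text_to_bytes` are made up for the example.

```python
def bytes_to_text(raw: bytes) -> str:
    """Decode as UTF-8; each invalid byte 0x80..0xFF becomes U+DC80..U+DCFF."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_bytes(text: str) -> bytes:
    """Reverse of bytes_to_text: lone surrogates turn back into raw bytes."""
    return text.encode("utf-8", errors="surrogateescape")


raw = b"caf\xc3\xa9 \xff\xfe"   # valid UTF-8 followed by two invalid bytes
text = bytes_to_text(raw)

# The valid part decodes normally; the invalid bytes are kept as surrogates.
assert text == "caf\u00e9 \udcff\udcfe"

# Nothing is lost: the original bytes come back exactly.
assert text_to_bytes(text) == raw
```

The point is only that an arbitrary byte sequence can survive a trip through a string type, which is what lets every literal be modeled as text.
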
With this in mind, all literals can be thought of as a pair of two strings: first a type marker to indicate the data type, and then a string which is some kind of serialized representation of a value. In LES3, a literal with type marker `tm` and data string `C:\Temp` is written `tm"C:\\Temp"`. Ordinary literals like `123`, `"Hello, World!"` and `true` have an implicit type marker, which is defined in LES3 as `_` for numbers, the empty string for strings and `bool` for booleans. Thus, we can write these examples equivalently as `_"123"`, ``` ``"Hello, World!" ``` and `bool"true"` respectively.

Loyc trees have traditionally had only a `Value` (an object, not a string). I propose adding two more properties to every node, for a total of three:

- `Value`: The true value of the literal (any object).
- `TextValue`: A string representation of the literal.
- `TypeMarker`: A global `Symbol` (singleton string) indicating the literal type.

By having three properties, a literal node can keep track of either or both the meaning of a literal (the `Value`) and the original text from which it was parsed.

- Literals without a `TextValue` or `TypeMarker` will still be common; a literal created programmatically will not (generally) have them.
- Literals without a `Value` will occur when parsing code and there is no parser available to interpret the literal (e.g. because the type marker is unknown), or if there was an error interpreting the literal, or if parsing is disabled in the lexer.

Some things to consider:

- `987654321.0123456789` is too precise to store accurately as a `double`, but it should be possible to save the original text, if the text is valid in the destination language. Similarly, it is attractive for a printer to be able to write out `0b0110_110_00011` in exactly the same way it was written.
- `z` always means "integer of unlimited size", but usually it is not possible to change the syntax of a given language. For example, I can write `123'456` in LES3, but this is not a legal C# literal. Therefore, the safest thing for the C# printer to do is to print based on the `Value` alone and ignore the `TextValue` and `TypeMarker`. But if all printers do this, the extra `TextValue` and `TypeMarker` properties won't be adding much value; I am assuming some printers will be smart enough to use these properties sometimes, but I'm unsure at this point what rules they should follow.
- `LNode` could parse the token on-demand when the `Value` property is called, so that parsing doesn't happen unnecessarily. This could be accomplished by storing a reference to a (singleton) parser instead of a `Value` before parsing occurs. However, most users will want to have a complete list of errors after parsing a file, so it seems like the default behavior must be to parse greedily, except when the lexer can verify that the value is legal without figuring out its value.
- `Value` should probably return `TextValue`. `Value ≡ TextValue` is only expected when the value is "supposed" to be a string.
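
As a rough illustration of the proposal, here is a language-neutral sketch in Python of a three-property literal with eager parsing and a fallback for unknown type markers. Everything in it is hypothetical: `Literal`, `NO_VALUE`, `PARSERS` and `parse_literal` are made-up names rather than the Loyc API, and the number marker `"_"` and the toy parsers are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical sentinel meaning "no parsed value is available".
NO_VALUE = object()


@dataclass(frozen=True)
class Literal:
    """Illustrative three-property literal (not the Loyc API)."""
    value: Any = NO_VALUE               # meaning of the literal, if known
    text_value: Optional[str] = None    # original or serialized text, if known
    type_marker: Optional[str] = None   # e.g. "_" (assumed), "bool", ""


# Hypothetical parser table keyed by type marker; real parsers would be stricter.
PARSERS: Dict[str, Callable[[str], Any]] = {
    "_": lambda text: float(text) if "." in text else int(text),
    "bool": lambda text: {"true": True, "false": False}[text],
    "": lambda text: text,  # strings: the value is the text itself
}


def parse_literal(type_marker: str, text: str, errors: List[str]) -> Literal:
    """Parse eagerly so all errors are known once the file has been read."""
    parser = PARSERS.get(type_marker)
    if parser is None:
        # Unknown type marker: keep the text, leave the value unset.
        return Literal(text_value=text, type_marker=type_marker)
    try:
        value = parser(text)
    except (ValueError, KeyError) as ex:
        # Interpretation failed: report it, but still keep the text.
        errors.append(f"invalid {type_marker!r} literal {text!r}: {ex}")
        return Literal(text_value=text, type_marker=type_marker)
    return Literal(value, text, type_marker)


errors: List[str] = []
known = parse_literal("_", "123", errors)           # value and text both set
unknown = parse_literal("tm", "C:\\Temp", errors)   # text only, no value
precise = parse_literal("_", "987654321.0123456789", errors)
made_in_code = Literal(value=42)                    # value only, no text
```

In this sketch `precise.value` is a rounded float while `precise.text_value` still holds every digit, which is the case where a printer might prefer the text. Whether and when a printer should do that is left open, as in the discussion above.
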