Closed AndreVanDelft closed 8 years ago
I start to think that a simple definition for
[expr]^
should be
([expr])^
This way there is a clear definition, and we do not need all these parentheses any more in the example. We would have:
divMul: Int = [factor^1 [ .. ['*'+'/']^1 factor^2 ]^^ ] ^2 ] ~~^ eval
Another thing:
x ~~(d)~~^ s
could be defined as essentially doing the same as
x ~~(d)~~> ^(s)
The difference would be that the first form allows for a more efficient implementation.
Same as #3 which is solved.
In an issue on the previous project site I had written:
I have been looking at FastParse. It appears SubScript can get largely the same expressiveness. A few extras would be needed.
Terms, parentheses and brackets
Let's recapture some rules for terms in a script expression.
A literal or an expression between parentheses or braces evaluates to a value. This is not directly a real script expression operand; often an implicit conversion will be needed. The same holds when there is a caret^ suffix for the term. It is possible though that the value is of type
Script[anything]
; then the term will denote a script call.A term that is a path ending in an identifier, plus optionally a parameter list, denotes either a script call, a method call or again a value that may be implicitly converted, or be of type
Script[...]
. A method call that yields aScript[...]
is also considered to be a script call.In Scala expressions, a term between rectangular brackets is an anonymous script. It must contain a script expression and its type is
Script[...]
.So what happens when we write
([scriptExpression])
? This is a script call to the lambda ofscriptExpression
. Almost identical, but a difference: any result catching within the brackets holds for the lambda. This gives opportunities to do similar things as in FastParse.Result catching flavours
There was already a result catcher of this form:
Term may be a script call, optionally after an implicit conversion; or a method call, which becomes then a tiny code fragment; or an explicit tiny code fragment. The postfix operator may also follow a bracketed expression, as in:
This would be shorthand for
We want to be able to set the result to a specific value, say "@". "@"^ is not good for that, since it becomes a script call, after an implicit conversion. So for this purpose, let's invent the syntax
where value is a Scala value: a literal or a simple path expression; or even a method call. This sets the result value, as if it was written
Up to here
^
just sets the result value. In FastParse, the following would have a(String,String)
tuple as result: val capture2 = P( "a".rep.! ~ "b".! ~ End) We could do something similar:The compiler sees here
$1
and$2
; it concludes that apparently a tuple should result with components$1
and$2
.A similar notation allows to put the result of a script call etc into a variable, rather than in the script's result. This would be of the form
$name
. Herename
should either be a simple name of an assignable variable (i.e. an instance var, a parameter or a local var). If no such name is known, then the occurrence is also an implicit declaration of such a local var; the scope is TBD.To put a specific value in such a variable, or in the result tuple, write terms like:
Map and FlatMap equivalents
FastParse has explicit Map and FlatMap constructs:
In SubScript, flatmap is available in the form of the dataflow operator. We would write:
or even
For map we need an entirely new construct, which is closely related to dataflow. Its RHS is not a script, but just a function that operates on an argument. It has the same arrow notation, except that the arrow ends in
^
rather than>
. E.g.,Or
In principle one could have written
but this is more heavy weight: it involves a sequential composition, whereas the upward arrow may be implemented much more efficiently. Like with the normal dataflow, there may be multiple arrows that do pattern matching; there may also be arrows that carry exceptions.
Chains of this mapping operator are also possible: in
y
andz
are functions that map an argument into something else.x
is a process, hencex ~~^ y
is one, and hencex ~~^ y ~~^ z
is one.To put the final result in a field of a result tuple, write something like:
Options and Repetitions
In FastParse, optionally parsing is done like:
captureOpt is a
Parser[Option[String]]
In SubScript something similar would require a more or less standard scriptscript optionT: Option[T] = ^None . s ~~^ Some^
Then the equivalent to the FastParse code would be:
script.. captureOpt = .."a"; option: ["b"] ^
Repetitions
In FastParse:
captureRep is a
Parser[Seq[String]]
To have the same kind of functionality in SubScript is requires a "repeating return catcher", written as^^
rather than just^
. The reason is that we may otherwise get too many elements in the list.For example, to parse multiple pieces of words (any characters except spaces) separated by spaces, the following would do:
Note that anyChar will have lower priority (implemented at low level), relative to the specific term " ". To capture the result we could naively write:
What would be the result of parseWords on input "aa bb"? Hint: it is not List("aa","bb"). After the first 'a' input, the call to
word
would succeed, so parseWords's $0 would become List("a"). Thenword
, which had not terminated yet, grabs the second 'a' and succeeds again. Then $0 becomes List("a", "aa"), etc. This is not what we want.We must have something more advanced; a result catcher that adds to the $0 that existed in the previous pass (Nil in case the pass is 0). The "^^" operator will do so.
Now we can make iterating scripts
FasteParse's math parser example operates repetition with separators. In SubScript this could be supported as follows:
Example 1: Math Parser
SubScript alternative for FasteParse's math parser example:
It is possible to write
divMul
andaddSub
without the call torepeat:separatedBy
. We can do the expansion here. However, this expansion should become a call to a lambda, since otherwise it would make the enclosing scripts return tuples, rather than Ints:We could even get rid of the call to
zeroOrMore
:It is slightly more complex than the FastParse counterpart:
The inline version of
divMul
has three levels of the compound parentheses/bracket pairs([
...])
that were introduced close to the top of this post. It may look even worse than advanced regex code. I would not recommend doing such things inline, but I find it important that it is possible; a sign that the constructs are powerful enough.