scala-subscript / subscript

9 stars 2 forks source link

Result Catchers #1

Closed AndreVanDelft closed 8 years ago

AndreVanDelft commented 9 years ago

In an issue on the previous project site I had written:

Some form of the result catchers is working now. A better word is: result propagator. Some rules: Code fragments and script calls may have explicit or implicit result propagation markers. If such a marker is present, then the enclosing script's result (a Try) becomes the Try of the marked >element, upon success or failure (=active deactivation without success).

The "^" suffix is an explicit marker. If a script body has only a single code fragment or call, then that fragment or call gets the implicit >marker (if it did not have an explicit one) If a larger expression than a single code fragment or call has the explicit marker suffix, then all code >fragments and calls inside it get the implicit marker.

I have been looking at FastParse. It appears SubScript can get largely the same expressiveness. A few extras would be needed.

Terms, parentheses and brackets

Let's recapture some rules for terms in a script expression.

A literal or an expression between parentheses or braces evaluates to a value. This is not directly a real script expression operand; often an implicit conversion will be needed. The same holds when there is a caret^ suffix for the term. It is possible though that the value is of type Script[anything]; then the term will denote a script call.

A term that is a path ending in an identifier, plus optionally a parameter list, denotes either a script call, a method call or again a value that may be implicitly converted, or be of type Script[...]. A method call that yields a Script[...] is also considered to be a script call.

In Scala expressions, a term between rectangular brackets is an anonymous script. It must contain a script expression and its type is Script[...].

So what happens when we write ([scriptExpression]) ? This is a script call to the lambda of scriptExpression. Almost identical, but a difference: any result catching within the brackets holds for the lambda. This gives opportunities to do similar things as in FastParse.

Result catching flavours

There was already a result catcher of this form:

term^

Term may be a script call, optionally after an implicit conversion; or a method call, which becomes then a tiny code fragment; or an explicit tiny code fragment. The postfix operator may also follow a bracketed expression, as in:

[a + b]^

This would be shorthand for

[a^ + b^]

We want to be able to set the result to a specific value, say "@". "@"^ is not good for that, since it becomes a script call, after an implicit conversion. So for this purpose, let's invent the syntax

^value

where value is a Scala value: a literal or a simple path expression; or even a method call. This sets the result value, as if it was written

{:value:}^

Up to here ^ just sets the result value. In FastParse, the following would have a (String,String) tuple as result: val capture2 = P( "a".rep.! ~ "b".! ~ End) We could do something similar:

term1^$1 term2^$2

The compiler sees here $1 and $2; it concludes that apparently a tuple should result with components $1 and $2.

A similar notation allows to put the result of a script call etc into a variable, rather than in the script's result. This would be of the form $name. Here name should either be a simple name of an assignable variable (i.e. an instance var, a parameter or a local var). If no such name is known, then the occurrence is also an implicit declaration of such a local var; the scope is TBD.

To put a specific value in such a variable, or in the result tuple, write terms like:

^(x*10)^$1
^"aString"^$aVarName

Map and FlatMap equivalents

FastParse has explicit Map and FlatMap constructs:

val binary = P( ("0" | "1" ).rep.! )
val binaryNum = P( binary.map(Integer.parseInt(_, 2)) )

val leftTag = P( "<" ~ (!">" ~ AnyChar).rep(1).! ~ ">" )
def rightTag(s: String) = P( "</" ~ s.! ~ ">" )
val xml = P( leftTag.flatMap(rightTag) )

In SubScript, flatmap is available in the form of the dataflow operator. We would write:

script..
  leftTag             = "<" anyString^ ">"
  rightTag(s: String) = "</" s ">"
  xml                 = leftTag ~~(tag)~~> rightTag(tag)

or even

  xml                 = leftTag ~~> rightTag

For map we need an entirely new construct, which is closely related to dataflow. Its RHS is not a script, but just a function that operates on an argument. It has the same arrow notation, except that the arrow ends in ^ rather than >. E.g.,

script..
  binary = str1orMore:["0" + "1"]
  binaryNum = binary ~~^ Integer.parseInt(_, 2)

Or

  binaryNum = binary ~~(b)~~^ Integer.parseInt(b, 2)

In principle one could have written

  binaryNum = binary ~~(b)~~> ^Integer.parseInt(b, 2)

but this is more heavy weight: it involves a sequential composition, whereas the upward arrow may be implemented much more efficiently. Like with the normal dataflow, there may be multiple arrows that do pattern matching; there may also be arrows that carry exceptions.

Chains of this mapping operator are also possible: in

x ~~^ y ~~^ z

y and z are functions that map an argument into something else. x is a process, hence x ~~^ y is one, and hence x ~~^ y ~~^ z is one.

To put the final result in a field of a result tuple, write something like:

x ~~^ y ~~^ z ^ $2

Options and Repetitions

In FastParse, optionally parsing is done like:

val captureOpt = P( "a".rep ~ "b".!.? ~ End)
val Result.Success(Some("b"), 4) = captureOpt.parse("aaab")

captureOpt is a Parser[Option[String]] In SubScript something similar would require a more or less standard script

script optionT: Option[T] = ^None . s ~~^ Some^

Then the equivalent to the FastParse code would be:

script.. captureOpt = .."a"; option: ["b"] ^

Repetitions

In FastParse:

val captureRep = P( "a".!.rep ~ "b" ~ End)

captureRep is a Parser[Seq[String]] To have the same kind of functionality in SubScript is requires a "repeating return catcher", written as ^^ rather than just ^. The reason is that we may otherwise get too many elements in the list.

For example, to parse multiple pieces of words (any characters except spaces) separated by spaces, the following would do:

script..
 word = anyChar..
 parseWords = word .. " "

Note that anyChar will have lower priority (implemented at low level), relative to the specific term " ". To capture the result we could naively write:

script..
 word:String = ^""; anyChar~~^($+_)^ ..
 parseWords:Seq[String] = Nil; word~~^($:+_)^ .. " "

What would be the result of parseWords on input "aa bb"? Hint: it is not List("aa","bb"). After the first 'a' input, the call to word would succeed, so parseWords's $0 would become List("a"). Then word, which had not terminated yet, grabs the second 'a' and succeeds again. Then $0 becomes List("a", "aa"), etc. This is not what we want.

We must have something more advanced; a result catcher that adds to the $0 that existed in the previous pass (Nil in case the pass is 0). The "^^" operator will do so.

Now we can make iterating scripts

script..
  zeroOrMore[T](s:Script[T]): Seq[T] = .. s^^
   oneOrMore[T](s:Script[T]): Seq[T] =    s^^ ..

  str0orMore(s:Script[Char])        = zeroOrMore:s ~~^ chars2string 
  str1orMore(s:Script[Char])        =  oneOrMore:s ~~^ chars2string

FasteParse's math parser example operates repetition with separators. In SubScript this could be supported as follows:

 repeat:separatedBy[T,U](body:Script[T],separator:Script[U]): (T,Seq[(U,T)]) =
   body^1 zeroOrMore: [separator^1 body^2] ^2

Example 1: Math Parser

SubScript alternative for FasteParse's math parser example:

number: Int = str1orMore: [for(c<-'0'to'9') + c] ~~^ toInt
parens: Int  = "("  addSub^  ")" 
factor: Int = number + parens

divMul: Int = repeat:factor, separatedBy:['*'+'/'] ~~^ eval
addSub: Int = repeat:divMul, separatedBy:['+'+'-'] ~~^ eval
expr        = addSub^ End

It is possible to write divMul and addSub without the call to repeat:separatedBy. We can do the expansion here. However, this expansion should become a call to a lambda, since otherwise it would make the enclosing scripts return tuples, rather than Ints:

divMul: Int = ([factor^1 zeroOrMore: [ ['*'+'/']^1 factor^2 ] ^2 ]) ~~^ eval

We could even get rid of the call to zeroOrMore:

divMul: Int = ([factor^1  ([ .. (['*'+'/']^1 factor^2 ])^^ ]) ^2 ]) ~~^ eval

It is slightly more complex than the FastParse counterpart:

val divMul: P[Int] = P( factor ~ (CharIn("*/").! ~! factor).rep ).map(eval)

The inline version of divMul has three levels of the compound parentheses/bracket pairs ([...]) that were introduced close to the top of this post. It may look even worse than advanced regex code. I would not recommend doing such things inline, but I find it important that it is possible; a sign that the constructs are powerful enough.

AndreVanDelft commented 9 years ago

I start to think that a simple definition for

 [expr]^

should be

 ([expr])^

This way there is a clear definition, and we do not need all these parentheses any more in the example. We would have:

divMul: Int = [factor^1  [ .. ['*'+'/']^1 factor^2 ]^^ ] ^2 ] ~~^ eval

Another thing:

x ~~(d)~~^ s

could be defined as essentially doing the same as

x ~~(d)~~> ^(s)

The difference would be that the first form allows for a more efficient implementation.

anatoliykmetyuk commented 8 years ago

Same as #3 which is solved.