ptal / oak

A typed parser generator embedded in Rust code for Parsing Expression Grammars
Apache License 2.0

Direct access to the underlying data stream #67

Closed: ptal closed this issue 8 years ago

ptal commented 9 years ago

The expression ["0-9"]+ has type Vec<char>, but semantic actions are sometimes more efficient when they work directly on the underlying str slice. In that case, building a vector is unnecessary.
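To make the cost concrete, here is a minimal sketch in plain Rust, independent of oak. The function names (digits_as_vec, digits_as_slice, number_from_vec, number_from_slice) are hypothetical and only illustrate the difference between consuming a Vec<char> and consuming a borrowed slice; they are not part of oak's generated code.

```rust
/// Collects the leading ASCII digits into a vector, mirroring the
/// `Vec<char>` value that `["0-9"]+` produces. This allocates.
fn digits_as_vec(input: &str) -> Vec<char> {
    input.chars().take_while(|c| c.is_ascii_digit()).collect()
}

/// Returns the leading ASCII digits as a borrowed slice of the input.
/// No allocation happens here.
fn digits_as_slice(input: &str) -> &str {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    &input[..end]
}

/// Semantic action working on the vector: it has to rebuild a `String`
/// before it can parse the number.
fn number_from_vec(digits: Vec<char>) -> Option<u32> {
    digits.into_iter().collect::<String>().parse().ok()
}

/// Semantic action working on the slice: it parses in place.
fn number_from_slice(digits: &str) -> Option<u32> {
    digits.parse().ok()
}
```

Both paths yield the same number, but the vector path allocates twice (the Vec<char>, then the String), while the slice path allocates nothing.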

Here are three possible solutions:

  1. Add a type annotation e -> [..] to indicate that we are not interested in the value but only in the slice read.
  2. Add a semantic action operator such as e |> f to indicate the same thing.
  3. Automatically infer the appropriate type of e.

This is related to issue #13. Indeed, the slice can be computed from the span and the raw stream. It also means that some semantic actions are interested in both the value produced and the span (or slice). We may need a more powerful abstraction encompassing the location (span), the semantic values and the raw stream. This is related to #59.

The rationale for preferring solution (1) is that we give a new type to an expression, namely a type representing a slice of the underlying data. It means that we can call the recognizer function instead of the parser. It behaves like e -> (), plus an additional step that returns the underlying slice. However, we said that we may want to add a span to semantic values and, for this, I am not sure that -> is suited. There are four distinct pieces of information:

  1. Underlying stream representing the whole raw data parsed.
  2. A slice of the underlying stream representing which part of the stream a specific expression has consumed.
  3. The position of the slice in the stream to get global location of the data read.
  4. The data itself which is a semantic conversion of (2).

A use case for (1) is calling an external parser in a semantic action.
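The four pieces of information listed above can be sketched as one record. This is only an illustration in plain Rust: the Parsed struct and the parse_number function are hypothetical names, not oak types, and the hand-written digit scan stands in for a generated parser.

```rust
/// Hypothetical bundle of the four pieces of information.
struct Parsed<'a, T> {
    /// (1) The whole raw stream being parsed.
    stream: &'a str,
    /// (2) The part of the stream consumed by the expression.
    slice: &'a str,
    /// (3) Byte offset of `slice` inside `stream`.
    offset: usize,
    /// (4) The semantic value converted from `slice`.
    value: T,
}

/// Reads a run of ASCII digits starting at `offset` and returns all four
/// pieces of information, or `None` if no number starts there.
fn parse_number(stream: &str, offset: usize) -> Option<Parsed<'_, u32>> {
    // `get` returns None when `offset` is out of bounds or not on a
    // character boundary.
    let rest = stream.get(offset..)?;
    let len = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if len == 0 {
        return None;
    }
    let slice = &rest[..len];
    let value = slice.parse().ok()?;
    Some(Parsed { stream, slice, offset, value })
}
```

A semantic action that receives such a record could pass stream and offset on to an external parser, which is the use case mentioned for (1).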

ptal commented 8 years ago

If we look at the Rust AST, every node containing useful data also has a companion span. It probably means that the type of the expression tells us whether we should infer the span or not: an expression with an invisible type should not have a span. Also, (2) and (3) are actually the same information if we propose a type StreamRange. If users do not want the span, they can discard it in the semantic action. We could also add an attribute no_span. It closes #13.
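As a rough sketch of why (2) and (3) collapse into one value, consider a range that remembers the stream it points into. Only the name StreamRange comes from this discussion; its fields, its methods and the two semantic actions below are assumptions made for illustration, not the design that oak adopted.

```rust
use std::ops::Range;

/// Assumed shape of a `StreamRange`: the stream plus a byte range into it.
#[derive(Debug, Clone, PartialEq)]
struct StreamRange<'a> {
    stream: &'a str,
    range: Range<usize>,
}

impl<'a> StreamRange<'a> {
    /// Returns `None` if the range does not denote a valid slice of `stream`.
    fn new(stream: &'a str, range: Range<usize>) -> Option<Self> {
        stream.get(range.clone())?;
        Some(StreamRange { stream, range })
    }

    /// (2) The slice consumed by the expression.
    fn slice(&self) -> &'a str {
        &self.stream[self.range.clone()]
    }

    /// (3) The position of that slice in the stream.
    fn span(&self) -> (usize, usize) {
        (self.range.start, self.range.end)
    }
}

/// Semantic action that keeps the span next to the value.
fn spanned_number(r: &StreamRange<'_>) -> Option<(u32, (usize, usize))> {
    Some((r.slice().parse().ok()?, r.span()))
}

/// Semantic action that discards the span and keeps only the value.
fn plain_number(r: &StreamRange<'_>) -> Option<u32> {
    r.slice().parse().ok()
}
```

With this shape, dropping the span is a decision made inside the semantic action, which is what makes an attribute such as no_span optional rather than required.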

ptal commented 8 years ago

A proposed design is described in #85 #86 #87. It closes this issue.