scala / bug

Scala 2 bug reports only. Please, no questions — proper bug reports only.
https://scala-lang.org

Parsing: Increase modularity of lexical scanners #3723

Closed. scabug closed this issue 13 years ago.

scabug commented 14 years ago

I've been writing a parser for a language with the aim of automating some code refactoring and rewriting, which means (among other things) that I need to preserve comments and whitespace and attach them to the resulting AST.

I've had success doing this by writing a custom reader (in a trait ContextualScanners extends Scanners { ... }) that exposes some properties of the surrounding context, and it works great: I can mix it in with, e.g., StdLexical and avoid reinventing the wheel entirely.
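A minimal sketch of such a context-exposing reader follows. It does not use the real `Scanners` trait or `StdLexical`: `TokenReader` is a simplified stand-in for `scala.util.parsing.input.Reader`, and `TriviaScanner` and `leadingTrivia` are hypothetical names. Tokens are maximal runs of non-whitespace, and each position records the whitespace and `//` comments skipped before its token, so a parser could later attach them to AST nodes.

```scala
object ContextSketch {
  // Simplified stand-in for scala.util.parsing.input.Reader (not the real class).
  trait TokenReader[+T] {
    def first: T
    def rest: TokenReader[T]
    def atEnd: Boolean
  }

  // Hypothetical scanner: whitespace and `//` line comments between tokens
  // are skipped, but remembered as the "leading trivia" of the next token.
  final class TriviaScanner private (src: String, start: Int) extends TokenReader[String] {
    def this(src: String) = this(src, 0)

    // Index of the first character that is neither whitespace nor comment.
    private val tokenStart: Int = {
      var i = start
      var skipping = true
      while (skipping && i < src.length) {
        if (src(i).isWhitespace) i += 1
        else if (src.startsWith("//", i)) {
          while (i < src.length && src(i) != '\n') i += 1 // skip to end of line
        } else skipping = false
      }
      i
    }

    // A token ends at whitespace or at the start of a `//` comment.
    private val tokenEnd: Int = {
      var j = tokenStart
      while (j < src.length && !src(j).isWhitespace && !src.startsWith("//", j)) j += 1
      j
    }

    // The exposed context property: everything skipped before this token.
    val leadingTrivia: String = src.substring(start, tokenStart)

    def atEnd: Boolean = tokenStart >= src.length
    def first: String = src.substring(tokenStart, tokenEnd) // "" once atEnd
    def rest: TriviaScanner = new TriviaScanner(src, tokenEnd)
  }

  object Example {
    val s0 = new TriviaScanner("  // header\nfoo bar")
    val s1 = s0.rest
  }
}
```

Keeping the trivia on the reader rather than inside the tokens leaves the token type unchanged.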

Another example that might require custom scanning is an indentation-structured language: a straightforward approach would be a custom scanner that remembers the indentation level of the prior line and emits a pseudo-token for each indent and dedent it encounters.
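Remembering only the previous line's level is not quite enough when one line closes several blocks at once, because the scanner must know how many dedents to emit. The sketch below therefore keeps a stack of open indentation widths, the scheme Python's tokenizer uses. It is written as a standalone function over whole lines rather than as a `Scanners` subclass; the `Tok` types and `layoutTokens` are hypothetical, blank lines are ignored, and only spaces (not tabs) count as indentation.

```scala
object IndentSketch {
  sealed trait Tok
  final case class Word(text: String) extends Tok
  case object Newline extends Tok
  case object Indent extends Tok
  case object Dedent extends Tok

  // Hypothetical layout pass: words plus Newline/Indent/Dedent pseudo-tokens.
  def layoutTokens(src: String): List[Tok] = {
    val out = scala.collection.mutable.ListBuffer.empty[Tok]
    var levels = List(0) // stack of open indentation widths; head is innermost
    for (line <- src.split("\n") if line.trim.nonEmpty) {
      val width = line.takeWhile(_ == ' ').length
      if (width > levels.head) {
        levels = width :: levels // deeper: open one block
        out += Indent
      } else {
        while (width < levels.head) { // shallower: close every deeper block
          levels = levels.tail
          out += Dedent
        }
        require(width == levels.head, s"inconsistent dedent to column $width")
      }
      line.trim.split("\\s+").foreach(w => out += Word(w))
      out += Newline
    }
    // Close any blocks still open at end of input.
    while (levels.head > 0) {
      levels = levels.tail
      out += Dedent
    }
    out.toList
  }
}
```

For example, `layoutTokens("a\n  b\n    c\nd\n")` emits two consecutive `Dedent`s before `Word("d")`, which a single remembered level could not produce.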

The problem is that the current library design prevents me from mixing and matching the resulting Reader[Token] classes, each with different (isomorphic?) features. A big pain point, for example, is that I can't use my context-exposing scanner in a packrat parser, because the packrat parser requires its own custom reader too. (N.B.: "can't" might just mean I haven't learned how yet.)

For that case, it might be enough to modify PackratReader so that it exposes the properties and methods of its underlying reader; that would certainly be the more straightforward fix.
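A minimal sketch of that change, using simplified stand-ins rather than the real `PackratParsers.PackratReader`: the wrapper takes the underlying reader's concrete type as a parameter `R` and keeps `underlying` public, so a custom property (here a hypothetical `line` counter on a hypothetical `LineReader`) stays reachable through the memoizing layer. `SelfReader` and `MemoReader` are hypothetical as well, and the string-keyed `memo` table only gestures at the per-position caching a packrat reader does.

```scala
object ExposeSketch {
  // Simplified reader whose `rest` keeps the concrete reader type R.
  trait SelfReader[T, R <: SelfReader[T, R]] {
    def first: T
    def rest: R
    def atEnd: Boolean
  }

  // Hypothetical custom reader over characters that tracks the current line.
  final case class LineReader(src: String, offset: Int, line: Int)
      extends SelfReader[Char, LineReader] {
    def first: Char = src(offset)
    def atEnd: Boolean = offset >= src.length
    def rest: LineReader =
      if (atEnd) this
      else LineReader(src, offset + 1, if (first == '\n') line + 1 else line)
  }

  // Memoizing wrapper: `underlying` is public and keeps its precise type R,
  // so LineReader-specific members such as `line` remain accessible.
  final class MemoReader[T, R <: SelfReader[T, R]](val underlying: R)
      extends SelfReader[T, MemoReader[T, R]] {
    private val cache = scala.collection.mutable.HashMap.empty[String, Any]
    def first: T = underlying.first
    def atEnd: Boolean = underlying.atEnd
    // Each position builds its successor at most once.
    lazy val rest: MemoReader[T, R] = new MemoReader[T, R](underlying.rest)
    // Per-position memo table: a result computed under `key` is reused.
    def memo[A](key: String)(compute: => A): A =
      cache.getOrElseUpdate(key, compute).asInstanceOf[A]
  }

  object Example {
    val start = new MemoReader[Char, LineReader](LineReader("ab\ncd", 0, 1))
    val third = start.rest.rest.rest // consumed 'a', 'b', '\n'
  }
}
```

Here `Example.third.underlying.line` is 2 even though the caller only ever advanced the memoizing wrapper.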

Alternatively, it might be possible to be more ambitious: I'm still grappling with the full sophistication of mixins, but it would be nice to be able to derive, e.g., an indent-aware, packrat-memoizing, context-keeping reader via composition.
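One way to get that kind of composition is F-bounded mixins, sketched here with hypothetical names (`CharReader`, `LineInfo`, `IndentInfo`, `MyReader`) and no dependency on the parsing library. Each feature trait derives its property from the shared position, and the concrete class fixes the type parameter `Self` so that `rest` returns the fully composed type and no feature is lost when the reader advances. A memoizing feature would additionally need a cache stored per position, which is left out here.

```scala
object MixinSketch {
  // Base: a character reader whose `rest` returns the concrete composed type.
  trait CharReader[Self <: CharReader[Self]] {
    def src: String
    def offset: Int
    def rest: Self
    def atEnd: Boolean = offset >= src.length
    def first: Char = src(offset)
  }

  // Feature: 1-based line number of the current position.
  trait LineInfo[Self <: CharReader[Self]] extends CharReader[Self] {
    def line: Int = src.substring(0, offset).count(_ == '\n') + 1
  }

  // Feature: number of leading spaces on the current line.
  trait IndentInfo[Self <: CharReader[Self]] extends CharReader[Self] {
    def indent: Int = {
      val lineStart = src.lastIndexOf('\n', offset - 1) + 1
      src.drop(lineStart).takeWhile(_ == ' ').length
    }
  }

  // Composition: pick the features, fix Self, implement `rest` once.
  final case class MyReader(src: String, offset: Int)
      extends LineInfo[MyReader] with IndentInfo[MyReader] {
    def rest: MyReader = if (atEnd) this else copy(offset = offset + 1)
  }

  object Example {
    val r = MyReader("a\n  bc", 0)
    val atB = r.rest.rest.rest.rest // offset 4, at 'b'
  }
}
```

Adding another feature means writing one more trait and listing it in the `extends` clause; `rest` does not change.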

scabug commented 14 years ago

Imported From: https://issues.scala-lang.org/browse/SI-3723?orig=1 Reporter: Dan Shoutis (shoutis)

scabug commented 14 years ago

@lrytz said: tiark, can you take a look at this one?

scabug commented 13 years ago

@TiarkRompf said: I'm afraid I don't understand what exactly the current implementation does not allow you to do. There are two ways you can extend the scanning capabilities: either by layering individual reader objects or by mixin composition. Given that you want to use a custom scanner for a packrat parser, that would be either new PackratReader(new MyCustomReader) or new PackratReader with MyCustomReader { ... }.
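The two routes can behave differently once the reader advances. In the simplified model below (`SimpleReader`, `Wrapper`, and `Tagged` are hypothetical, not library classes), `Wrapper.rest` builds a fresh plain `Wrapper`, so a trait mixed in with `new Wrapper(...) with Tagged` is present only at the first position. If `PackratReader.rest` is written in the same style, which is worth confirming in the library source, the mixin route would share that limitation.

```scala
object LayeringSketch {
  trait SimpleReader[+T] {
    def first: T
    def rest: SimpleReader[T]
    def atEnd: Boolean
  }

  // Base reader over a list of tokens.
  final case class ListReader[T](items: List[T]) extends SimpleReader[T] {
    def first: T = items.head
    def rest: SimpleReader[T] = ListReader(items.drop(1))
    def atEnd: Boolean = items.isEmpty
  }

  // Layered wrapper: `rest` rebuilds a plain Wrapper, dropping any mixins.
  class Wrapper[T](underlying: SimpleReader[T]) extends SimpleReader[T] {
    def first: T = underlying.first
    def rest: SimpleReader[T] = new Wrapper(underlying.rest)
    def atEnd: Boolean = underlying.atEnd
  }

  // Hypothetical extra capability we would like every position to have.
  trait Tagged { def tag: String = "tagged" }

  object Example {
    val head = new Wrapper[Int](ListReader(List(1, 2, 3))) with Tagged
    val next = head.rest
    val headHasTag: Boolean = head.isInstanceOf[Tagged]
    val nextHasTag: Boolean = next.isInstanceOf[Tagged]
  }
}
```

The tokens still flow correctly (`next.first` is 2); only the mixed-in capability disappears after the first step.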

scabug commented 13 years ago

Dan Shoutis (shoutis) said: Thanks for the followup...

An issue with this is that the custom properties that the underlying scanner exposes are inaccessible ('underlying' is private) without some monkey business.

(I could smuggle the custom properties through via custom token types, but IIRC this means that I can no longer reuse pre-existing parsing/lexing components.)
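The smuggling approach, and its cost, can be sketched with hypothetical token classes loosely modeled on those in `StdTokens`: a hypothetical `WithTrivia` wrapper carries the context next to the original token, and a matcher written for plain tokens stops recognizing the wrapped one until it is unwrapped.

```scala
object SmuggleSketch {
  // Plain token types, as existing lexing/parsing code might expect them.
  sealed trait Token
  final case class Keyword(chars: String) extends Token
  final case class Identifier(chars: String) extends Token

  // Context smuggled through the token stream: the original token plus
  // the whitespace/comments that preceded it.
  final case class WithTrivia(token: Token, leading: String)

  // Existing matcher written against plain tokens. It takes Any only so the
  // mismatch shows up at runtime here instead of as a compile error.
  def isKeyword(t: Any, kw: String): Boolean = t match {
    case Keyword(`kw`) => true
    case _             => false
  }

  object Example {
    val plain = Keyword("if")
    val wrapped = WithTrivia(Keyword("if"), "  // guard\n")
    val plainMatches: Boolean = isKeyword(plain, "if")
    val wrappedMatches: Boolean = isKeyword(wrapped, "if")
    val unwrappedMatches: Boolean = isKeyword(wrapped.token, "if")
  }
}
```

Every reused component would need such unwrapping, which is the reuse problem described above.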

It's been several weeks since I last picked this project up, though, so my thoughts have become rather vague on the issue(s). :)