kursjan / petitparser2

A high-performance top-down parser
MIT License

Migrate PetitPreprocessor #58

Open: jecisc opened this issue 4 years ago

jecisc commented 4 years ago

In PetitParser 1, a tool named "PetitPreprocessor" was added. Its goal is to preprocess the input and remove things such as comments, so that the grammar is easier to write, while still reporting the correct positions in the original input.

I think it would be nice to add it to PetitParser2.

I checked whether it would be easy to migrate, but it is not straightforward. PetitPreprocessor relies on the fact that PP1 streams are positionable, while PP2 streams are not. I don't have time right now to work out how to update it for PP2.

https://github.com/moosetechnology/PetitParser/tree/development/src/PetitPreprocessor
https://github.com/moosetechnology/PetitParser/tree/development/src/PetitPreprocessor-Tests

jecisc commented 4 years ago

If someone with more knowledge on PetitParser2 is willing to help it would be much appreciated :)

kursjan commented 4 years ago

I can have a look at it. That said, PP2Stream is a positionable stream. What issue did you run into while migrating?

jecisc commented 4 years ago

I mean that the stream itself does not know its current position and cannot update it. Or maybe I missed something.

kursjan commented 4 years ago

https://github.com/kursjan/petitparser2/blob/master/PetitParser2/PP2Stream.class.st

{ #category : #'context interface' }
PP2Stream >> atPosition: position [
    ^ collection at: position
]

You probably just need to expose the position instvar.

kursjan commented 4 years ago

Oh, got it. The position is not an instance variable of PP2Stream but of PP2Context. I will have to check how the preprocessor works and what it does...

jecisc commented 4 years ago

I’ll try to explain what I know about it later this evening, if I remember ;)

jecisc commented 4 years ago

From what I know: PetitPreprocessor lets you preprocess the input to remove things you do not want to parse, which makes the parser easier to write.

Say, for example, that I want to detect code duplication. I don't care about the comments, but I don't want to handle them in my parser either.

For example, one of my parsers has:

start
    ^ (controlStructure / comment / water) plus preProcessor: (comment ==> [ :p | '' ])

There are currently two kinds of preprocessors.

A parser-based one, which works like this:

testBiggerReplacementThanMatching
    preProcessingParser := 'Troll' asParser preProcessor: 'u' asParser ==> [ :p | 'll' ].
    self assert: (('Un' asParser , preProcessingParser , 'DeTroy' asParser) end matches: 'UnTrouDeTroy')

And a regex-based one, which works like this:

testDecomposedEntryConsumed
    preProcessingParser := 'Libellule' asParser preProcess: 'T' asRegex into: ''.
    self assert: (preProcessingParser , 'yoyo' asParser matches: 'LibTelTluTleyoyo')
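To make the two tests above concrete, here is a minimal Python sketch (not PetitParser code) of what both kinds of preprocessor have to produce: the rewritten input plus a list of transformations recording which span of the original was replaced by what. The function name `preprocess` and the `(start, stop, replacement)` triples with 0-based, stop-exclusive positions are assumptions for illustration. A callable replacement stands in for the parser variant (`==> [ :p | 'll' ]`) and a plain string for the regex variant (`into: ''`). Unlike PP1, which only preprocesses the input seen by the wrapped parser, this sketch rewrites the whole string.

```python
import re
from typing import Callable, List, Tuple, Union

# (original_start, original_stop, replacement); positions are 0-based, stop-exclusive.
Transformation = Tuple[int, int, str]


def preprocess(
    text: str,
    pattern: str,
    replacement: Union[str, Callable[[str], str]],
) -> Tuple[str, List[Transformation]]:
    """Hypothetical sketch: rewrite every match of `pattern` in `text` and
    record each rewrite so positions can later be mapped back."""
    pieces: List[str] = []
    transformations: List[Transformation] = []
    last = 0
    for match in re.finditer(pattern, text):
        # A callable plays the role of the parser action block; a string is used as-is.
        new_text = replacement(match.group()) if callable(replacement) else replacement
        pieces.append(text[last:match.start()])
        pieces.append(new_text)
        transformations.append((match.start(), match.end(), new_text))
        last = match.end()
    pieces.append(text[last:])
    return "".join(pieces), transformations


if __name__ == "__main__":
    # Parser-style preprocessor: 'u' becomes 'll' (replacement longer than the match).
    print(preprocess("UnTrouDeTroy", "u", lambda matched: "ll"))
    # → ('UnTrollDeTroy', [(5, 6, 'll')])
    # Regex-style preprocessor: every 'T' is removed.
    print(preprocess("LibTelTluTleyoyo", "T", ""))
    # → ('Libelluleyoyo', [(3, 4, ''), (6, 7, ''), (9, 10, '')])
```

The rewritten `'Libelluleyoyo'` begins with `'Libellule'`, which is why the regex test can match; the recorded triples are what a position-mapping stream needs to translate positions back.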

As for how it works, from what I have seen, it introduces a class PPRelativePositionStream. This stream wraps a PPStream and knows the transformations, that is, what the preprocessor changed in the input. Using them, it can tell where we are in the original PPStream once the transformations have been applied.
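Below is a minimal Python sketch of that idea, assuming the transformations are recorded as `(original_start, original_stop, replacement)` triples with 0-based, stop-exclusive positions. The class name `RelativePositionStream` and its methods are hypothetical and do not reproduce the actual PPRelativePositionStream API. The stream reads the preprocessed text but walks the transformations in order, keeping a running offset, to translate a position back into the original input. The `side` argument is an assumed convention: a start position sitting exactly on removed text is moved past it, while a stop position stays before it, so a token never absorbs a deleted comment.

```python
from typing import List, Optional, Tuple

Transformation = Tuple[int, int, str]  # (original_start, original_stop, replacement)


class RelativePositionStream:
    """Hypothetical sketch: reads preprocessed text, reports original positions."""

    def __init__(self, processed: str, transformations: List[Transformation]):
        self.processed = processed
        self.transformations = sorted(transformations)
        self.position = 0  # position in the preprocessed text

    def at_end(self) -> bool:
        return self.position >= len(self.processed)

    def next(self) -> str:
        char = self.processed[self.position]
        self.position += 1
        return char

    def original_position(self, position: Optional[int] = None, side: str = "start") -> int:
        """Map a position in the preprocessed text back to the original input.

        side="start": position of the next character to read.
        side="stop":  exclusive end of something already read.
        """
        if position is None:
            position = self.position
        delta = 0  # (preprocessed - original) offset accumulated so far
        for start, stop, replacement in self.transformations:
            processed_start = start + delta
            processed_stop = processed_start + len(replacement)
            if position < processed_start or (side == "stop" and position == processed_start):
                break  # the position lies before this transformation
            if position < processed_stop:
                # Inside replacement text: map to the edge of the replaced span.
                return start if side == "start" else stop
            delta += len(replacement) - (stop - start)
        return position - delta


if __name__ == "__main__":
    # 'LibTelTluTleyoyo' with every 'T' removed.
    stream = RelativePositionStream("Libelluleyoyo", [(3, 4, ""), (6, 7, ""), (9, 10, "")])
    word = "".join(stream.next() for _ in range(9))
    # 12 is where 'yoyo' starts in the original 'LibTelTluTleyoyo'.
    print(word, stream.original_position())  # Libellule 12
```

Keeping the offset computation inside the stream means a parser can work on the simplified text and only ask for original positions when it needs them, for example when building a result.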

It also introduces a PPInfo class that returns information about a parse result, such as its start and stop positions. This is useful in Moose, for example, to create the source anchors that record where elements are located in files.
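A record in the spirit of PPInfo could be sketched in Python as follows. This is an assumption about its shape, not the real PPInfo interface: `ParseInfo`, `source_anchor`, and the 0-based, stop-exclusive positions are hypothetical. The record stores positions in the preprocessed input and converts them to original positions through an injected mapping function, so it stays independent of any particular stream class. `comment_map` is a hand-written mapping for one fixed input, used only for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Maps a preprocessed position to an original one; the str argument is "start" or "stop".
PositionMap = Callable[[int, str], int]


@dataclass(frozen=True)
class ParseInfo:
    """Hypothetical sketch of a parse result together with its position."""
    value: str
    start: int  # 0-based, in the preprocessed input
    stop: int   # exclusive, in the preprocessed input

    def source_anchor(self, to_original: PositionMap) -> Tuple[int, int]:
        """Return (start, stop) in the original input, e.g. for a Moose source anchor."""
        return to_original(self.start, "start"), to_original(self.stop, "stop")


def comment_map(position: int, side: str) -> int:
    """Hand-written mapping for 'a/*c*/b' preprocessed into 'ab':
    the 5-character comment at original positions [1, 6) was removed."""
    if position < 1 or (side == "stop" and position == 1):
        return position
    return position + 5


if __name__ == "__main__":
    original = "a/*c*/b"
    info = ParseInfo("b", 1, 2)
    start, stop = info.source_anchor(comment_map)
    print(start, stop, original[start:stop])  # 6 7 b
```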