ollef / Earley

Parsing all context-free grammars using Earley's algorithm in Haskell.
BSD 3-Clause "New" or "Revised" License
361 stars 24 forks source link

Add the 'eof' operator #38

Closed int-index closed 5 years ago

int-index commented 5 years ago

closes #37

int-index commented 5 years ago

I would have expected Eof to be similar to Pure. Did you try this?

Yes, that was the first thing I tried. As you mentioned, <*> turned out to be the stumbling block.

if you do eof' <- rule eof and use eof' in place of eof in the new tests they fail.

Oh my, indeed. Thanks for catching that!

The current implementation of prodNulls and removeNulls mean that eofs in rules don't work as expected.

It was a shot in the dark with these two functions, I still don't fully understand how the algorithm works. What are these nulls? I'll try to figure it out, but not today.

phadej commented 5 years ago

Eof should look a not like Terminal. Instead of some alphabet we now work with an alphabet amended with an "end-marker"(or infinitely many)

I haven't checked the tests, but there should be eof *> eof *> eof which IMHO should match an empty string.

Sent from my iPhone

On 30 Oct 2018, at 23.27, Vladislav Zavialov notifications@github.com wrote:

I would have expected Eof to be similar to Pure. Did you try this?

Yes, that was the first thing I tried. As you mentioned, <*> turned out to be the stumbling block.

if you do eof' <- rule eof and use eof' in place of eof in the new tests they fail.

Oh my, indeed. Thanks for catching that!

The current implementation of prodNulls and removeNulls mean that eofs in rules don't work as expected.

It was a shot in the dark with these two functions, I still don't fully understand how the algorithm works. What are these nulls? I'll try to figure it out, but not today.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

int-index commented 5 years ago

but there should be eof *> eof *> eof which IMHO should match an empty string.

@phadej It does, that's the only reason to have this feature at all. eof can be matched infinitely many times, otherwise it would be equivalent to extending the set of tokens with a special marker and appending it at the end of input.

Eof should look a not like Terminal.

In what sense and why?

phadej commented 5 years ago

Autocorrect, I meant "a lot"

Sent from my iPhone

On 31 Oct 2018, at 0.30, Vladislav Zavialov notifications@github.com wrote:

but there should be eof > eof > eof which IMHO should match an empty string.

@phadej It does, that's the only reason to have this feature at all. Eof can be matched infinitely many times, otherwise it would be equivalent to extending the set of tokens with a special marker and appending it at the end of input.

Eof should look a not like Terminal.

In what sense and why?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

ollef commented 5 years ago

prodNulls is supposed to return all the ways that a production can be parsed without consuming any input. This might be more complicated with EOFs though... removeNulls is the opposite, but conservative (so it doesn't matter as much), i.e. it should remove parts of a production that can only be parsed without consuming anything.

Building on what @phadej's saying, perhaps it's actually possible to achieve eof without special support for it by augmenting your input token string with infinitely many EOF tokens at the end.

int-index commented 5 years ago

by augmenting your input token string with infinitely many EOF tokens at the end

Perhaps I'm missing something, but an Earley parser can never succeed on an infinite input, as it tries to consume it entirely.

ollef commented 5 years ago

@int-index: It should stop if it's not able to continue parsing even if it hasn't consumed all of the input, which should be the case with infinitely many EOFs at the end. We'd have to use allParses instead of fullParses since fullParses only returns results that have consumed all of the input. And to make sure that you only get full parses (not including the infinitely many EOFs at the end), add an extra eof terminal at the very end of your full grammar.

If we can get this to work it might be a good example to add to the library.

int-index commented 5 years ago

I'm not enthralled that the resulting grammar will be unusable with fullParses, but it does sound like something that should work. It would also make the generator less useful, as it would generate inputs that have varying amounts of the Eof marker in the end. I'd rather have special support, it seems to have significant advantages:

but if I can't figure it out, then I will fallback to the allParses + Eof marker solution.

int-index commented 5 years ago

I don't think I have time to finish this, I didn't anticipate the eof' <- rule eof issue and I don't know how to address it.

ollef commented 5 years ago

No worries! I might have a go at making an example using the current functionality at some point if no one else beats me to it.

ollef commented 5 years ago

I did that in https://github.com/ollef/Earley/pull/40, so I'll close this now. Cheers!