jneen / parsimmon

A monadic LL(infinity) parser combinator library for javascript
MIT License
1.24k stars 126 forks source link

Negative Lookahead #150

Closed AljoschaMeyer closed 7 years ago

AljoschaMeyer commented 7 years ago

Hi, is there an idiomatic way to write a parser that only matches if it is not followed by another parser?

const parseFoo = Parsimmon.string("foo");
const parseBar = Parsimmon.string("bar");
const parseFooNotFollowedByBar = ?;

parseFoo.then(parseBar).parse("foobar"); // should succeed
parseFooNotFollowedByBar.then(parseBar).parse("foobar"); // should fail

I suspect that lookahead could help here, but the documentation does not really help me:

Parses using parser, but does not consume what it parses. Yields an empty string.

So it does not change the input and always returns the same thing. What is the difference to () => ""?

I do know that the example could easily be rewritten in an LL1 grammar that would be more efficient to parse. I'm just wondering whether parsimmon supports the approach of not matching if something else succeeds.

wavebeem commented 7 years ago

For your simple example I'd say the easiest is just a JS negative lookahead:

/foo(?!bar)/

P.lookahead runs a parser but doesn't change the current parse location, if that makes sense. I think you might be able to do what you want with .chain though?

> P.seq(P.string('foo'), P.all.chain(str => str === 'bar' ? P.fail('no ' + str) : P.of(str))).parse('foobar')
{ status: false,
  index: { offset: 6, line: 1, column: 7 },
  expected: [ 'no bar' ] }

> P.seq(P.string('foo'), P.all.chain(str => str === 'bar' ? P.fail('no ' + str) : P.of(str))).parse('foobaz')
{ status: true, value: [ 'foo', 'baz' ] }
AljoschaMeyer commented 7 years ago

Maybe the library could provide Parsimmon.not(parser), which:

So basically the PEG not-predicate. This might go against the spirit of 'a monadic LL(infinity) parser combinator library', but it is so convenient to have.

wavebeem commented 7 years ago

I'm going to check in with the project owner about the idea. In the mean time, Parsimmon exposes all of its internals, so you can totally implement this locally yourself if you'd like!

wavebeem commented 7 years ago

This is the implementation of lookahead for reference https://github.com/jneen/parsimmon/blob/master/src/parsimmon.js#L504-L518

anko commented 7 years ago

My (rather diverged) fork partser has an except combinator that—

Takes an "allowed" parser and a "forbidden" parser. Returns a parser that matches anything that the allowed parser accepts and which the forbidden parser does not accept.

I think that's a generalisation of the semantics you wanted here. Its implementation only needs a few changes to adapt to Parsimmon. The following works:

var P = Parsimmon = require('parsimmon')
var assert = require('assert')

var except = function (allowed, forbidden) {
  return Parsimmon.Parser(function (stream, i) {
    var forbiddenResult = forbidden._(stream, i)
    if (forbiddenResult.status) {
      return Parsimmon.makeFailure(i, "something that is not '" +
          forbiddenResult.value + "'")
    } else {
      var allowedResult = allowed._(stream, i)
      if (allowedResult.status) {
        return allowedResult
      } else {
        return Parsimmon.makeFailure(i, allowedResult.expected.join(',') +
            ' (except ' + forbiddenResult.expected.join(',') + ')')
      }
    }
  })
}

var nonBLetter = except(P.letter, P.string('b'))
console.log( nonBLetter.parse('a').status )     // => true
console.log( nonBLetter.parse('c').status )     // => true
console.log( nonBLetter.parse('b').status )     // => false
console.log( nonBLetter.parse('b').expected )   // => [ 'something that is not \'b\'' ]
console.log( nonBLetter.parse('/').expected )   // => [ 'a letter (except \'b\')' ]

Maybe that helps?

wavebeem commented 7 years ago

I totally missed your comment @anko. That looks like a good feature to add.

wavebeem commented 7 years ago

I'll add this under the name notFollowedBy to emphasize that it's a lookahead operation.