Incorrectly set furthest value in lookahead?

tstodter commented 4 years ago

First, thank you for your work on this library. It's been a real pleasure to work with the last few days and really just astonishingly easy. However, I've hit a small hiccup I was hoping to get some help with.

I'm trying to define a combinator to parse the same input with two different parsers, failing if either fails, succeeding only if both succeed, and keeping only the second result. A && for parsers, basically (apologies if this already exists in the API and I just missed it).

I had hoped to implement this as const both = (parser1, parser2) => P.lookahead(parser1).chain(() => parser2) or just P.lookahead(parser1).then(parser2), so I could use it something like

const FileParser = both(
  P.regex(/[^\n]/).times(94).skip(P.end).many()
    .desc('all lines of 94 characters'),
  <long involved parser that seems to work fine alone>
);

This works just fine if both succeed or the first fails, but they eat the details of any failure produced by parser2. If parser1 succeeds and parser2 fails, parser produced by both will fail, but the final result will have the expected array of the first parser and it loses the much-more-informative error of the second parser.

On investigation, I found that mergeReplies was preferring parser1's result data because the parser produced by lookahead(x) keeps whatever furthest value was generated by x. Still trying to grok the library internals, but either of the following get me the behavior I'm looking for

const both = (parser1, parser2) => (
  P.Parser((input, i) => {
    const result1 = parser1.parse(input);
    return !result1.status
      ? P.makeFailure(result1.index.offset, result1.expected)
      : P.makeSuccess(i, '');
  })
  .then(parser2)
);

// in parsimmon.js
function lookahead(x) {
  if (isParser(x)) {
    return Parsimmon(function(input, i) {
      var result = x._(input, i);
      result.index = i;
      result.value = "";
      result.furthest = -1; // Adding this here
      return result;
    });
  } else if (typeof x === "string") {
......

So a couple questions. Am I understanding the internals or API here correctly? If so, should lookahead be setting furthest manually to -1, or be returning a makeSuccess()?

anko commented 4 years ago

(I'm not the maintainer, just some person who digs combinatory parsing libraries.)

I had hoped to implement this as const both = (parser1, parser2) => P.lookahead(parser1).chain(() => parser2) or just P.lookahead(parser1).then(parser2), […]

Trying a minimal example:

const P = require('parsimmon')
const both = (parser1, parser2) => P.lookahead(parser1).then(parser2)

const a = P.regex(/[aA]/).desc('a or A')
const caps = P.regex(/[A-Z]/).desc('capital letter A-Z')

const aCaps = both(a, caps)

console.log(aCaps.parse('A')) // both should succeed
console.log(aCaps.parse('b')) // a should fail
console.log(aCaps.parse('a')) // caps should fail

Output:

{ status: true, value: 'A' }
{
  status: false,
  index: { offset: 0, line: 1, column: 1 },
  expected: [ 'a or A' ]
}
{
  status: false,
  index: { offset: 0, line: 1, column: 1 },
  expected: [ 'capital letter A-Z' ]
}

In each failure, the expected array corresponds to the failing parser. This seems to not match your observation though:

If parser1 succeeds and parser2 fails, parser produced by both will fail, but the final result will have the expected array of the first parser and it loses the much-more-informative error of the second parser.

Does my example match what you're trying to do in principle? If so, I think there must be something else going on.

tstodter commented 4 years ago

Hey @anko, thanks for thinking on this. The following is closer to what I'm actually working with, and does seem to reproduce what I'm seeing:

const both = (parser1, parser2) => P.lookahead(parser1).then(parser2)

const par = both(
  P.regex(/[a-z0-9]/).times(2).skip(P.string(',')).many().skip(P.eof),
  P.seq(
    P.letters.skip(P.string(',')),
    P.digits.skip(P.string(',')),
    P.letters.skip(P.string(',')),
    P.eof
  )
);

console.log(par.parse('ab,12,cde,'));  // Should fail only the first parser in both()
console.log(par.parse('ab,12,c2,'));  // Should fail only the second parser in both()

That spits out

{
  status: false,
  index: { offset: 8, line: 1, column: 9 },
  expected: [ "','" ]
}
{
  status: false,
  index: { offset: 9, line: 1, column: 10 },
  expected: [ '/[a-z0-9]/' ]
}

If I just run that second parser (starting with P.seq) on 'ab,12,c2,', the result is

{
  status: false,
  index: { offset: 7, line: 1, column: 8 },
  expected: [ "','" ]
}

which is what I would expect. The expected: [ '/[a-z0-9]/' ] above is surprising to me, since that string should be passing the first parser but that expected is clearly coming from the first parser. Kind of shooting in the dark, but could some backtracking after the failure of parser2 be producing the different results?