jneen / parsimmon

A monadic LL(infinity) parser combinator library for javascript
MIT License
1.24k stars 126 forks source link

Question for enhanced ParseError #22

Closed Bartvds closed 10 years ago

Bartvds commented 10 years ago

I've been working on the ParseError as per #19, but have a interesting problem:

Some code:

https://github.com/Bartvds/parsimmon/blob/7cd7d2833ecac312721eaaced47bb04b4c055c15/src/parsimmon.js#L14-L26

Thing is: currently the parsers can return a quoted string as expectation for a string parse (eg: 'abc', or a non-quoted (vanilla) string for 'symbols' like eof etc. This makes it harder to handle the error and make a nice report (doable but a little messy).

I'm not sure how to put that distinction between parse strings and 'symbols' in the expected value of the error.

Any suggestions?

Bartvds commented 10 years ago

This is where it gets quoted: https://github.com/Bartvds/parsimmon/blob/7cd7d2833ecac312721eaaced47bb04b4c055c15/src/parsimmon.js#L252

jneen commented 10 years ago

I am okay with removing the quotes as long as ParseError still has a reasonable ::toString() in which they are quoted.

Bartvds commented 10 years ago

Cool, sounds good. But there is a complication:

If we have text we try to parse for a string EOF but it fails, then this would be the same value as the end-of-file error (also EOF). (same goes for any character, zero alternates etc).

By the time the expected-value arrives in the showError() this distinction is lost which makes quoting hard, and then how would a reporter know the value was an expected string or a symbol?

It'll have to be stored differently in the makeError() data.

We could make the EOL value (and other symbols) a local object (eg: unique reference) and check for that. So when a string or RegExp arrive at the ParseError the are literals, and objects get tested as symbol. We'd add a .name and a toString() etc.

laughinghan commented 10 years ago

@Bartvds I'm confused why this is a problem, this is a feature not a bug, I would think the fact that parsers return a quoted string as expectation for a string parse and a non-quoted string for 'symbols' should makes it easier, not harder, to handle the error and make a nice report. In fact the whole reason it currently does that at all is to make the reported error message nicer, as formatted by parseError(). By design, the contract on expected is that it's a human-readable string description of the expected value.

If the problem is that this description string is too long because the expected string (or, in the case of item 2 of #19, regex) is itself too long, well, truncating it down to a reasonable length is part of making it human-readable, so that should not be handled in parseError() (or showError()), that should be handled in the primitive parser itself (Parsimmon.string() or .regex()), before even being passed to makeFailure(). For example, https://github.com/Bartvds/parsimmon/blob/7cd7d2833ecac312721eaaced47bb04b4c055c15/src/parsimmon.js#L252 could be changed to:

var expected = "'" + str.slice(0, 8) + (str.length > 8 ? '...' : '') + "'";

Note that the code for dealing with regexes that are too long would be totally different; neither belongs in parseError() or showError().

jneen commented 10 years ago

I also plan on adding some sort of tagging system for custom parse errors - "expected an expr", "expected a string literal", etc.

Bartvds commented 10 years ago

@laughinghan I would always prefer to also have the whole expected value available to the using code (as-is).

I don't think presentation logic like trimming and quoting belongs in the parser. It is convenient, of course, but only as an extra. The primitive parser has no business doing end-user presentation logic.

The thing with result object is that you now don't have to limit the data for just single-line string presentation. Maybe we want to throw our own AssertionError (expected/actual), of put the data to TAP or have expandable error messages (or whatever).

@jayferd A tagggin system would be very welcome. RegExp's are fun but not very communicative (these unicode ranges are killing us :)

laughinghan commented 10 years ago

@Bartvds Well, you could have a separate expectedValue that has the whole string or regex value, and it could be undefined for any/all/eof, so you could check if .expected === 'eof' (as opposed to if .expected === "'eof'", in which case .expectedValue === 'eof'). fail() would take in a second argument.

A tagging system is fine and dandy for extensibility like expandable error messages or whatever, but for the common task of human-readable stringifying, that we're even suggesting branching on tags makes this a clear case of where we should be using polymorphism instead of branching on the type of the expected thing, right @jayferd?

Bartvds commented 10 years ago

@laughinghan Yea, we'd definitely need the real expected value (or a symbol) next to the pre-stringified quoted thing.

I don't understand what you mean with "branching on tags".

But as illustration:

For example I use a validation library Joi that feels similar to Parsimmon's style (eg: chainy-api creating immutable objects), and it has nice helpers like tags() and description() that create annotated versions. it is pretty cool, the report is very detailed too.

Bartvds commented 10 years ago

I forgot I noticed a .desc() method landed! Very cool, now I can get rid of the workaround of that ridiculously long utf-letter RegExp message :wink:

I'll close this then.