Method for adjusting subsequent parsers' errors

Lev135 commented 1 year ago

I propose to add a special method in MonadParsec:

-- | Adjust the next `ParseError`, if it occur __before any tokens will be__
-- __consumed__. This can be used to add custom hint to the error or even
-- replace it.
adjustNextError :: (ParseError s e -> ParseError s e) -> m ()

This can be used to improve error messages in some cases. For example, we can define a version of lineFold, which allows us to use the same space consumer in all cases, including after the last symbol:

lineFold ::
  (TraversableStream s, MonadParsec e s m) =>
  -- | Line space consumer (should *not* consume end of lines)
  m () ->
  -- | Line space and eols consumer
  m () ->
  -- | Callback that uses provided space-consumer
  (m () -> m a) ->
  m a
lineFold sc scn action = do
  lvl <- sc *> indentLevel
  action $ sc *> do
    st <- getParserState
    lvl' <- scn *> indentLevel
    unless (lvl' > lvl) $ do
      o <- getOffset
      setParserState st
      let e = FancyError o $ E.singleton $ ErrorIndentation GT lvl lvl'
      changeNextError (const e)

and than an example from linefold documentation becomes more elegant:

sc = L.space (void spaceChar) empty empty

myFold = L.lineFold sc $ \sc' -> do
  L.symbol sc' "foo"
  L.symbol sc' "bar"
  L.symbol sc' "baz" -- we do not need special consumer here

I think more usecases should be found for those, who use custom errors (for example, we can add a hint with custom information here).

Going to implementation for ParsecT, I have the following idea: add a ParseError e -> ParseError e endo into Hints data type. It doesn't seems to me to be too expensive, but this shoould be benchmarked, of course.

I've implemented this in my fork of the repo and here is the comparision with the main branch. Of course it's not the ultimate version. Many questions are open:

[ ] all changes should be benchmarked and maybe something changed to preserve efficiency
[ ] how this should interact with delayed errors?
[ ] should we track an error endo pos to remove it when backtracing (as well as other hints)?
[ ] should we compose two endos or just take the last/first one, if compose than in which order?

I'm opening this issue here to get your opinion about this feature, is it too expensive for implementation, what's are another drawbacks, that I possibly don't see, etc.

mrkkrp commented 1 year ago

This can easily lead to hard-to-debug problems where unrelated and remote parts of a parser will be able to influence each other. Even if no tokens are consumed between a call to adjustNextError and the following failure, that failure may as well be custom and its modification may be undesirable. Once the state of the parser is modified to perform this "delayed adjustment" there won't be a way to cancel it.

The beauty of this kind of library, or at least the way I wanted to design it, is that desired behaviors are achieved by using combinators, e.g. by sequencing, branching with (<|>), or nesting them, without any extra state or spooky actions at a distance. Compare adjustNextError with its safer cousin (already found in the library):

-- | Specify how to process 'ParseError's that happen inside of this
-- wrapper. This applies to both normal and delayed 'ParseError's.
--
-- As a side-effect of the implementation the inner computation will start
-- with an empty collection of delayed errors and they will be updated and
-- “restored” on the way out of 'region'.
--
-- @since 5.3.0
region ::
  MonadParsec e s m =>
  -- | How to process 'ParseError's
  (ParseError s e -> ParseError s e) ->
  -- | The “region” that the processing applies to
  m a ->
  m a

region is safer because it only affects "regions" that are lexically enclosed by the combinator.

Lev135 commented 1 year ago

Yes, maybe this combinator in general isn't quite good. However, the current behavior of L.lineFold doesn't seem good. There are several problems:

You need keep in mind, where is the last item of the fold to use non-modified space consumer for it

You can not combine parsers properly. For example, if we have parser

pAb :: Parser () -> Parser [String]
pAb sc' = do
a <- L.symbol sc' "a"
b <- L.symbol sc "b"
pure [a, b]

we can not add "c" to it without changing the pAb like

pAb :: Parser () -> Parser () -> Parser [String]
pAb sc' scLast = do
a <- L.symbol sc' "a"
b <- L.symbol scLast "b"
pure [a, b]

and then

pAbc :: Parser () -> Parser () -> Parser [String]
pAbc sc' scLast = do
ab <- pAb sc' sc'
c <- L.symbol scLast "c"
pure $ ab <> [c]

despite this works, it looks very verbose and errorprone (sc' and scLast can easily used at wrong places).

We can not use some and many combinators like with simple non lineFolded code.

So I'd like to make some alternative implementation for lineFold to let the following work properly (without errors if the next line is not indented):

ex_fold :: Parser [String]
ex_fold = L.lineFold hspace space \sc' -> do
  a <- L.symbol sc' "a"
  b <- L.symbol sc' "b"
  c <- L.symbol sc' "c"
  pure [a, b, c]

ex_fold2 :: Parser [String]
ex_fold2 = L.lineFold hspace space \sc'' -> do
  a <- L.symbol sc' "a"
  bs <- some $ L.symbol sc' "b"
  cs <- many $ L.symbol sc' "c"
  pure $ a : bs <> cs

One of possible ways to do that was through the adjustNextError combinator. However, I agree that it isn't quite good. I have another idea, based on error hints: we should not fail if we have incorrect indentation on the line fold, but just add the hint to the error message. Something like this:

lineFold' :: (MonadParsec e s m, TraversableStream s) =>
  m () -> m () -> (m () -> m a) -> m a
lineFold' sc scn action = do
  lvl <- sc *> L.indentLevel
  action . void $ sc *> (optional . try) do
    st <- getParserState
    lvl' <- scn *> L.indentLevel
    unless (lvl' > lvl) do
      setParserState st
      let lbl = unwords [ "line fold continuation"
                        , "(the next line should be indented more than"
                        , show $ unPos lvl
                        , "in this case)"
                        ]
      failure Nothing (E.singleton $ Label $ NE.fromList lbl)

note, that eol consuming and indentation checking is optional. This make a sence: if the indentation of the next line is small, it means only that line fold is ended, but not that something went wrong, so we need not report an error, just not consume eol.

However, error messages with this implemenation will be always like "undexpected end of line, expected ... or line fold continuation", where the first part has no information about actual error. So maybe it would be better to replace it with indentation error. This can be made through the following trick:

data LineFoldErrInfo = LineFoldErrInfo Int Ordering Pos Pos
  deriving (Read, Show)

lineFold'' :: (MonadParsec e s m, TraversableStream s) =>
  m () -> m () -> (m () -> m a) -> m a
lineFold'' sc scn action = do
  lvl <- sc *> L.indentLevel
  region procErr $
    action . void $ sc *> (optional . try) do
      st <- getParserState
      lvl' <- scn *> L.indentLevel
      unless (lvl' > lvl) do
        o <- getOffset
        setParserState st
        let i = LineFoldErrInfo o GT lvl lvl'
        failure Nothing (E.singleton $ Label $ NE.fromList $ show i)
  where
    procETok (Label lbl) = case readMaybe (NE.toList lbl) of
      (Just (LineFoldErrInfo o ord ref act)) ->
          Just $ FancyError o $ E.singleton $ ErrorIndentation ord ref act
      _ -> Nothing
    procETok _ = Nothing
    procErr e = case e of
          TrivialError _ _ etoks ->
            case getFirst $ foldMap (First . procETok) etoks of
              (Just e') -> e'
              Nothing   -> e
          _                      -> e

this implemenation gives exactly those behavior, I'd like to have. However it doesn't seems good to convert error through Show-Read, so I'd like to have something more elegant. Maybe this could be partialy solved by saving custom errors in hints in some manner or by adding a possibility to attach a custom hint manually (without error -> hint conversion).

To be honest, none of the above seems to be a proper solution of the problem: they all look like hacks and maybe may lead to unpredictable error behavior in some cases. However, I'd prefer a good front-end appearance with possible bags when doing something very complicated then the current behavior. Anyway, I'm trying to do the best to make it as nice as possible, so I'd be glad to get your opinion on the problem and the ways to solve it

mrkkrp / megaparsec

Method for adjusting subsequent parsers' errors #493