Open Lev135 opened 1 year ago
This can easily lead to hard-to-debug problems where unrelated and remote parts of a parser will be able to influence each other. Even if no tokens are consumed between a call to adjustNextError
and the following failure, that failure may as well be custom and its modification may be undesirable. Once the state of the parser is modified to perform this "delayed adjustment" there won't be a way to cancel it.
The beauty of this kind of library, or at least the way I wanted to design it, is that desired behaviors are achieved by using combinators, e.g. by sequencing, branching with (<|>)
, or nesting them, without any extra state or spooky actions at a distance. Compare adjustNextError
with its safer cousin (already found in the library):
-- | Specify how to process 'ParseError's that happen inside of this
-- wrapper. This applies to both normal and delayed 'ParseError's.
--
-- As a side-effect of the implementation the inner computation will start
-- with an empty collection of delayed errors and they will be updated and
-- “restored” on the way out of 'region'.
--
-- @since 5.3.0
region ::
MonadParsec e s m =>
-- | How to process 'ParseError's
(ParseError s e -> ParseError s e) ->
-- | The “region” that the processing applies to
m a ->
m a
region
is safer because it only affects "regions" that are lexically enclosed by the combinator.
Yes, maybe this combinator in general isn't quite good. However, the current behavior of L.lineFold doesn't seem good. There are several problems:
pAb :: Parser () -> Parser [String]
pAb sc' = do
a <- L.symbol sc' "a"
b <- L.symbol sc "b"
pure [a, b]
we can not add "c" to it without changing the pAb like
pAb :: Parser () -> Parser () -> Parser [String]
pAb sc' scLast = do
a <- L.symbol sc' "a"
b <- L.symbol scLast "b"
pure [a, b]
and then
pAbc :: Parser () -> Parser () -> Parser [String]
pAbc sc' scLast = do
ab <- pAb sc' sc'
c <- L.symbol scLast "c"
pure $ ab <> [c]
despite this works, it looks very verbose and errorprone (sc'
and scLast
can easily used at wrong places).
some
and many
combinators like with simple non lineFolded code.So I'd like to make some alternative implementation for lineFold
to let the following work properly (without errors if the next line is not indented):
ex_fold :: Parser [String]
ex_fold = L.lineFold hspace space \sc' -> do
a <- L.symbol sc' "a"
b <- L.symbol sc' "b"
c <- L.symbol sc' "c"
pure [a, b, c]
ex_fold2 :: Parser [String]
ex_fold2 = L.lineFold hspace space \sc'' -> do
a <- L.symbol sc' "a"
bs <- some $ L.symbol sc' "b"
cs <- many $ L.symbol sc' "c"
pure $ a : bs <> cs
One of possible ways to do that was through the adjustNextError
combinator. However, I agree that it isn't quite good. I have another idea, based on error hints: we should not fail if we have incorrect indentation on the line fold, but just add the hint to the error message. Something like this:
lineFold' :: (MonadParsec e s m, TraversableStream s) =>
m () -> m () -> (m () -> m a) -> m a
lineFold' sc scn action = do
lvl <- sc *> L.indentLevel
action . void $ sc *> (optional . try) do
st <- getParserState
lvl' <- scn *> L.indentLevel
unless (lvl' > lvl) do
setParserState st
let lbl = unwords [ "line fold continuation"
, "(the next line should be indented more than"
, show $ unPos lvl
, "in this case)"
]
failure Nothing (E.singleton $ Label $ NE.fromList lbl)
note, that eol consuming and indentation checking is optional
. This make a sence: if the indentation of the next line is small, it means only that line fold is ended, but not that something went wrong, so we need not report an error, just not consume eol.
However, error messages with this implemenation will be always like "undexpected end of line, expected ... or line fold continuation", where the first part has no information about actual error. So maybe it would be better to replace it with indentation error. This can be made through the following trick:
data LineFoldErrInfo = LineFoldErrInfo Int Ordering Pos Pos
deriving (Read, Show)
lineFold'' :: (MonadParsec e s m, TraversableStream s) =>
m () -> m () -> (m () -> m a) -> m a
lineFold'' sc scn action = do
lvl <- sc *> L.indentLevel
region procErr $
action . void $ sc *> (optional . try) do
st <- getParserState
lvl' <- scn *> L.indentLevel
unless (lvl' > lvl) do
o <- getOffset
setParserState st
let i = LineFoldErrInfo o GT lvl lvl'
failure Nothing (E.singleton $ Label $ NE.fromList $ show i)
where
procETok (Label lbl) = case readMaybe (NE.toList lbl) of
(Just (LineFoldErrInfo o ord ref act)) ->
Just $ FancyError o $ E.singleton $ ErrorIndentation ord ref act
_ -> Nothing
procETok _ = Nothing
procErr e = case e of
TrivialError _ _ etoks ->
case getFirst $ foldMap (First . procETok) etoks of
(Just e') -> e'
Nothing -> e
_ -> e
this implemenation gives exactly those behavior, I'd like to have. However it doesn't seems good to convert error through Show-Read
, so I'd like to have something more elegant. Maybe this could be partialy solved by saving custom errors in hints in some manner or by adding a possibility to attach a custom hint manually (without error -> hint conversion).
To be honest, none of the above seems to be a proper solution of the problem: they all look like hacks and maybe may lead to unpredictable error behavior in some cases. However, I'd prefer a good front-end appearance with possible bags when doing something very complicated then the current behavior. Anyway, I'm trying to do the best to make it as nice as possible, so I'd be glad to get your opinion on the problem and the ways to solve it
I propose to add a special method in
MonadParsec
:This can be used to improve error messages in some cases. For example, we can define a version of
lineFold
, which allows us to use the same space consumer in all cases, including after the last symbol:and than an example from linefold documentation becomes more elegant:
I think more usecases should be found for those, who use custom errors (for example, we can add a hint with custom information here).
Going to implementation for
ParsecT
, I have the following idea: add aParseError e -> ParseError e
endo intoHints
data type. It doesn't seems to me to be too expensive, but this shoould be benchmarked, of course.I've implemented this in my fork of the repo and here is the comparision with the main branch. Of course it's not the ultimate version. Many questions are open:
I'm opening this issue here to get your opinion about this feature, is it too expensive for implementation, what's are another drawbacks, that I possibly don't see, etc.