Closed neongreen closed 9 years ago
@neongreen, You have proposed many useful things, without these propositions Megaparsec would be different. I consider it significant contribution to the project. I would like to include you into the list of contributors, should I include you as "Artyom"?
@mrkkrp, yep (and thanks). Just “Artyom” would be ambiguous, but since we're on Github anyway, you could just add a link to my Github nick:
Artyom (@neongreen)
@neongreen, Can you point out where (<|>)
is defined for StateT
as:
m <|> n = StateT $ \ s -> runStateT m s <|> runStateT n s
?
@neongreen, I see. I will try to define mtl-style MonadParsec
(although in Megaparsec we have more low-level functions than in Parsec (parsers
is modeled after Parsec
I guess) because I decided that we could define some of them clearer on lower level).
I'll try to eliminate user state, so user will need to combine monads to get backtracking state. This should work. Changes will live in monad-transformer
branch until it's clear that everything is all right.
I will also make combinators more general, many of them can work with Applicative
and Alternative
instances, not necessarily with MonadParsec
instances.
I had to make other minor changes to make it work smoothly. Also, it seems we could put functions to get current state (parser state, position in input stream, etc) into the class too. This way most part of the library will be defined for any instance of MonadParsec
. Tests will need to be corrected too. This requires more work then it seemed.
OK, this is almost done. I want to re-work some tests for Text.Megaparsec.Prim
though. I think it's good idea to add tests that check backtracking abilities of StateT MyStateType Parser a
monad combination. Once it's done I'll organize and push my changes.
@neongreen, I'll ask you to try the improvements and see if Megaparsec is flexible enough now.
@neongreen, please try out code in monad-transformer
branch. I also ask you to write one or two tests that you deem realistic and practical to test backtracking state achieved by the method you mention. I suggest you first test this locally and tell me about results, then I merge this into master. Then you can open a pull request adding the tests.
Okay, here go some observations.
<?>
should be moved out of the class, I think. Alternatively, there should be a MINIMAL
pragma saying that only one of {label
, <?>
} is needed. Justification: I got bit by this when I was defining an instance of MonadParsec
, forgot about both label
and <?>
, GHC didn't warn me (because they both have default definitions), and as the result I spent 10m trying to find out why my parser was going in an endless loop.We need instances for MonadParsec s (WriterT w m) c
and so on. Here's a sample instance for WriterT
:
instance (Monoid w, MonadParsec s m c) => MonadParsec s (WriterT w m) c where
unexpected x = lift $ unexpected x
eof = lift $ eof
token x y = lift $ token x y
tokens x y z = lift $ tokens x y z
getParserState = lift $ getParserState
updateParserState f = lift $ updateParserState f
try (WriterT m) = WriterT $ try m
label x (WriterT m) = WriterT $ label x m
lookAhead (WriterT m) = WriterT $ lookAhead m
notFollowedBy (WriterT m) = WriterT $
notFollowedBy (fst <$> m) >>= \x -> return (x, mempty)
The same instance in parsers
has {-# INLINE ... #-}
for all parsers, so maybe we should inline all the things too. I'll try to benchmark it.
It's more annoying now to write generic parsers. Before:
whitespace :: Parser ()
whitespace = void (some (char ' '))
After:
whitespace :: MonadParsec s m Char => m ()
whitespace = void (some (char ' '))
Small as the difference may be, it might push people towards reusing less code than they used to, and it's bad. I'd like both Text.Megaparsec.String
and Text.Megaparsec.Text
to export a Parsing
that would be respectively MonadParsec String m Char
and MonadParsec Text m Char
under the hood (after all, they already export different Parser
s that are named the same). This would make the code look like this:
whitespace :: Parsing m => m ()
whitespace = ...
inParens :: Parsing m => m a -> m a
inParens = ...
Oh, and another observation: I had to write
warnParse :: WarnParser a -> SourceName -> Text -> Either ParseError (a, [String])
warnParse p src s = parse (runWriterT p) src s
and it feels like it's something that should've been done automatically, but I have no idea how to actually accomplish this.
I agree with points 1 and 2. As for 3, I think end user can define Parsing
type or something like that if he thinks it's necessary. Mostly I don't see any problem in MonadParsec s m Char => m ()
stuff.
I currently don't think it should be done automatically. You're trying to run code inside monad and usually if you have stack of monads you need to nest functions like parse
and runWriterT
accordingly. Some functions like runStateT
will need to accept additional arguments like initial state, so I don't think all these cases can be handled in uniform (non-hackish) way.
Some functions like
runStateT
will need to accept additional arguments like initial state, so I don't think all these cases can be handled in uniform (non-hackish) way.
Okay, you're right.
As for 3, I think end user can define
Parsing
type or something like that if he thinks it's necessary.
Sure, but at least in my case I'm probably going to be defining Parsing
literally every time I use Megaparsec with a custom stack, and unlike parse
it can be done generically in a non-hackish way. I guess GenParser
was included into original parsec
for the same reason. (By the way, am I right that GenParser
shouldn't be in the monad-transformer branch at all?)
@neongreen, OK, I'll add Parsing
and GenParser
is indeed should disappear, just overlooked that.
@neongreen, To write something like this:
type Parsing m = MonadParsec C.ByteString m Char
You need to enable ConstraintKinds
extension. Because it's synonym for a constraint, not type. This once more makes me think that this is not entirely necessary.
I think we will do it this way: for now Parsing
won't be added. Then in future releases this might be reconsidered. After all it saves not much typing.
Okay.
@neongreen, There will be two instances for WriterT
: Lazy.WriterT
and Strict.WriterT
, by the way.
@neongreen, about inlining: honestly, I think GHC will inline all this stuff anyway. So, you probably won't be able to find any performance difference.
@neongreen, I've committed all the changes, so you can try again. Is there anything else you want to tell me?
I've committed all the changes, so you can try again.
Okay, thanks. I'll start writing tests now.
@mrkkrp, I have just realised that Parsec's lookAhead
and Megaparsec's lookAhead
have different semantics – Parsec's lookAhead
reverts changes made by the parser to backtracking state, Megaparsec's lookAhead
doesn't. It won't matter in most cases when lookAhead
is used; however, it might be useful to have a version of lookAhead
that adheres to the intuitive definition of “run the parser but don't let it change anything whatsoever”. Would it be easy to write? (I don't know Parsec's internals well enough to answer this question on my own. It may even be impossible, I don't know.)
@neongreen, parsers
seems to be brainy about the stuff. How does lookAhead
work there?
@mrkkrp, I already checked – it isn't being brainy in this particular case, unfortunately.
import Text.Parser.LookAhead
import Text.Parser.Combinators
import Text.Parsec.String
import Text.Parsec (parse, runParser, Parsec, setState, getState)
import qualified Control.Monad.State as State
tParsers = parse (State.evalStateT p 0) "" ""
where p :: State.StateT Integer Parser Integer
p = do
State.put 0
_ <- lookAhead (State.put 1 >> eof)
State.get
tParsec = runParser p 0 "" ""
where p = do
setState 0
_ <- lookAhead (setState 1 >> eof)
getState
> tParsers
Right 1
> tParsec
Right 0
@neongreen, ah I see they just use Parsec's version without reimplementing it.
instance (Parsec.Stream s m t, Show t) => LookAheadParsing (Parsec.ParsecT s u m) where
lookAhead = Parsec.lookAhead
@neongreen, I think Megaparec should revert backtracking state, you're right. I'll see how to fix it. Make sure you include something in your tests to check this feature in future.
@neongreen, Have you tried with simple alternatives à la p <|> n
? Does it work as expected?
@mrkkrp, yep, it works alright. E.g. the modified prop_user_backtrack
test (that already was there) passes:
prop_user_backtrack :: Integer -> Integer -> Property
prop_user_backtrack n m = runParser (State.evalStateT p 0) "" "" === Right n
where p = do
State.put n
(State.put m >> fail "failed") <|> return ()
State.get
OK, first of all, Megaparsec's and Parsec's lookAhead
are essentially identical:
-- Megaparsec
pLookAhead :: ParsecT s m a -> ParsecT s m a
pLookAhead p = ParsecT $ \s _ cerr eok eerr ->
let eok' a _ _ = eok a s mempty
in unParser p s eok' cerr eok' eerr
-- Parsec
lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
lookAhead p =
ParsecT $ \s _ cerr eok eerr -> do
let eok' a _ _ = eok a s (newErrorUnknown (statePos s))
unParser p s eok' cerr eok' eerr
Another thing is Megaparsec currently has no built-in user state. But this should not matter. As your little experiment demonstrates, while Parsec's lookAhead
generally works OK, that version with StateT
transformer doesn't work so great. So I think the problem is in definition of lookAhead
for StateT
.
@neongreen, This means that all these definitions of WriterT
, StateT
, etc. in parsers are not necessarily correct with respect this sort of corner case. We should perhaps correct them in Megaparsec instead of copying them.
Maybe we should also open an issue on GitHub page of parsers
library.
@neongreen, We also should tests all these definitions too, I think. Don't worry about that, I will fix it and later add new tests myself.
@neongreen, Try with this definition (with TupleSections
extension):
instance (MonadPlus m, MonadParsec s m t) =>
MonadParsec s (L.StateT e m) t where
label n (L.StateT m) = L.StateT $ \s -> label n (m s)
try (L.StateT m) = L.StateT $ try . m
lookAhead (L.StateT m) = L.StateT $ \s ->
(,s) . fst <$> lookAhead (m s)
notFollowedBy (L.StateT m) = L.StateT $ \s ->
notFollowedBy (fst <$> m s) >> return ((),s)
unexpected = lift . unexpected
eof = lift eof
token f e = lift $ token f e
tokens f e ts = lift $ tokens f e ts
getParserState = lift getParserState
updateParserState f = lift $ updateParserState f
If this passes the test I will commit necessary corrections.
I don't have anything to add, but it looks like you two are doing pretty good work - thanks!
@neongreen, I checked it and it seems to work OK. I've committed the changes so this bug should be eliminated by now.
@mrkkrp, I have the following (passing) tests that use lookAhead
, notFollowedBy
, and <|>
:
-- Monad transformers + Parser
prop_StateT :: Integer -> Integer -> Property
prop_StateT n m = checkParser (State.evalStateT p 0) (Right (n+m)) ""
where p = do
State.put n
State.modify (+ m)
State.get
prop_StateT_backtrack :: Integer -> Integer -> Property
prop_StateT_backtrack n m = checkParser (State.evalStateT p 0) (Right n) ""
where p = do
State.put n
(State.put m >> fail "failed") <|> return ()
State.get
-- See <https://github.com/mrkkrp/megaparsec/issues/27#issuecomment-141785141>.
prop_StateT_lookAhead :: Integer -> Integer -> Property
prop_StateT_lookAhead n m = checkParser (State.evalStateT p 0) (Right n) ""
where p = do
State.put n
lookAhead (State.put m >> eof)
State.get
prop_WriterT :: String -> String -> Property
prop_WriterT pre post = checkParser (Writer.runWriterT p) result "abx:"
where logged_letter = do
x <- letterChar
Writer.tell [x]
return x
logged_colon = do
x <- char ':'
Writer.tell [x]
return x
p = do
Writer.tell pre
cs <- Writer.censor (map toUpper) $
many (try (logged_letter <* notFollowedBy logged_colon))
Writer.tell post
_ <- logged_letter
_ <- logged_colon
return cs
result = Right ("ab", pre ++ "AB" ++ post ++ "x:")
Not claiming the last one is particularly realistic :)
I have a question: how should label
be tested, if at all? I.e. can Megaparsec's new error reporting system somehow interfere with monad transformers?
@neongreen, There are two cases when label has effect:
p
succeeds without consuming input and next parser n
fails immediately after p
without consuming input. “Label” of p
will be displayed in the error message.See existing prop_label
for example. Although it's a bit dense in that it tests many behaviors at once, so you should not necessarily take it as base for new tests.
Also, prop_StateT
has nothing to do with MonadParsec
, it tests features of StateT
, which we can safely assume work OK. Two other tests for StateT
are OK. Please add one more to test notFollowedBy
, I think it's easy. Other functions all should be tested too, preferably…
I advise you enable coverage statistics and see what else you can test. It's a good idea to test everything, although it's more than I can ask from you. Try at least test all functions in instance declaration of StateT
. Also, you can have two copies of the same tests, one for lazy version of state monad and another one for strict version.
Coverage dropped because of these new changes and I think we should restore it at least on 80 % level. Initially I naively thought that these “boilerplate” definitions are not necessary to test, but after this issue with lookAhead
I think everything should be tested.
I suppose this may be closed now.
(See the previous discussion here.)
ParsecT
lets us combine parsing with things such as keeping state, accessing configuration, etc. However, any monad you embed intoParsecT
won't be affected by Parsec's backtracking properties:In this example having non-backtracking state was desirable. Sometimes having backtracking state is desirable – for instance, my usecase is parsing a list of rules, where each rule is merely a named list of patterns, and patterns can reference earlier-defined patterns (which is why all parsed patterns are added to state); if a single pattern in a rule can't be parsed I want to throw the rule away completely and forget that I ever parsed it, which means forgetting about all successfully-parsed patterns in the rule. (Sorry if the example is unclear or seems contrived.)
One way to have backtracking state is to use Parsec's built-in “user state”. Unfortunately, this solution isn't extensible, as Parsec provides no built-in writer, reader, or any other useful monad you could think of.
Another way is to use
which, if you aren't familiar with
StateT
, is another way of sayingwhich hopefully makes it clearer why this variant would preserve backtracking; if it doesn't, look at the definition of
<|>
forStateT
:Here both actions are applied to the same state, so it's impossible that both parsers' modifications would end up in the final state.
At the moment this solution isn't viable either, because all Parsec's parsers work in
ParsecT
and can't be used inStateT
. So, we have to uselift
:However, we can't just define
char = lift . primitiveCharParser
because that would makechar
unusable in stacks of 2 and more transformers. So, instead we might have to create a new mtl-style class (perhapsMonadParsec
?) instead. A similar class already exists in parsers (seeParsing
,CharParsing
); however, we likely won't be able to reuse parsers because parsers depends on parsec/attoparsec утв because parsers has different naming conventions etc: