Open Hi-Angel opened 4 days ago
You would need the multiline
flag to be enabled for it to match "My"
- but it should return ""
, because it'll be matching on $
but nothing else (.
doesn't match newlines without dotAll
- if you remove the trailing \n
it should return "string"
).
I'm not really sure where the culprit is here, neither the parsing nor Data.String
regex stuff looks wrong from an initial glance.
Oh actually, immediately after saying that I see it. The parser regex inserts a ^
at the start of the pattern so multiline
or dotAll
would be required for anything to match. I think it's correct that it only matches from the current parser position, but maybe the consequence of that needs emphasising in the comment for it (it does already mention it matches from the current position).
That is with noFlags flag that implies multiline: false, so $ should match something.
It seems like there's maybe a slight misunderstanding wrt JS RegExp semantics. Without the multiline flag $
always matches end of the string, not end of the line. Maybe you are expecting the inverse, where the multiline flag is where $
matches end of string, but it is in fact matching end of line? It's definitely confusing.
That is with noFlags flag that implies multiline: false, so $ should match something.
It seems like there's maybe a slight misunderstanding wrt JS RegExp semantics. Without the multiline flag
$
always matches end of the string, not end of the line. Maybe you are expecting the inverse, where the multiline flag is where$
matches end of string, but it is in fact matching end of line? It's definitely confusing.
Oh, just as I was typing out, you clarified, thank you! Yes, I am not a web-programmer, and in all other languages "multline regex being true" means the opposite, i.e. when you consider whole string more or less as a single line. Okay, I see now, let me try that.
Oh actually, immediately after saying that I see it. The parser regex inserts a
^
at the start of the pattern somultiline
ordotAll
would be required for anything to match. I think it's correct that it only matches from the current parser position, but maybe the consequence of that needs emphasising in the comment for it (it does already mention it matches from the current position).
Sorry, I'm confused here 😅 So, the regex
matches from current position, and for that it opaquely inserts a ^
to the beginning of the expression. This sounds like a bug on its own, because if a user have line1\n|line2
where the |
marks the caret position, and then a user asks for ^line2
, inserting the ^
will result in ^^line2
.
It gets wrapped with ^(...)
. So If you add an extra ^, it's unnecessary, but benign. Matching from the current cursor position is the only thing that makes sense for a parser that needs to consume it's input. I'm not sure what it would mean for a regex
combinator to match arbitrary locations. The JS API doesn't tell you where matches happened, so we need an anchor that assumes we can just jump based on the length of the match.
Thanks for the docs change proposal.
Also note that if you want “The regex from Parsing
but with more convenient interface” then you can just use the Data.String.Regex module directly instead of using Parsing
.
Also, I know this is just your test code, but note that in real code you shouldn't compile the regex
during the parse for the reasons discussed in Parsing.String.regex.
-- The regex from `Parsing` but with more convenient interface.
regexP :: String -> Parser String String
regexP re = case regex re noFlags of
Left compileError -> fail $ "failed to compile regex: " <> compileError
Right ret -> ret
Thank you!
Also note that if you want “The regex from
Parsing
but with more convenient interface” then you can just use the Data.String.Regex module directly instead of usingParsing
.
In the Parsing
library context, the convenient interface is one that (in particular) returns a Parsing String String
. The Data.String.Regex.regex
returns Either
, so still requires wrapping, and if I have to wrap, there's not much difference which one it's gonna be.
Also, I know this is just your test code, but note that in real code you shouldn't compile the
regex
during the parse for the reasons discussed in Parsing.String.regex.
Fair warning, because it's a snippet from an actual project. However, I've read the documentation paragraph referred (the discussion about constant vs dynamic generation in particular) but didn't grasp why should't I be compiling it in runtime.
Now that I'm experimenting with spy
I see that it might be due to the fact that regexp doesn't get generated just once but on every call, perhaps that's what you were referring to? In that case though that's a problem, but not the one related to the code or the library, but one in the PureScript compiler. E.g. when I have this code:
-- a word at the beginning of line with optional whitespace
topLevelWord :: String -> Parser String String
topLevelWord s = regexP $ "^\\s*" <> s <> "\\b"
…
topLevelWord "SomeFoo"
compiler should detect that it's a pure function-chain, so the calls can be replaced with a constant.
But it's part of the optimization process, and from what I understand the PureScript designers for some reason relegated optimizations to a separate project. So that means that if (or rather I'm hoping "when") my project comes to production, I'll have to use this separate project to produce the production code.
I see that it might be due to the fact that regexp doesn't get generated just once but on every call, perhaps that's what you were referring to?
That is what he meant, yeah.
Describe the bug
Using a pretty simple
".*$"
regexp results inNo Regex pattern match
. That is withnoFlags
flag that impliesmultiline: false
, so$
should match something.To Reproduce
Run this code:
Expected behavior
The parser ought to return string
My
Actual behavior