Open PgBiel opened 10 months ago
Yes. Problem is that we can't just use the regex engine Typst uses. We are limited to the Haskell ecosystem. So what I do is use the regex-tdfa package for the basics, and try to supplement it when possible for things it is missing. E.g. it is missing `\d \w \s`, `?`, and `+`, so I just replace these with equivalents. Of course, this isn't 100% reliable, and we can already see a place where it produces bad results in your #1 -- `(?m)` is a special construction; `?` here doesn't mean "0 or 1", but my hack just replaces the `?` with `{0,1}` with terrible results.
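To make the failure mode concrete, here is a minimal sketch in Python. It is not pandoc's actual Haskell code, and the names `naive_rewrite` and `guarded_rewrite` are hypothetical. It contrasts a blanket `?` to `{0,1}` substitution with a variant that leaves a `?` alone when it directly follows an unescaped `(`, on the assumption that such a `?` opens a special group rather than acting as a quantifier.

```python
def naive_rewrite(pattern: str) -> str:
    """Replace every '?' with '{0,1}', mimicking a blanket substitution."""
    return pattern.replace("?", "{0,1}")


def guarded_rewrite(pattern: str) -> str:
    """Replace '?' with '{0,1}' only where it is plausibly a quantifier.

    A '?' right after an unescaped '(' opens a special group such as
    (?m), (?:...) or (?P<name>...), so it is copied through unchanged.
    Backslash escapes are copied through as well. Bracket expressions
    and lazy quantifiers are NOT handled in this sketch.
    """
    out = []
    i = 0
    prev_open_paren = False  # True when the previous token was an unescaped '('
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Copy the escape sequence (backslash plus next char) verbatim.
            out.append(pattern[i:i + 2])
            i += 2
            prev_open_paren = False
            continue
        if ch == "?" and not prev_open_paren:
            out.append("{0,1}")
        else:
            out.append(ch)
        prev_open_paren = (ch == "(")
        i += 1
    return "".join(out)


print(naive_rewrite("(?m)a"))    # ({0,1}m)a
print(guarded_rewrite("(?m)a"))  # (?m)a
print(guarded_rewrite("ab?c"))   # ab{0,1}c
```

Note that passing `(?m)` through untouched only avoids the mangling; the flag itself would still have to be supported or translated separately for the target engine.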
I could switch to using another regex engine. Hackage has regex-pcre-builtin, which comes with the C sources so that an external dependency isn't introduced. I've tried to avoid using wrapped C libraries in pandoc, but maybe could reconsider in this case. I imagine pcre would be pretty close.
I also reimplemented as much as I needed of the regex engine used by KDE for my skylighting library. This isn't currently published as a separate package, though.
Oh, I see there is now https://hackage.haskell.org/package/regex-rure. But this would make pandoc depend on an installation of librure.so/dylib somewhere; I want to avoid that and have a perfectly self-contained static binary.
Hello, I've observed several inconsistencies between the regex pandoc uses when reading Typst documents and the regex Typst uses.
Here are a few of them:
1. Flags: `i`, `m`, `s`, `u`, `x`. Of those, only `i` appears to be supported by Pandoc. For example, `#(regex("(?m)a") in "A")` compiles in Typst, but doesn't in Pandoc (3.1.11.1 via try.pandoc.org), with the error `(line 1, column 2): parseRegex for Text.Regex.TDFA.Text failed:"({0,1}m)a" (line 1, column 4): unexpected '0' expecting an atom`. The `m` (multiline) flag in particular is needed in order to be able to match the start of a line with `^` and the end of a line with `$`.
2. `#(regex("(?:x)") in "x")` compiles in Typst, but not in Pandoc (`(line 1, column 2): parseRegex for Text.Regex.TDFA.Text failed:"({0,1}:x)" (line 1, column 4): unexpected '0' expecting an atom`).
3. `#(regex("(?P<a>x)") in "x")` compiles in Typst, but not in Pandoc (`(line 1, column 2): parseRegex for Text.Regex.TDFA.Text failed:"({0,1}P<a>x)" (line 1, column 4): unexpected '0' expecting an atom`).

Besides non-compilation, there are inconsistencies in the results of regex matching as well.
1. `#(regex("[\s\S]+") in "x")` returns `true` in Typst, but `false` in Pandoc.
2. `#("a \n b" == "a \n b".match(regex("[^.]+")).text)` returns `true` in Typst, but `false` in Pandoc. In general, `[ ]` seems to be unable to accept newlines, when it should.

There are probably inconsistencies I haven't found yet as well, but they could be added to this issue as they are found.
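As a point of comparison only, the two matching cases above can be checked against a third engine. The sketch below uses Python's standard `re` module, which is neither Typst's engine nor regex-tdfa, so it says nothing about either implementation; it just shows that the results reported for Typst are the ones this engine produces as well. The helper names are hypothetical.

```python
import re


def class_matches_plain_char() -> bool:
    """Counterpart of #(regex("[\\s\\S]+") in "x")."""
    return re.search(r"[\s\S]+", "x") is not None


def negated_class_spans_newline() -> bool:
    """Counterpart of the "[^.]+" example: inside brackets '.' is a
    literal dot, so the negated class should also match the newline."""
    text = "a \n b"
    match = re.search(r"[^.]+", text)
    return match is not None and match.group(0) == text


print(class_matches_plain_char())
print(negated_class_spans_newline())
```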