xxyzz closed this 3 months ago
I'll take a look at it. Regex... :sob:
The interesting part is that when I put this regex pattern into https://regex101.com, it matches [[a|b\nc]],
but it doesn't match in Python code. Not sure what's going on...
Can you paste what you tested on regex101? There are a couple of "?"
that need to be removed after the MAGICAL characters.
I tested some variations on the link syntax in the Wiktionary sandbox:
[[test|testing this 1 ok]]
[[test|testing
this 2 ok]]
[[test|
testing
this 3 ok
]]
[[test
|testing this4 fails
]]
[[
test|
testing this5 fails
]]
You can't have newlines in the [[...name...|
part of the link, but otherwise you can seemingly have as many newlines as you like in the text portion.
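The rule from these sandbox experiments can be written down as a plain reference check to test any regex against. This is a minimal sketch: newlines_allowed is a hypothetical helper (not part of wikitextprocessor), it only looks at where the newlines fall, and it assumes the input already has the [[...]] shape.

```python
def newlines_allowed(link: str) -> bool:
    """Hypothetical helper: True if every newline in a [[...]] link comes
    after the first "|", which is what the sandbox results suggest."""
    if not (link.startswith("[[") and link.endswith("]]")):
        raise ValueError("expected a [[...]] link")
    # partition() splits at the first "|"; with no "|", everything is target.
    target, _sep, _text = link[2:-2].partition("|")
    return "\n" not in target


# The five sandbox experiments; True means it rendered as a link.
SANDBOX = [
    ("[[test|testing this 1 ok]]", True),
    ("[[test|testing\nthis 2 ok]]", True),
    ("[[test|\ntesting\nthis 3 ok\n]]", True),
    ("[[test\n|testing this4 fails\n]]", False),
    ("[[\ntest|\ntesting this5 fails\n]]", False),
]

for link, worked in SANDBOX:
    print(repr(link), newlines_allowed(link) == worked)
```

Every line prints True, i.e. "no newline before the first |" agrees with all five sandbox results.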
I just took a deeper look at the regex and remembered that I wrote this horrible, horrible thing... Oh no.
I removed the nowiki magic number from the pattern; it doesn't affect the result. Here is the pattern I tested on regex101: \[\[(((?!\]\])[^[\n])*(?!\[[\n]+\])((?!\[\[)[^]\n])+)\]\]
It's basically the same pattern as in our code. It also works with the PHP flavor, but doesn't match when using the Python re library.
I get this result (same with the Python option):
I think I made another, separate error in the regex:
+ r"((?!\]\])[^[\n])*(?!\[[\n]+\])((?!\[\[)[^]\n])+"
# ( no ]] ) ( no [ ) ( no [...] )( no [[) (no ])
should probably have been
+ r"((?!\]\])[^[\n])*(?!\[[^\n]+\])((?!\[\[)[^]\n])+"
# ( no ]] ) ( no [ ) ( no [...] )( no [[) (no ])
But this is unrelated to the current problem...
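The difference between the two lookaheads is just the character class: [\n] contains only the newline, while [^\n] means "anything except a newline". A minimal sketch comparing the two lookahead bodies on their own (the variable names are illustrative):

```python
import re

# Lookahead body as written: "[", one or more NEWLINE characters, then "]".
as_written = re.compile(r"\[[\n]+\]")
# Lookahead body as probably intended: "[", one or more characters that are
# NOT newlines, then "]", i.e. a single-line [...] group.
intended = re.compile(r"\[[^\n]+\]")

for sample in ["[x]", "[\n]"]:
    print(repr(sample), bool(as_written.match(sample)), bool(intended.match(sample)))
```

So the lookahead as written only rejects the odd "[" + newlines + "]" sequence and never an ordinary one-line [...] group.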
I used "[[a|b\nc]]" as the test text on regex101; I guess regex101 doesn't turn the "\n" in the test string into a newline character...
Sorry for the distraction... I thought the pattern worked and somehow only failed in Python's re library.
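The regex101 mix-up is easy to reproduce in Python: a raw string keeps backslash + "n" as two ordinary characters (which is presumably what the regex101 test box held), while a normal string literal turns "\n" into a real newline. A small sketch using the pattern quoted earlier in this thread; the names LINK, typed_in_test_box and real_wikitext are only for illustration:

```python
import re

# The pattern quoted earlier in this thread, nowiki magic number removed.
LINK = re.compile(r"\[\[(((?!\]\])[^[\n])*(?!\[[\n]+\])((?!\[\[)[^]\n])+)\]\]")

typed_in_test_box = r"[[a|b\nc]]"  # backslash + "n": two ordinary characters
real_wikitext = "[[a|b\nc]]"       # an actual newline between "b" and "c"

for sample in (typed_in_test_box, real_wikitext):
    # Prints the length (10 vs 9 characters) and whether the pattern matched.
    print(len(sample), LINK.search(sample) is not None)
```

The first string matches and the second does not, because every part of this pattern excludes "\n".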
I think I've got something...
(?<!\[) # negative lookbehind, [[[ breaks the link completely, the whole thing is not parsed as a link or url
\[\[ # start brackets
(
(
(?!\]\]) # negative lookahead, no ]] allowed
[^[\n]
)* # no [ or newlines allowed
(
(?!\[\[) # no [[ allowed
[^]\n] # no ] or newlines
)+
)
(\| # after a |, newlines are allowed, the below is the same as above
(((?!\]\])[^[])*((?!\[\[)[^]])+)
)?
\]\]
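One way to check the commented pattern above against the sandbox cases is to compile it with re.VERBOSE, which ignores the whitespace and # comments. A sketch, not the code in wikitextprocessor: the inner groups are made non-capturing (so only two groups remain) and "[" / "]" inside character classes are escaped; neither change affects what matches. LINK_RE and CASES are illustrative names.

```python
import re

LINK_RE = re.compile(
    r"""
    (?<!\[)                     # negative lookbehind: no "[" right before "[["
    \[\[                        # start brackets
    (                           # group 1: no newlines allowed
        (?:(?!\]\])[^\[\n])*    # no "]]", no "[" and no newlines
        (?:(?!\[\[)[^\]\n])+    # no "[[", no "]" and no newlines
    )
    (?:\|                       # after a "|", newlines are allowed
        (                       # group 2: same as group 1 minus the newline ban
            (?:(?!\]\])[^\[])*
            (?:(?!\[\[)[^\]])+
        )
    )?
    \]\]                        # end brackets
    """,
    re.VERBOSE,
)

# The five sandbox experiments plus the text from the issue;
# True means it should be parsed as a link.
CASES = [
    ("[[test|testing this 1 ok]]", True),
    ("[[test|testing\nthis 2 ok]]", True),
    ("[[test|\ntesting\nthis 3 ok\n]]", True),
    ("[[test\n|testing this4 fails\n]]", False),
    ("[[\ntest|\ntesting this5 fails\n]]", False),
    ("[[a|b\nc]]", True),
]

for text, expected in CASES:
    found = LINK_RE.search(text) is not None
    print(f"{text!r:40} matched={found} expected={expected}")
```

With this pattern, [[a|b\nc]] matches with group 1 "a" and group 2 "b\nc". Since [^\[\n] also accepts "|", on a one-line link such as case 1 group 1 takes the whole "test|testing this 1 ok" and group 2 stays None.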
Page: https://en.wiktionary.org/wiki/forswat
Simplified wikitext:
[[a|b\nc]]
Error message: https://kaikki.org/dictionary/All%20languages%20combined/errors/details--2--is-an-alias-of--year---cannot-spec-Q6~yILHj.html
The link regex here https://github.com/tatuylonen/wikitextprocessor/blob/cdd76b208685d2e040e03a95a9ecde8e89390c68/src/wikitextprocessor/core.py#L160
can't match the
[[a|b\nc]]
link. @kristian-clausal, could you please take a look at the regex? I don't dare change this pattern...