Open claudiobrandt opened 6 years ago
This seems to be related to the high ASCII × character you're using (code=158). Replacing it with z in both the Expression and the Text fixes the issue. A quick test shows that it happens with other high ASCII characters like ÿ, ç, and ©.
We'll have to do some testing and see if it's something we're doing, or something inherent in PCRE.
Here's a more concise reproduction of the issue: https://regexr.com/3rggr
This comes back to how PHP's PCRE engine is implemented. It doesn't return the character index of a match; it returns the byte offset instead. That means that on UTF-8 encoded text the index can be off by one or more bytes for every multibyte character that precedes the match.
Example using a multibyte Chinese character: https://regexr.com/3rlqh
For reference: https://bugs.php.net/bug.php?id=37391
So we'll have to look into manually setting the match offset.
This also affects highlighting: https://regexr.com/40u3b
https://regexr.com/48d2t Parts of the text are duplicated in the Details tab, which seems to be due to the nesting of non-capturing groups in PCRE mode. (Edit: Or just nesting. See https://regexr.com/48d9u where (a|(c)) triplicates the c in 'ca' but not in 'ac'.)
Hi, thanks for this tool! regexr.com/3rg2m While the Replace tab shows the regular expression is working fine, the Details tab highlights the groups so that one character ("-"), which should belong to the last group, appears as part of the to-be-replaced text ("×-" instead of only "×").