Open voku opened 5 years ago
Not really following the idea... The first one will work when the u-modifier is added (-> I can report if \W, \w are used without the u-modifier). The second one ignores e.g. umlauts and the like.
What was the root cause?
My problem seems to be the missing u-modifier. 😊 So maybe a hint for this would be a good idea?
Question: In the Yii string class (https://github.com/yiisoft/string/blob/master/src/StringHelper.php) I saw that they also use the u-modifier in cases where they only use "\s". Do we really need to add the u-modifier in these cases too?
Ideally, any usage of \s, \S, \w, \W, \d, \D needs to be backed by the u-modifier. Yes, if the app should work properly with Unicode (because of Arabic numbers and additional space characters in Unicode).
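A minimal sketch of the claim above, assuming PHP 7 or later running under the default "C" locale (the sample strings are made up for illustration). Each shorthand class is tried against the same input without and with the u modifier:

```php
<?php
// Sample inputs (illustrative only).
$word   = "m\u{00FC}ller";        // "müller", contains U+00FC
$digits = "\u{0663}\u{0664}";     // ARABIC-INDIC DIGIT THREE and FOUR
$spaced = "a\u{2003}b";           // EM SPACE between the letters

$results = [
    'w_plain' => preg_match('/^\w+$/', $word),    // 0: the bytes of U+00FC are not word characters
    'w_utf8'  => preg_match('/^\w+$/u', $word),   // 1
    'd_plain' => preg_match('/^\d+$/', $digits),  // 0: only 0-9 count as digits
    'd_utf8'  => preg_match('/^\d+$/u', $digits), // 1
    's_plain' => preg_match('/\s/', $spaced),     // 0: EM SPACE is not ASCII whitespace
    's_utf8'  => preg_match('/\s/u', $spaced),    // 1
];

var_export($results);
```

With a non-default locale the results without u may differ, which is one more reason not to rely on them.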
Today I replaced some "[A-Z]" regexes with "\p{Lu}" (https://github.com/voku/portable-utf8/commit/98cca6387503f9c8b3bb54ed97350e9fac140941) so that I can process Unicode chars. Maybe hints like that would also be helpful?
Gladly. Which hint in which case (we have multiple options now)? =)
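To illustrate the kind of replacement described above, here is a small sketch comparing "[A-Z]" with "\p{Lu}" on a made-up input. It is not taken from the linked commit:

```php
<?php
$name = "\u{00C4}rger"; // "Ärger", starts with U+00C4 (uppercase A with diaeresis)

$asciiOnly = preg_match('/^[A-Z]/', $name);   // 0: U+00C4 is outside A-Z
$unicode   = preg_match('/^\p{Lu}/u', $name); // 1: U+00C4 is an uppercase letter

var_dump($asciiOnly, $unicode);
```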
So the checks are:
1. No u modifier and \w is used: suggest using \pL because the former matches ASCII only.
2. No u modifier and \d is used: suggest using \pN because the former matches ASCII only.
3. No u modifier and \s is used: suggest using \pZ because the former matches ASCII only.
4. [...] u.
5. No u modifier and there's a HEX value greater than \x{FF}: suggest adding u. It would error without it.
6. u modifier and there's no HEX value greater than \x{FF} or any \p* or any unicode characters: suggest removing u. This one is harmless, so if not implemented, there's no big loss.
@samdark \s also contains e.g. \t, but \p{Z} does not. https://regex101.com/r/fPsz0y/2
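The difference between \s and \p{Z} can be checked with a short sketch like the following (inputs chosen for illustration; it does not reproduce the regex101 link). A tab is a control character, not a Unicode separator, so only \s matches it:

```php
<?php
$tab  = "\t";        // U+0009, a control character
$nbsp = "\u{00A0}";  // NO-BREAK SPACE, a Unicode space separator

$tabS   = preg_match('/\s/u', $tab);      // 1
$tabZ   = preg_match('/\p{Z}/u', $tab);   // 0: tab is not in the Z category
$nbspS  = preg_match('/\s/u', $nbsp);     // 1
$nbspZ  = preg_match('/\p{Z}/u', $nbsp);  // 1

var_dump($tabS, $tabZ, $nbspS, $nbspZ);
```

So replacing \s with \p{Z} would silently stop matching tabs and newlines.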
Actually, 6. isn't needed because u can simply mean input is expected to be unicode. Thus it would be annoying.
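As a rough sketch of how checks 1 to 3 and 5 might look in code: the function name `unicodeRegexHints` is hypothetical and not part of any existing inspection tool. It assumes a single-character delimiter and scans naively, so an escaped backslash such as `\\w` would be misreported. The hint texts simply mirror the list above.

```php
<?php
// Hypothetical helper: returns hints for a regex literal like '/\w+/i'.
function unicodeRegexHints(string $regex): array
{
    if ($regex === '') {
        return [];
    }
    $end = strrpos($regex, $regex[0]);
    if ($end === 0) {
        return []; // no closing delimiter found
    }
    $pattern   = substr($regex, 1, $end - 1);
    $modifiers = (string) substr($regex, $end + 1);
    if (strpos($modifiers, 'u') !== false) {
        return []; // these checks only apply when u is missing
    }

    $hints = [];
    // Checks 1-3: shorthand classes and the replacements named in the list.
    foreach (['\w' => '\pL', '\d' => '\pN', '\s' => '\pZ'] as $class => $property) {
        if (strpos($pattern, $class) !== false) {
            $hints[] = "$class matches ASCII only without u; consider $property";
        }
    }
    // Check 5: a \x{...} value above \x{FF} does not compile without u.
    if (preg_match_all('/\\\\x\{([0-9a-fA-F]+)\}/', $pattern, $matches)) {
        foreach ($matches[1] as $hex) {
            if (hexdec($hex) > 0xFF) {
                $hints[] = sprintf('\x{%s} is above \x{FF}; add the u modifier', $hex);
                break;
            }
        }
    }
    return $hints;
}

print_r(unicodeRegexHints('/\w+\d/'));

// Check 5 in practice: without u the pattern fails to compile, and
// preg_match() returns false (the warning is suppressed by @ here).
$compiled = @preg_match('/\x{100}/', 'x');
var_dump($compiled);
```

A real inspection would work on the parsed pattern rather than on raw substrings. Given the \s versus \p{Z} difference noted above, it may also prefer suggesting the u modifier over a property class.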
Description:
It would be nice if you could warn us that we should use a Unicode regex instead of an ASCII regex.
Example
ascii: https://regex101.com/r/lM5Zy8/1
unicode: https://regex101.com/r/SJ4oDG/1
Unicode Regex via PHP: https://youtu.be/VRiF9xd0YQc?t=3264