sunra / php-simple-html-dom-parser

PHP Simple HTML DOM Parser adaptation for Composer and PSR-0

Escape minus char in regular expressions #63

Open mlocati opened 6 years ago

mlocati commented 6 years ago

In PHP 7.3, PCRE2 (the library that parses regular expressions) has been upgraded to version 10.32.

It seems that this new version of PCRE is a bit more picky when parsing regular expressions. For example, /^[\w-:]+$/ is no longer accepted and fails with this error (see https://3v4l.org/vD5O2):

Compilation failed: invalid range in character class at offset 4

The reason? Inside square brackets, - denotes a range of characters (for example, [A-Z] matches any character from A to Z). PCRE now interprets [\w-:] as a range from \w to :, which doesn't make sense because \w is a character class rather than a single character, so the error is thrown.

The solution? Just escape - with \ (that is, write [\w\-:] instead of [\w-:]) to tell PCRE that we actually want the literal character -, and not a range of characters.
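
As a rough illustration of the fix, the sketch below compares the pattern from this issue with two variants in which the hyphen is unambiguous. The variable names and the sample subject string are made up for this example and are not taken from the library's source.

```php
<?php
// Pattern from this issue: the hyphen sits between \w and ':'.
// With PCRE2 10.32+ (PHP 7.3 and later) this fails to compile.
$broken = '/^[\w-:]+$/';

// Escaped hyphen: always read as a literal '-'.
$escaped = '/^[\w\-:]+$/';

// Alternative: a hyphen placed last in the class is also literal.
$hyphenLast = '/^[\w:-]+$/';

// Hypothetical sample input containing both ':' and '-'.
$subject = 'xlink:data-href';

// '@' hides the compile warning; preg_match() returns false
// when the pattern cannot be compiled.
$brokenResult = @preg_match($broken, $subject);
$escapedResult = preg_match($escaped, $subject);
$hyphenLastResult = preg_match($hyphenLast, $subject);

var_dump($brokenResult, $escapedResult, $hyphenLastResult);
```

The first result depends on the PCRE version PHP was built against, while the two corrected patterns should match on both old and new versions. Escaping is the safer of the two fixes because it keeps working if someone later reorders the characters inside the class.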

zanderwar commented 4 years ago

Still an issue over a year later