paquettg / php-html-parser

An HTML DOM parser. It allows you to manipulate HTML. Find tags on an HTML page with selectors just like jQuery.
MIT License

Invalid internal use of preg_match_all() #313

Open. chaslain opened this issue 1 year ago.

chaslain commented 1 year ago
PHP Warning 'yii\base\ErrorException' with message 'preg_match_all(): Compilation failed: invalid range in character class at offset 4'

in vendor/paquettg/php-html-parser/src/PHPHtmlParser/Selector.php:91

Code was this:

$file = file_get_contents($this->file_path);
$dom = new Dom;
$dom->loadStr($file, []);

$rows = $dom->find("tr");

PHP version: PHP 7.3.33 (cli) (built: Mar 18 2022 03:41:41) ( NTS )
Package version: 1.7.0

chaslain commented 1 year ago

Same error if using provided method of loading from file instead.

FMaz008 commented 10 months ago

Same error when trying to use it as follows:


<?php
require "../vendor/autoload.php";
use PHPHtmlParser\Dom;
$url = "https://google.com";
$dom = new Dom;
$dom->loadFromUrl($url);

To get more details, I placed the following just before that line:

var_dump($this->pattern, $selector);

Result:

string(103) "/([\w-:\*>]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is"
string(29) "meta[http-equiv=Content-Type]"
Warning: preg_match_all(): Compilation failed: invalid range in character class at offset 4 in /home/fmaz878/vendor/paquettg/php-html-parser/src/PHPHtmlParser/Selector.php on line 92

Note that without the var_dump, the error is reported on line 91; the added line shifts it down by one.

Specifically, the problem is the "-" right after \w. Inside a character class a "-" between two items starts a range, but \w (a whole set of characters) can't be a range endpoint, so the pattern fails to compile. Offset 4 points at the first such "-", in ([\w-:. I won't pretend to understand the purpose of that regexp, but escaping the dashes seems to resolve that specific issue (though it creates a different error):

/([\w\-:\*>]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is
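A short sketch to check this outside the library, assuming PHP 7.3 or later. PHP 7.3 moved from PCRE to PCRE2, and PCRE2 rejects a hyphen placed right after a class escape such as \w unless it is escaped or is the last character in the class, which would explain why this shows up on PHP 7.3.33. The compiles() helper is hypothetical, and the group labels in the comments are my own reading of the regex, not documentation from the library.

```php
<?php
// Sketch, assuming PHP >= 7.3 (bundled PCRE2). When a pattern fails to
// compile, preg_match()/preg_match_all() return false and emit a warning;
// the @ operator silences that warning so the return value can be checked.

// Hypothetical helper for this sketch (not part of php-html-parser).
function compiles(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

// Pattern as printed by the var_dump above (unescaped "-" after \w).
// Nowdoc keeps "$", quotes and backslashes literal.
$original = <<<'REGEX'
/([\w-:\*>]*)(?:\#([\w-]+)|\.([\w-]+))?(?:\[@?(!?[\w-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is
REGEX;

// Same pattern with every "-" in a \w class escaped.
$escaped = <<<'REGEX'
/([\w\-:\*>]*)(?:\#([\w\-]+)|\.([\w\-]+))?(?:\[@?(!?[\w\-:]+)(?:([!*^$]?=)["']?(.*?)["']?)?\])?([\/, ]+)/is
REGEX;

var_dump(compiles($original)); // false under PCRE2: "invalid range in character class"
var_dump(compiles($escaped));  // true

// The last group ([\/, ]+) needs at least one separator, so append a space.
preg_match_all($escaped, 'meta[http-equiv=Content-Type] ', $matches, PREG_SET_ORDER);

// Groups: 1 tag, 2 id, 3 class, 4 attribute, 5 operator, 6 value, 7 separator.
print_r($matches[0]);
```

This only checks that the escaped pattern compiles and parses the selector from the dump; it doesn't address the different error mentioned above. Moving each dash to the end of its class (e.g. [\w:\*>-]) should work equally well, since a trailing hyphen is always literal.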