php / php-src

The PHP Interpreter
https://www.php.net
Other
37.82k stars 7.72k forks source link

preg_match and preg_match_all both do not set $matches on regex fail, leaving as it was set prior to call #7917

Open mschroebel opened 2 years ago

mschroebel commented 2 years ago

Description

The following code:

<?php // preg_match_all.php
$matches = [0 => 'Orange',
            1 => 'Apple',
            2 => 'Banana',
            3 => 'Monkey'];
$result = preg_match_all('aeiou{0,', 'ou', $matches);
var_dump($result);
var_dump($matches);

$result = preg_match('aeiou{0,', 'ou', $matches);
var_dump($result);
var_dump($matches);

Resulted in this output from command line %php preg_match_all.php:

PHP Warning:  preg_match_all(): Delimiter must not be alphanumeric or backslash in preg_match_all.php on line 6
PHP Stack trace:
PHP   1. {main}() preg_match_all.php:0
PHP   2. preg_match_all($pattern = 'aeiou{0,', $subject = 'ou', $subpatterns = [0 => 'Orange', 1 => 'Apple', 2 => 'Banana', 3 => 'Monkey']) preg_match_all.php:6
preg_match_all.php:7:
bool(false)
preg_match_all.php:8:
array(4) {
    [0] =>
  string(6) "Orange"
    [1] =>
  string(5) "Apple"
    [2] =>
  string(6) "Banana"
    [3] =>
  string(6) "Monkey"
}
PHP Warning:  preg_match(): Delimiter must not be alphanumeric or backslash in preg_match_all.php on line 10
PHP Stack trace:
PHP   1. {main}() preg_match_all.php:0
PHP   2. preg_match($pattern = 'aeiou{0,', $subject = 'ou', $subpatterns = [0 => 'Orange', 1 => 'Apple', 2 => 'Banana', 3 => 'Monkey']) preg_match_all.php:10
preg_match_all.php:11:
bool(false)
preg_match_all.php:12:
array(4) {
    [0] =>
  string(6) "Orange"
    [1] =>
  string(5) "Apple"
    [2] =>
  string(6) "Banana"
    [3] =>
  string(6) "Monkey"
}

But I expected this output instead:

PHP Warning:  preg_match_all(): Delimiter must not be alphanumeric or backslash in preg_match_all.php on line 6
PHP Stack trace:
PHP   1. {main}() preg_match_all.php:0
PHP   2. preg_match_all($pattern = 'aeiou{0,', $subject = 'ou', $subpatterns = [0 => 'Orange', 1 => 'Apple', 2 => 'Banana', 3 => 'Monkey']) preg_match_all.php:6
preg_match_all.php:7:
bool(false)
preg_match_all.php:8:
either: null or array[0]
PHP Warning:  preg_match(): Delimiter must not be alphanumeric or backslash in preg_match_all.php on line 10
PHP Stack trace:
PHP   1. {main}() preg_match_all.php:0
PHP   2. preg_match($pattern = 'aeiou{0,', $subject = 'ou', $subpatterns = [0 => 'Orange', 1 => 'Apple', 2 => 'Banana', 3 => 'Monkey']) preg_match_all.php:10
preg_match_all.php:11:
bool(false)
preg_match_all.php:12:
either: null or array[0]

PHP Version

PHP 7.4.27

Operating System

No response

mschroebel commented 2 years ago

OS is ubuntu 20.04

cmb69 commented 2 years ago

Yeah, but what's wrong with that behavior?

mschroebel commented 2 years ago

Seems to me from programming for 35 years that when you call a function that is supposed to return matches and it fails, that there ought to be nothing there. I didn't discover it because I had a logic problem, but it seems that there is a pitfall there. I was just stepping through my code with Xdebug and phpStorm, seeing the fail and the value returned. Say you call preg_match_all to validate something, and then again to further verify without checking the result. The $matches still has the values from the first call that worked. My thinking when saw that was there are lots and lots of these cases hanging around out there, where everything looks "hunky dory" yet isn't because the 2nd call failed yet it's not obvious in the browser. Regex's are not the easiest things to read and work with so failing regex's are probably common. If you have 2 in the same script both using the same var for $matches, you may never know. It's not how I would expect things to be handled and it's really the only odd behavior I've found in php in 24 years. I'm only a programmer that uses php, it's up to you guys that define it. Close if you want. I know of the pitfall myself, now. Seems like shrapnel to me, though, Garbage hanging about. Been there forever I'd guess.

cmb69 commented 2 years ago

Well, the basic logic is:

function preg_match($pattern, $subject, &$matches) {
    switch (match($pattern, $subject)):
        case MATCHED:
            $matches = whatever_matched();
            return 1;
        case DID_NOT_MATCH:
            $matches = [];
            return 0;
        case FAILURE:
            return false;
    }
}

If $matches was an out parameter, I would expect it to be undefined, if preg_match() failed. Since it is a by-ref parameter, I would expect it to be unmodified in this case. Maybe "expect" is a bit of an overstatement here; I'd just not rely on any particular value.

Perhaps this is just something we should document?

iluuu1994 commented 2 years ago

The best behavior for PHP would probably be to throw an exception so your program never accesses $matches when providing an invalid regex in the first place. That's something worth discussing for PHP 9.

cmb69 commented 2 years ago

Changed to feature request per @iluuu1994's comment above.