php / php-src

The PHP Interpreter
https://www.php.net
Other
38.17k stars 7.75k forks source link

`preg_grep` should return false when `preg_match` does #11936

Open mvorisek opened 1 year ago

mvorisek commented 1 year ago

Description

The following code:

https://3v4l.org/kihD6

<?php

var_dump(preg_match('/.*/', hex2bin('ff')));
var_dump(preg_match('/^$/', hex2bin('ff')));
var_dump(preg_match('/.*/u', hex2bin('ff')));
var_dump(preg_grep('/.*/u', [hex2bin('ff')]));

Resulted in this output:

int(1)
int(0)
bool(false)
array(0) {
}

But I expected this output instead:

int(1)
int(0)
bool(false)
bool(false)

PHP Version

any

Operating System

any

damianwadley commented 1 year ago

No? The purpose of preg_grep is to find all strings in an array that match a pattern, so the function should always be returning an array, and if none of the strings match the pattern then the array should be empty. Making it return different data types when successful makes userland code more awkward.

mvorisek commented 1 year ago

Matching pattern with u modifier againts non-UTF-8 input must emit an error. Until preg_* functions throw on error, false return type is expected.

KapitanOczywisty commented 1 year ago

After error preg_grep also stops searching in remaining elements: https://3v4l.org/o7uDf

<?php

var_dump(preg_grep('/.*/u', ['foo', 'bar']));
echo preg_last_error_msg(), "\n\n";

var_dump(preg_grep('/.*/u', ['foo', hex2bin('ff'), 'bar']));
echo preg_last_error_msg(), "\n\n";
array(2) {
  [0]=>
  string(3) "foo"
  [1]=>
  string(3) "bar"
}
No error

array(1) {
  [0]=>
  string(3) "foo"
}
Malformed UTF-8 characters, possibly incorrectly encoded
mvorisek commented 1 year ago

Yes, empty/incomplete array result is simply wrong and inconstent, as all other preg_* functions return false when preg_last_error() is non-zero. So IMHO it is a bug and should be fixed.

damianwadley commented 1 year ago

I missed that this was an example of a /u regex on a non-UTF-8 string. Let me say that up front.

(Actually, what I really think is best is that preg_grep returns an array of match/no-match/is-error Enums, but that ship has sailed.)

mvorisek commented 7 months ago

Can the Status: Needs Triage label be removed?