Closed nicolas-grekas closed 2 months ago
__isset
only confirms whether or not the "property" exists. That doesn't have to mean that the property exists and is not null.
So the call to __get
is still necessary.
edit: Ah, you mean how the property gets created between the two calls.
If you return false
from __isset
that's what happens. With true
, the concrete value must be fetched. Edit: It seems you're asking that the existence of the property must be repeated after isset
. I don't think isset
should be used as a mechanism to create properties...
Now seeing your edit @iluuu1994. But first of all yes the property should be fetched, but since isset does set the property, the fetch shouldn't get through get
I don't think isset should be used as a mechanism to create properties...
Nothing prevents it and that's actually the only way to properly implement lazy object so yes, it can :)
I don't think there's a good reason to add another check. Can't you just return the property you create in __isset
in __get
? Adding a check slows down the 99+% case where __isset
doesn't create a property.
??
operation with magic methods is rare so that perf shouldn't be an issue. I also expect the call to be fast since it's just a missing hashmap check. And yes, this behavior prevents writing correctly behaving lazy objects - we cannot make them truly transparent.
I must agree with @iluuu1994 here, __isset()
shouldn't be used to assign properties, and adding a check is not something I am in favour of.
I'm sorry but this doesn't make sense to me. The fact that isset can be used to initialize properties is irrelevant to the issue. Actually, it's a critical capability that is leveraged since more than a decade to implement lazy objects. You might not be familiar with how those are implemented but the general idea is simple: all properties are unset when the object is created, and magic methods are added by inheritance to defer initialization to the moment when any property is accessed - that being via any magic methods, isset included. There are critical piece of infrastructure that rely on lazy objects - many based on Doctrine entities and/or when using lazy services (eg Symfony DI container but not only it).
The current behavior just breaks the semantics of magic methods. They are supposed to be called only when a property isn't accessible, and the call to __get is just missing the check to guard this behavior.
Can you please expand on why you think things behave as expected? What's the argument to not fix this?
Currently isset and get are called as a pair: if (!exists()) { if (__isset()) return __get() else return null }
.
Your proposal is to do if (!exists()) { if (!__isset()) return null; } if (!exists()) { return __get(); }
.
This behaviour comes from the initial bugfix for PHP 7, it was probably done that way to just make it work, and maybe not the most fully thought out behaviour, but it's how it works today.
I don't think you should argue it would break critical pieces of infrastructure, otherwise they'd be long broken, since PHP 7 that is. @nicolas-grekas
I do however think, that from a theoretical perspective __get() should not be called, if it exists, period. It would be a break in long-standing behaviour though and as such only go into PHP 8.4, if this is changed.
I also don't think performance is a major concern in this case, as the relative overhead of isset() and get() calls must be far bigger than a simple undef check (for declared props) or a hash_exists (for dynamic props).
```diff diff --git a/Zend/tests/coalesce_is_create_property.phpt b/Zend/tests/coalesce_is_create_property.phpt new file mode 100644 index 0000000000..10927b6289 --- /dev/null +++ b/Zend/tests/coalesce_is_create_property.phpt @@ -0,0 +1,22 @@ +--TEST-- +Coalesce IS fetch should repeat property existence check after __isset +--FILE-- +$prop = 123; + return true; + } + public function __get($prop) { + throw new Exception('Unreachable'); + } +} + +$a = new A; +echo $a->foo ?? 234; +?> +--EXPECT-- +__isset +123 diff --git a/Zend/zend_object_handlers.c b/Zend/zend_object_handlers.c index 6a22156d18..d2b34ff9bf 100644 --- a/Zend/zend_object_handlers.c +++ b/Zend/zend_object_handlers.c @@ -589,6 +589,36 @@ ZEND_API uint32_t *zend_get_property_guard(zend_object *zobj, zend_string *membe } /* }}} */ +static zval *zend_std_read_dynamic_property(zend_object *zobj, zend_string *name, void **cache_slot, uintptr_t property_offset) +{ + if (EXPECTED(zobj->properties != NULL)) { + if (!IS_UNKNOWN_DYNAMIC_PROPERTY_OFFSET(property_offset)) { + uintptr_t idx = ZEND_DECODE_DYN_PROP_OFFSET(property_offset); + + if (EXPECTED(idx < zobj->properties->nNumUsed * sizeof(Bucket))) { + Bucket *p = (Bucket*)((char*)zobj->properties->arData + idx); + + if (EXPECTED(p->key == name) || + (EXPECTED(p->h == ZSTR_H(name)) && + EXPECTED(p->key != NULL) && + EXPECTED(zend_string_equal_content(p->key, name)))) { + return &p->val; + } + } + CACHE_PTR_EX(cache_slot + 1, (void*)ZEND_DYNAMIC_PROPERTY_OFFSET); + } + zval *retval = zend_hash_find(zobj->properties, name); + if (EXPECTED(retval)) { + if (cache_slot) { + uintptr_t idx = (char*)retval - (char*)zobj->properties->arData; + CACHE_PTR_EX(cache_slot + 1, (void*)ZEND_ENCODE_DYN_PROP_OFFSET(idx)); + } + return retval; + } + } + return NULL; +} + ZEND_API zval *zend_std_read_property(zend_object *zobj, zend_string *name, int type, void **cache_slot, zval *rv) /* {{{ */ { zval *retval; @@ -638,31 +668,9 @@ ZEND_API zval *zend_std_read_property(zend_object *zobj, zend_string *name, int goto uninit_error; } } else if (EXPECTED(IS_DYNAMIC_PROPERTY_OFFSET(property_offset))) { - if (EXPECTED(zobj->properties != NULL)) { - if (!IS_UNKNOWN_DYNAMIC_PROPERTY_OFFSET(property_offset)) { - uintptr_t idx = ZEND_DECODE_DYN_PROP_OFFSET(property_offset); - - if (EXPECTED(idx < zobj->properties->nNumUsed * sizeof(Bucket))) { - Bucket *p = (Bucket*)((char*)zobj->properties->arData + idx); - - if (EXPECTED(p->key == name) || - (EXPECTED(p->h == ZSTR_H(name)) && - EXPECTED(p->key != NULL) && - EXPECTED(zend_string_equal_content(p->key, name)))) { - retval = &p->val; - goto exit; - } - } - CACHE_PTR_EX(cache_slot + 1, (void*)ZEND_DYNAMIC_PROPERTY_OFFSET); - } - retval = zend_hash_find(zobj->properties, name); - if (EXPECTED(retval)) { - if (cache_slot) { - uintptr_t idx = (char*)retval - (char*)zobj->properties->arData; - CACHE_PTR_EX(cache_slot + 1, (void*)ZEND_ENCODE_DYN_PROP_OFFSET(idx)); - } - goto exit; - } + retval = zend_std_read_dynamic_property(zobj, name, cache_slot, property_offset); + if (retval) { + goto exit; } } else if (UNEXPECTED(EG(exception))) { retval = &EG(uninitialized_zval); @@ -693,6 +701,13 @@ ZEND_API zval *zend_std_read_property(zend_object *zobj, zend_string *name, int } zval_ptr_dtor(&tmp_result); + + retval = zend_std_read_dynamic_property(zobj, name, cache_slot, property_offset); + if (retval) { + OBJ_RELEASE(zobj); + goto exit; + } + if (zobj->ce->__get && !((*guard) & IN_GET)) { goto call_getter; } ```
I agree with @bwoebi in that this is more of a missing feature than a bug. The "property fetch" is currently considered an atomic operation. zend_std_read_property
does not actually consist of an "isset" and "get" phase. The properties value is fetched in the same step. Because there's no such equivalent in userland, we're forced to call both __isset
and __get
.
Anyway, returning false
from __isset
after creating a property is somewhat of a contradiction. We can act as if the property isn't there and call the ??
fallback (which normal properties cannot do, because they don't call __isset
), or use the property anyway (which feels like it's going against what the user said). I went for the former, which is also what @bwoebi has expressed in his pseudo-code.
I don't think you should argue it would break critical pieces of infrastructure
I made this point to answer the objection that "isset should[n't] be used as a mechanism to create properties..."
I agree that the reported behavior is an edge case. But it's one I stumbled upon while working on lazy objects so it's clearly a defect in how the engine behaves currently.
To me it's an edge case that won't affect anyone negatively in practice and I would consider it as a bug fix. But I don't mind patching this in 8.4 if you think that's how it should be done.
returning false from __isset after creating a property is somewhat of a contradiction
Unless the value of the property is null! (note that the reported case is about __isset returning true)
Thanks for the patch, I'm going to play with it right away.
It turns out the patch doesn't work for declared, unset properties, as @bwoebi has pointed out. I think the might be soundness issues in that case too, because __isset
may free the object, invalidating the pointer. As the fix might not be as trivial as expected, I'd definitely prefer waiting for 8.4.
This can't be fixed without a behaviour change. Personally, I think this shouldn't be changed. Otherwise we will have a segmentation with different behaviours before and after PHP-8.4 and "lazy properties" will have to support both ones.
In case the __isset()
callback is used in unexpected way, it should be also possible to implement __get()
supporting the existing engine behaviour.
@dstogov I think the argument is that lazy objects cannot transparently handle this case. @nicolas-grekas It would be nice if you could provide a minimal example of what case you cannot solve without this fix by overriding __get
.
Here is a simplified example :
<?php
// This is the class that we want to make lazy.
// Note that this is an opaque class from the PoV of the proxy-generation logic:
// its implementation details are unknown.
class A
{
protected array $data = [];
public function __construct()
{
$this->data = ['foo' => 123];
}
public function __get($n) { return $this->data[$n] ?? null; } // line 15
public function __set($n, $v) { $this->data[$n] = $v; }
public function __isset($n) { return isset($this->data[$n]); }
public function __unset($n) { unset($this->data[$n]); }
}
// This is a simplified proxy class that would typically be generated automatically
// from the signature of the proxied class. The way this works is by unsetting all
// declared properties at instantiation time, then using the magic methods to lazily
// initialize them on demand. This is a very simple implementation that does not handle
// all edge cases, but it should be enough to illustrate the issue. The logic in each
// magic method is very repetitive: we checks if we're trying to access the uninitialized
// property. If yes, we initialize it, if not, we call the parent method.
class LazyA extends A
{
private const STATUS_UNINITIALIZED = 0;
private const STATUS_INITIALIZING = 1;
private const STATUS_INITIALIZED = 2;
private $lazyStatus = self::STATUS_UNINITIALIZED;
public function __construct()
{
unset($this->data);
}
public function __get($n)
{
if (self::STATUS_INITIALIZED === $this->lazyStatus || 'data' !== $n) {
return parent::__get($n);
}
$this->initialize();
return $this->data;
}
public function __set($n, $v)
{
if (self::STATUS_INITIALIZED === $this->lazyStatus || 'data' !== $n) {
parent::__set($n, $v);
}
$this->initialize();
$this->data = $v;
}
public function __isset($n)
{
if (self::STATUS_INITIALIZED === $this->lazyStatus || 'data' !== $n) {
return parent::__isset($n);
}
$this->initialize();
return isset($this->data);
}
public function __unset($n)
{
if (self::STATUS_INITIALIZED === $this->lazyStatus || 'data' !== $n) {
parent::__unset($n);
}
$this->initialize();
unset($this->data);
}
private function initialize()
{
if (self::STATUS_INITIALIZING === $this->lazyStatus) {
return;
}
$this->lazyStatus = self::STATUS_INITIALIZING;
parent::__construct();
$this->lazyStatus = self::STATUS_INITIALIZED;
}
}
$a = new LazyA();
// Currently, this will trigger a TypeError:
// Cannot assign null to property A::$data of type array on line 15
// With Ilija's patch, this will work as expected and will display int(123)
var_dump($a->foo);
BTW, I confirm that with @iluuu1994's patch the test cases I was working on are all green: I'm able to fix a buggy behavior that I had to implement as a workaround for the current behavior.
The error message TypeError: Value of type null returned from A::__get() must be compatible with unset property A::$data of type array
looks weird, because it occurs when your program tries to read property "data" (that is unset) and you return null. The error message is correct.
This may be fixed by modifying the line 15
public function __get($n) { return $this->data[$n] ?? ($n == "data" ? [] : null);} // line 15
We cannot do that because that would break encapsulation at two levels:
data
anymoreThe issue is really the engine not behaving as it should - the rule is simple and not respected with the ??
operator. It should be fixed, that's all. What's the issue with fixing this behavior?
You can do it by checking whether the property is initialized in LazyA::__get
, as mentioned previously:
I do not object to modifying the behavior though, if @dstogov is ok with it. If he isn't then I suppose this requires an RFC. I will provide the implementation but I'd have to ask you to write the RFC text itself. Edit: I'll need to make sure the slowdown from my PR is avoidable though.
Here is a real life example where this issue causes serious trouble : by adopting typed properties and the nullsafe operator, we put Symfony users at risk: https://github.com/symfony/symfony/issues/52753
The workaround doesn't make sense: replacing $this->foo ?? null
by isset($this->foo) ? $this->foo : null
. Both structures are supposed to be the same but they're actually not, and it is impossible to anticipate when this can create trouble.
We cannot do that because that would break encapsulation at two levels:
1. class A couldn't store a virtual property named `data` anymore
It can't do it anyway, because in case A::data
is set, the magic methods are not going to be called.
2. the proxy class is meant to decorate any class without knowing anything about its implementation. Requiring such behavior from decorated classes would create some sort of coupling, which must be avoided.
I didn't understand what do you mean, but your example have special checks for name "data" in the child class.
The issue is really the engine not behaving as it should - the rule is simple and not respected with the
??
operator. It should be fixed, that's all. What's the issue with fixing this behavior?
??
operator means set and not NULL. So the engine calls isset() and then get(). It doesn't check if __isset()
does something completely unexpected (e.g. removes or adds properties). I don't think the engine should care about this. In case we change the behaviour now, the code that worked relaying on the existing behaviour may be broken.
As I told, this change may be done only in PHP-8.4 and this will only complicate things. If you really think this should be done, please submit the RFC.
I didn't understand what do you mean, but your example have special checks for name "data" in the child class.
You're right with the current code, that's what I meant when I wrote this is a simplified example. The real code uses the stack trace to add scope-related rules, so that data
is meant differently when accessed from private/protected scope vs public scope.
If you really think this should be done, please submit the RFC.
I don't understand what's the issue with fixing the current behavior. Nobody can rely on this behavior, it's just broken when compared to normal behavior of classes (no magic methods) I will submit the RFC if you think this should be the way to go.
The workaround doesn't make sense: replacing
$this->foo ?? null
byisset($this->foo) ? $this->foo : null
. Both structures are supposed to be the same but they're actually not, and it is impossible to anticipate when this can create trouble.
The difference that $this->foo ?? null
leads to calls to __isset()+__get()
and isset($this->foo) ? $this->foo : null
to __isset()+__get()+__get()
.
I don't understand what's the issue with fixing the current behavior. Nobody can rely on this behavior, it's just broken when compared to normal behavior of classes (no magic methods).
What kind of behavior without magic methods? The issue explicitly says - "When ?? is used to check a property using magic methods, PHP unconditionnaly calls both isset and get even when __isset did actually set the property".
The difference that $this->foo ?? null leads to calls to isset()+get() and isset($this->foo) ? $this->foo : null to isset()+get()+__get().
I think that's not true, but rather that the latter goes through zend_std_read_property
once the properties is already set. With the behavior @nicolas-grekas suggests, only __isset
would be called, which would be consistent across both examples (as __isset
creates the property).
@nicolas-grekas Can you clarify whether my suggestion above solves your problem?
Oh, the mesh is really worse :(
<?php
class Foo {
public $b = null;
function __isset($n) {
echo "__isset()\n";
return true;
}
function __get($n) {
echo "__get()\n";
return null;
}
}
$x = new Foo();
var_dump($x->a ?? 5);
var_dump(isset($x->a) ? $x->a : 5);
var_dump(isset($x->b) ? $x->b : 5);
?>
__isset()
__get()
int(5)
__isset()
__get()
NULL
int(5)
isset()
on "magic" property doesn't check for NULL.
It seems like behavior of $a->x ?? y
and isset($a->x) ? $a->x : y
in conjunction with magic properties is really different and inconsistent.
isset($a->x)
doesn't call __get() to check for NULL
, but $a->x ??
and isset($a->x[0])
do.
This issue is a non-direct side effect of this inconsistency. I think we first have to decide what to do with the original inconsistency and document the existing and desired behavior. Any thoughts?
whether my suggestion above solves your problem?
I didn't run the code but using new ReflectionProperty($this, 'data'))->isInitialized($this)
should allow working around the issue yes. I'm definitely going to implement a workaround using your suggestion. I just hope that's not going to be the recommended long-term solution because that's a quite high performance tax to work around a bug (double magic method call + reflection check while a single magic method call should be needed).
isset() on "magic" property doesn't check for NULL.
Good catch I missed this behavior. That's another angle to the same issue!
The RFC that introduces ??
clearly explains that ??
should be equivalent to isset + ternary.
The current behavior of magic methods is for sure an oversight. Also, nobody can build anything useful on the current behavior. It's just breaking that equivalence described in the RFC and the principle of least surprise at the same time. I'm firmly advocating for fixing this inconsistency as you understood already :)
Tbf this call is only made the first time a lazy object is initialized via ??
(or after unsetting the prop again), so likely not a big performance problem at all.
this call is only made the first time a lazy object is initialized
There's another kind of lazy proxy where the lazy initialization triggers the initialization of another internal instance, and all property access on the proxy are forwarded to that instance all the time. This already creates a measurable performance overhead, and working around this issue will makes it worse. (I have plans to remove this overhead but that's not directly related so I won't expand here).
The RFC that introduces
??
clearly explains that??
should be equivalent to isset + ternary.
Yeah, but it also requires check for NULL. :( I suppose this inconsistency was introduced because "??" RFC just didn't take magic properties into account and the behavior was defined by the implementation details.
There are two ways to resolve this.
1) Always trust to isset() and never check return value of get() for NULL. This will allow $a->x ??
with magic property to return NULL. It may be not easy to implement this...
2) Make isset() call get() and check it's return value for NULL even if isset() returns true. This will cause additional __get() call for each isset() with magic property.
Both ways introduce BC breaks.
The current behavior of magic methods is for sure an oversight. Also, nobody can build anything useful on the current behavior. It's just breaking that equivalence described in the RFC and the principle of least surprise at the same time. I'm firmly advocating for fixing this inconsistency as you understood already :)
Now we both understood the problem better, when at the start of the discussion :)
Make isset() call get() and check it's return value for NULL even if isset() returns true.
This has my preference since it'd be unexpected to have $a->x ?? 5
return null
(static analyzers would get mad at this :) )
This will cause additional __get() call for each isset() with magic property.
Can't we reuse the result from the first call to skip doing the second? If not, then my preference goes for solution 1. (always trust to isset()) and state that if this happens, then the bug is in the implementation of `isset()`.
Both ways introduce BC breaks.
This could be said about any bugfix :) In this case, I'd advocate this is just a bug that needs to be fixed. As I mentioned in a previous message, there's nothing to build on the current behavior but unexpected bugs ;)
Can't we reuse the result from the first call to skip doing the second?
Not trivially no, as in without changing opcodes.
In this case ISSET_ISEMPTY_PROP_OBJ
needs to know whether the value is null
. However, even if it called __get
and checked, the JMPZ
consumes that value. Furthermore, the value might not be used inside the ternary at all, or used twice. I would consider returning true
in isset
and then a null
value a bug in the userland implementation. I'm not sure making the situation more complex is worthwhile.
Make isset() call get() and check it's return value for NULL even if isset() returns true.
This has my preference since it'd be unexpected to have
$a->x ?? 5
returnnull
(static analyzers would get mad at this :)
Agree.
This will cause additional __get() call for each isset() with magic property.
Can't we reuse the result from the first call to skip doing the second?
Unfortunately, no.
If not, then my preference goes for solution 1. (always trust to isset()) and state that if this happens, then the bug is in the implementation of `isset()`.
May be we can find some other way or compromise. Ideally we shouldn't introduce isset() and simple relay on get() return value. Now this is going to be a huge break.
Ideally we shouldn't introduce isset() and simple relay on get() return value. Now this is going to be a huge break.
yeah, this one is risky I think as it might bypass measures that ppl put in isset (eg throwing an exception from there)
Just to be sure we're on par: do we want to fix this as a bug or do we want to go through an RFC?
Just to be sure we're on par: do we want to fix this as a bug or do we want to go through an RFC?
I'm not sure. This is not a simple bug and the behavior change can't be committed into the old branches. https://github.com/php/php-src/pull/12698 doesn't fix the general inconsistency. It may be committed to master only, but this won't help a lot. I don't see a good solution for fixing the inconsistency and most probably it's not going to be found soon. I would prefer to keep this as is, instead of introducing more mess, but I may be wrong.
I will also note, that ??
is not equivalent to isset()
and the ternary operator for when one tries to access invalid string offsets (related to #7173) https://3v4l.org/aDLLB
@Girgias you're right but this also spans to other operators, eg &=
. While it could be expected to have these work on string offsets seamlessly, that's not the case, and I would consider this a separate topic.
In my opinion x ?? y
should be strictly equivalent to isset(x) ? x : y
. This is how it is documented, RFC-accepted and how it is the easiest to understand.
@nicolas-grekas Are you planning on moving this discussion forward? Or did you find an acceptable solution for your use-case?
Yes, I'd like this to be fixed so I'll raise the issue ASAP on internals, thanks for the reminder.
@nicolas-grekas Is this still relevant with the introduction of lazy objects?
Yes it is.
But I'm also thinking about an RFC that'd add an __exists
magic method, that'd help resolve the ambiguity we have between null and unset properties at the moment.
I'd be fine to properly define the behavior of ?
-operators when this method exists, and leaving the current behavior as is otherwise. I don't understand why we'd need to preserve BC for a broken behavior but at least that's a plan forward ;)
I don't really care about BC here, these are extreme edge cases. But object handlers are already very complex, things like this don't make them simpler.
Yes it is.
Can you clarify why?
Nothing fancy - I just mean the initial report: The sequence is incorrect currently. I'm not going to fall into it anymore with lazy-objects but that still remains incorrect behavior :)
Thank you for the clarification. I'm going to close my PR in favor of lazy objects then. I'm not sure if the current behavior can be classified as "wrong", but with an available alternative now, it doesn't seem necessary to modify it.
Description
When
??
is used to check a property using magic methods, PHP unconditionnaly calls both isset and get even when __isset did actually set the property.Reproducer: https://3v4l.org/cDlNW
This script:
echoes this:
while it should echo this:
PHP Version
All since 7.0.6
Operating System
No response