php / php-src

The PHP Interpreter
https://www.php.net
Other
37.94k stars 7.73k forks source link

Readonly liskov violations in new Dom API #15578

Open jnvsor opened 2 weeks ago

jnvsor commented 2 weeks ago

Description

The following code:

<?php
$xml = <<<END
<?xml version="1.0" encoding="UTF-8"?>
<root>
    <child />
</root>
END;

$doc = Dom\XMLDocument::createFromString($xml);
$root = $doc->firstChild;
$text = $root->firstChild;
$child = $root->firstElementChild;

try {
    $text->nodeValue = 'Hello';
} catch (Throwable $t) {
    var_dump($t->getMessage());
}

try {
    $child->nodeValue = 'Hello';
} catch (Throwable $t) {
    var_dump($t->getMessage());
}

try {
    $doc->nodeValue = 'Hello';
} catch (Throwable $t) {
    var_dump($t->getMessage());
}

try {
    $text->textContent = 'Hello';
} catch (Throwable $t) {
    var_dump($t->getMessage());
}

try {
    $child->textContent = 'Hello';
} catch (Throwable $t) {
    var_dump($t->getMessage());
}

try {
    $doc->textContent = 'Hello';
} catch (Throwable $t) {
    var_dump($t->getMessage());
}

Resulted in this output:

string(55) "Cannot modify readonly property Dom\Element::$nodeValue"
string(59) "Cannot modify readonly property Dom\XMLDocument::$nodeValue"
string(61) "Cannot modify readonly property Dom\XMLDocument::$textContent"

But I expected this output instead:

In user code this would result in:

Fatal error: Cannot redeclare non-readonly property X::$y as readonly Y::$y

Since I've found multiple of these exploring the new Dom API it probably needs a closer look.

Tested in both docker (php:8.4-rc-alpine version 8.4.0beta3) and 3v4l

PHP Version

8.4.0beta3

Operating System

No response

cmb69 commented 2 weeks ago

At least part of the problem is that these are not readonly properties, but rather more like getter hooks (see #15553).

nielsdos commented 2 weeks ago

That would indeed be allowed if following hook semantics: https://3v4l.org/b2LS3/rfc#vgit.master Note that it would also complain about being read-only, but this is different from the readonly keyword. There was also some discussion in the past of using @readonly and it was deemed correct at that time, although this predates hooks ofc: https://github.com/php/php-src/pull/11677#discussion_r1264876126

In any case, the behaviour is correct according to spec. If we take a look at $nodeValue for example: https://dom.spec.whatwg.org/#dom-node-nodevalue. We can see that only 2 subclasses have a writable $nodeValue property while for the others it's readonly.

So the way to move forward with this is by doing the following 2 changes:

jnvsor commented 2 weeks ago

That would indeed be allowed if following hook semantics: https://3v4l.org/b2LS3/rfc#vgit.master

@nielsdos You're setting on the parent class not the child, so the hook semantics are adhering to liskov: https://3v4l.org/t5ebW/rfc#vgit.master

Although the empty setter makes it a no-op, it went from readonly in the parent to writable in the child. While that's not possible with the readonly keyword (It probably could be if someone wanted to put in the work) it still ensures that the parent's api isn't broken by the child.

If we flip them around to be analogous the child just uses the parent's setter: https://3v4l.org/8ig6M/rfc#vgit.master

I don't know what the right solution is but I suspect it'd be a dom\domexception

Edit: Actually isn't that the solution? Just make nodeValue and textContent readonly on Node and make them not readonly on only the children where they can be written to? A bit verbose but it would be correct

nielsdos commented 2 weeks ago

I believe we're just saying the same thing but in different words.

Actually isn't that the solution? Just make nodeValue and textContent readonly on Node and make them not readonly on only the children where they can be written to? A bit verbose but it would be correct

That's how it's implemented right now. It's just the "readonly annotation that's not actually the same as the readonly keyword" that causes confusion.

jnvsor commented 2 weeks ago

That's how it's implemented right now. It's just the "readonly annotation that's not actually the same as the readonly keyword" that causes confusion.

In that case yes, it's a documentation error. The part that confused me is that it's not marked readonly on Node: https://github.com/php/php-src/blob/b9b317afd4602cbe67ca861702c65038c32f0d4e/ext/dom/node.c#L136

If you could mark it readonly on Node and remove the readonly on that property in the Attr and CharacterData subclasses that would probably suffice, but I don't know how hard it is to do changes like that in subclasses in the docs.

If the docs are getting a way to render property hooks then this is probably a good example of where to use them

nielsdos commented 2 weeks ago

The comments in the C source code don't have any influence on the semantics or the docs. The docs are rendered from the stub file. I think those readonly comments should be removed anyway as they are from legacy DOM and are confusing.