php / php-src

The PHP Interpreter
https://www.php.net

Mixing processing instructions and element nodes gives inconsistent results in SimpleXML #12168

Open nielsdos opened 1 year ago

nielsdos commented 1 year ago

Description

The following code:

<?php

$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<container>
    <x/><?hello world?>
</container>
XML;

$sxe = simplexml_load_string($xml);

var_dump($sxe->children());

// x and hello swapped
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<container>
    <?hello world?><x/>
</container>
XML;

$sxe = simplexml_load_string($xml);

var_dump($sxe->children());

Resulted in this output:

object(SimpleXMLElement)#2 (2) {
  ["x"]=>
  object(SimpleXMLElement)#4 (0) {
  }
  ["hello"]=>
  object(SimpleXMLElement)#5 (0) {
  }
}
object(SimpleXMLElement)#1 (1) {
  ["x"]=>
  object(SimpleXMLElement)#5 (0) {
  }
}

But I expected this output instead:

object(SimpleXMLElement)#2 (2) {
  ["x"]=>
  object(SimpleXMLElement)#4 (0) {
  }
  ["hello"]=>
  object(SimpleXMLElement)#5 (0) {
  }
}
object(SimpleXMLElement)#1 (2) {
  ["hello"]=>
  object(SimpleXMLElement)#%d (0) {
  }
  ["x"]=>
  object(SimpleXMLElement)#%d (0) {
  }
}

PHP Version

PHP 8.1+

Operating System

Linux

nielsdos commented 1 year ago

This happens because php_sxe_iterator_fetch positions the iterator at the first element node. That is correct for SXE_ITER_ELEMENT but not for SXE_ITER_NONE, especially since iterating forward from that point does not skip PIs (processing instructions). So a PI that precedes the first element is dropped, while a PI that follows it is kept.
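
To make the asymmetry concrete, here is a minimal userland sketch of the behaviour described above. It is not the php-src implementation: the constants ITER_NONE and ITER_ELEMENT, the functions first_index_described, first_index_mode_aware and collect_names, and the [type, name] node arrays are all hypothetical stand-ins, and whitespace text nodes are left out of the model. It only shows why always starting at the first element node loses a leading PI when nothing is supposed to be filtered.

```php
<?php
// Hypothetical model of the iterator start position; none of these names exist in php-src.
const ITER_NONE = 0;    // stands in for SXE_ITER_NONE: no filtering of child nodes
const ITER_ELEMENT = 1; // stands in for SXE_ITER_ELEMENT: element nodes only

// Start position as described in the comment: always the first element node,
// regardless of the iteration mode.
function first_index_described(array $nodes, int $mode): ?int
{
    foreach ($nodes as $i => [$type, $name]) {
        if ($type === 'element') {
            return $i;
        }
    }
    return null;
}

// Assumed alternative: only skip non-element nodes when the mode asks for elements.
function first_index_mode_aware(array $nodes, int $mode): ?int
{
    foreach ($nodes as $i => [$type, $name]) {
        if ($mode === ITER_NONE || $type === 'element') {
            return $i;
        }
    }
    return null;
}

// Forward iteration from the start position; in ITER_NONE mode PIs are not skipped.
function collect_names(array $nodes, int $mode, callable $first): array
{
    $names = [];
    $start = $first($nodes, $mode);
    if ($start === null) {
        return $names;
    }
    for ($i = $start; $i < count($nodes); $i++) {
        [$type, $name] = $nodes[$i];
        if ($mode === ITER_NONE || $type === 'element') {
            $names[] = $name;
        }
    }
    return $names;
}

$xThenPi = [['element', 'x'], ['pi', 'hello']]; // <x/><?hello world?>
$piThenX = [['pi', 'hello'], ['element', 'x']]; // <?hello world?><x/>

echo implode(', ', collect_names($xThenPi, ITER_NONE, 'first_index_described')), "\n";  // prints "x, hello"
echo implode(', ', collect_names($piThenX, ITER_NONE, 'first_index_described')), "\n";  // prints "x"
echo implode(', ', collect_names($piThenX, ITER_NONE, 'first_index_mode_aware')), "\n"; // prints "hello, x"
```

In this model the mode-aware start position is one possible way to make both orderings consistent; whether the real fix should include PIs in both cases or exclude them in both is a separate question that the sketch does not settle.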

EDIT: and so, this problem also occurs with comments... (https://bugs.php.net/bug.php?id=43542 looks somewhat related in that it points out a similar inconsistency wrt iterators & nodes)