sissaschool / elementpath

XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml

node()[1]/following-sibling::node() doesn't select nodes with elementpath while lxml.xpath finds them #44

Closed martin-honnen closed 2 years ago

martin-honnen commented 2 years ago

Given

from lxml import etree as ET
lxml_root1 = ET.fromstring('<root>text 1<!-- comment -->text 2<!-- comment --> text 3</root>')

lxml's built-in XPath 1.0 implementation returns [<!-- comment -->, 'text 2', <!-- comment -->, ' text 3'] for the call lxml_root1.xpath('node()[1]/following-sibling::node()'), whereas elementpath.select(lxml_root1, 'node()[1]/following-sibling::node()') returns an empty list [].
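To make the expected result concrete, here is a minimal sketch of how the sample document is stored in the ElementTree data model, using only the standard library instead of lxml or elementpath. The helper name child_nodes is hypothetical and not part of either library; insert_comments=True requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

XML = '<root>text 1<!-- comment -->text 2<!-- comment --> text 3</root>'

# The default parser drops comments; this TreeBuilder option keeps them.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(XML, parser=parser)


def child_nodes(elem):
    """Flatten the children of elem into XPath document order.

    Hypothetical helper: text nodes are not objects in ElementTree,
    they live in elem.text (before the first child) and in child.tail
    (after each child).
    """
    nodes = []
    if elem.text is not None:
        nodes.append(elem.text)
    for child in elem:
        nodes.append(child)
        if child.tail is not None:
            nodes.append(child.tail)
    return nodes


nodes = child_nodes(root)

# node()[1] is the leading text 'text 1', so its following siblings
# are simply everything after the first entry.
following = nodes[1:]
for node in following:
    print(repr(node))
```

The first node is the parent's text, which is a plain string and not an element, so an axis implementation that only starts from element items has nothing to iterate from.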

brunato commented 2 years ago

Found the problem: XPathContext.iter_siblings() needs to check whether the initial item is a TextNode, and it also needs additional checks on the text and tail values. The final version of this generator function is:

    def iter_siblings(self, axis: Optional[str] = None) \
            -> Iterator[Union[ElementNode, TextNode]]:
        """
        Iterator for 'following-sibling' forward axis and 'preceding-sibling' reverse axis.

        :param axis: the context axis, default is 'following-sibling'.
        """
        item: Union[TextNode, ElementProtocol]

        if isinstance(self.item, TypedElement):
            item = self.item.elem
            parent = self.get_parent(item)
        elif isinstance(self.item, TextNode):
            item = self.item
            parent = item.parent
        elif not is_etree_element(self.item) or callable(getattr(self.item, 'tag')):
            return
        else:
            item = cast(ElementNode, self.item)
            parent = self.get_parent(item)

        if parent is None:
            return

        status = self.item, self.axis
        self.axis = axis or 'following-sibling'

        if axis == 'preceding-sibling':
            if is_element_node(parent):
                elem = cast(ElementNode, parent)
                if elem.text is not None:
                    self.item = TextNode(elem.text, elem)
                    if self.item == item:
                        self.item, self.axis = status
                        return
                    yield self.item

            for child in parent:  # pragma: no cover
                if child is item:
                    break
                self.item = child
                yield child
                if child.tail is not None:
                    self.item = TextNode(child.tail, child, True)
                    if self.item == item:
                        break
                    yield self.item
        else:
            follows = False
            if is_element_node(parent):
                elem = cast(ElementNode, parent)
                if elem.text is not None and item == TextNode(elem.text, elem):
                    follows = True

            for child in parent:
                if follows:
                    self.item = child
                    yield child
                    if child.tail is not None:
                        self.item = TextNode(child.tail, child, True)
                        yield self.item
                elif child is item:
                    follows = True
                    if child.tail is not None:
                        self.item = TextNode(child.tail, child, True)
                        yield self.item
                elif child.tail is not None and item == TextNode(child.tail, child, True):
                    follows = True

        self.item, self.axis = status

In the next major release I will refactor all context processing to use a full implementation of the XPath node types. I'm in the final stages of this work. The nodes are also wrappers for etree elements and are built at context initialization. This will simplify node ordering and axis processing.
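As an illustration of why prebuilt wrapper nodes could simplify sibling axes, here is a rough sketch. It is not the elementpath implementation: the names Node, build_tree and following_siblings are hypothetical, and the sample document uses plain elements instead of comments to keep the tree builder short.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union


@dataclass(eq=False)  # compare wrappers by identity, not by value
class Node:
    value: Union[ET.Element, str]
    parent: Optional['Node'] = None
    children: List['Node'] = field(default_factory=list)


def build_tree(elem: ET.Element, parent: Optional[Node] = None) -> Node:
    """Wrap an element and its text and tail strings once, up front."""
    node = Node(elem, parent)
    if elem.text is not None:
        node.children.append(Node(elem.text, node))
    for child in elem:
        node.children.append(build_tree(child, node))
        if child.tail is not None:
            # A tail is a sibling of the child, so its parent is this node.
            node.children.append(Node(child.tail, node))
    return node


def following_siblings(node: Node) -> Iterator[Node]:
    """Yield the siblings after node, for text and element nodes alike."""
    if node.parent is None:
        return
    siblings = node.parent.children
    index = next(i for i, s in enumerate(siblings) if s is node)
    yield from siblings[index + 1:]


tree = build_tree(ET.fromstring('<root>text 1<a/>text 2<b/> text 3</root>'))
first = tree.children[0]  # the text node 'text 1'
result = [n.value if isinstance(n.value, str) else n.value.tag
          for n in following_siblings(first)]
print(result)
```

Because every text node is a real object with a parent and a fixed position, the axis no longer needs separate branches for text, tail and element items.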

Thank you

brunato commented 2 years ago

Release v2.5.2 is published, so I am closing this issue and #42.