sissaschool / elementpath

XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml
MIT License

Select fails when XML contains leading comment #74

Closed by sean-moore3 3 months ago

sean-moore3 commented 3 months ago

Hi,

I stumbled upon this issue while upgrading from 4.1.5. I can reproduce it in 4.2.0 and later.

import lxml.etree
import elementpath

# A comment precedes the root element: the selection returns nothing.
root = lxml.etree.XML("<!--comment--><root><trunk><branch></branch></trunk></root>")
trunk = elementpath.select(root, "trunk")
print(trunk)  # []

# Without the leading comment the same selection finds the element.
root = lxml.etree.XML("<root><trunk><branch></branch></trunk></root>")
trunk = elementpath.select(root, "trunk")
print(trunk)  # [<Element trunk at 0x102b862c0>]

brunato commented 3 months ago

Hi, in v4.2.0 the way the node tree is built has changed, see:

https://elementpath.readthedocs.io/en/latest/advanced.html#the-context-root-and-the-context-item

The change was necessary to handle both XML documents and fragments. A root element without siblings can skip the document position, unless the XPath expression selects the document explicitly (e.g. /root/trunk). A comment that is a sibling of the root element can't be ignored, so the initial position is set to the document. From there the relative path trunk finds nothing, because trunk is not a child of the document.

The keyword arguments item and fragment can be used, respectively, to set the initial node or to skip the creation of the dummy document.

sean-moore3 commented 3 months ago

Thanks! I found fragment to be a good fit for my use case.