Open Gallaecio opened 5 years ago
Hey @Gallaecio, I'd also like to see this.
Also, I believe the issue is with lxml, not libxml2 (and not parsel either): lxml text nodes do not accept further XPath calls (you can only call .getparent() on the "smart strings" results; note that "smart_strings" are disabled by default in parsel), while libxml2 allows XPath operations on text nodes:
>>> import libxml2
>>> doc = libxml2.htmlParseDoc('''<html>
... <head>
... <meta charset="UTF-8">
... <title>Title of the document</title>
... </head>
...
... <body>
... Content of the document......
... </body>
...
... </html>''', 'ascii')
>>> doc
<xmlDoc (None) object at 0x7ff070272680>
>>> ctxt = doc.xpathNewContext()
>>> res = ctxt.xpathEval("//text()")
>>> res
[<xmlNode (text) object at 0x7ff0702a2560>, <xmlNode (text) object at 0x7ff071d95320>]
>>> res[0].get_content()
'Title of the document'
>>> for t in res:
... print(t.xpathEval("parent::*"))
...
[<xmlNode (title) object at 0x7ff07025e7e8>]
[<xmlNode (body) object at 0x7ff07025e878>]
>>>
If you know Cython, it could be a nice addition to lxml to support this.
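For comparison, here is a minimal sketch of the lxml side, assuming lxml is installed and using a shortened version of the document from the libxml2 session. The variable names are only illustrative. It checks that text results know their parent but have no xpath method of their own:

```python
from lxml import etree

# Shortened version of the document used in the libxml2 session.
root = etree.HTML(
    "<html><head><title>Title of the document</title></head>"
    "<body>Content of the document</body></html>"
)

# lxml returns "smart strings" by default: str subclasses with getparent().
text_nodes = root.xpath("//text()")

# getparent() works on each text result...
parent_tags = [t.getparent().tag for t in text_nodes]

# ...but the results are strings, so there is no xpath() to call on them.
supports_xpath = [hasattr(t, "xpath") for t in text_nodes]

print(parent_tags)
print(supports_xpath)
```

With plain lxml, a possible workaround is to continue from t.getparent() instead of from the text result itself. That does not help in parsel as-is, since smart strings are disabled there.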
Given:
For text, you get:
However, regular elements work as you would expect:
I believe text nodes should work the same way: '.' should select them when they are the current node.