Error in getting attribute value

sissaschool / elementpath

XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml

MIT License


Error in getting attribute value #58

Closed RabbitJackTrade closed 1 year ago

RabbitJackTrade commented 1 year ago

Using elementpath-4.0.1 and Python 3.10.9 in Jupyter:

from lxml import etree  # the snippet uses etree.XML() and Element.xpath(), which are lxml APIs
from elementpath import select

tei = """<?xml version='1.0' encoding='UTF8'?>
<?xml-model type="application/xml"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
        <pb n="page1"/>          
        <pb n="page2"/>
  </text>
</TEI>
"""

doc = etree.XML(tei.encode())
for p in select(doc, '//pb'):
    print(p.attrib['n'])
    print(p.xpath('./@n')[0])   # or print(p.xpath('@n')[0])
    print(select(doc, './@n'))  # or print(select(doc, '@n'))
    print('------')

The output returned is:

page1
page1
[]
------
page2
page2
[]
------

brunato commented 1 year ago

Hi, lxml's xpath lacks a document position (try the '/' expression on an Element or an ElementTree instance).

In elementpath the document position is taken into account even if you provide an Element instead of an ElementTree instance (currently by setting the context item to None; an alternative could be to create a dummy ElementTree instance that wraps the root element ...).

So in your example the select call has to provide the starting context item:

select(doc, './@n', item=p)

RabbitJackTrade commented 1 year ago

First, thanks - as usual. It works now!

Second, is that point mentioned in the documentation anywhere? If not, should it be added?

brunato commented 1 year ago

> Second, is that point mentioned in the documentation anywhere? If not, should it be added?

There is no mention of that. I will add a paragraph about the concepts and implementation details of XPath selectors.

RabbitJackTrade commented 1 year ago

Great. Thanks again.

brunato commented 1 year ago

More on this: the unexpected behavior is caused by the root sibling PI <?xml-model type="application/xml"?>.

In order to preserve it, a dummy document node is created, so the root node for XPath selection is the document node, even if you call select(p, './@n').

This does not happen if you use xml.etree.ElementTree, because it doesn't parse root siblings.
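
The ElementTree side of this can be checked with the standard library alone. The sketch below is only an illustration, using a shortened copy of the document from the report: it parses the document and confirms that the processing instruction before the root element is not part of the resulting tree. The variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

TEI_NS = 'http://www.tei-c.org/ns/1.0'

xml_text = (
    '<?xml-model type="application/xml"?>'
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<text><pb n="page1"/><pb n="page2"/></text>'
    '</TEI>'
)

# The default ElementTree parser returns only the root element;
# the PI that precedes it is discarded.
root = ET.fromstring(xml_text)

serialized = ET.tostring(root, encoding='unicode')
pi_survived = 'xml-model' in serialized

# Element tags carry the namespace in Clark notation: {uri}local-name
pages = [pb.get('n') for pb in root.iter('{%s}pb' % TEI_NS)]

print(root.tag, pi_survived, pages)
```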

I can change this behavior for lxml with one of these options:

1. Do nothing (discard root siblings if an Element is provided for the root argument)
2. Create a dummy document node only if the root node has siblings and the provided Element is the root element of the tree

The first option matches the xml.etree.ElementTree behavior. The current behavior is to create a dummy document node only if the root node has siblings. If the provided root is an ElementTree instance, the document node is created in any case.

brunato commented 1 year ago

I'm opting to create a document node only if the provided root Element is the root of the tree. This is consistent with the resolution of #54.

brunato commented 1 year ago

Hi @RabbitJackTrade,

I added a section on advanced topics to the documentation. If you want to add more or fix some parts of it, feel free to make a PR.

thanks

RabbitJackTrade commented 1 year ago

Thanks, Davide; I’ll be happy to take a look.
