sissaschool / elementpath

XPath 1.0/2.0/3.0/3.1 parsers and selectors for ElementTree and lxml
MIT License

XPath fails with version 4.5 and succeeds with 4.1.5. #77

Closed sdkeens closed 1 month ago

sdkeens commented 1 month ago

I have found an XPath that succeeds with elementpath version 4.1.5 and fails with version 4.5.

The pyunit.zip file contains a Python unit test and an XML file that I used to test the two versions of elementpath.

pyunit.zip

For your information, here's the list of packages I have installed:

alabaster==0.7.13
astroid==3.0.2
  typing_extensions==4.10.0
Babel==2.12.1
  pytz==2023.3
boto3==1.28.0
  botocore==1.31.0
    jmespath==1.0.1
    python-dateutil==2.8.2
      six==1.16.0
    urllib3==1.26.16
  jmespath==1.0.1
  s3transfer==0.6.1
    botocore==1.31.0
      jmespath==1.0.1
      python-dateutil==2.8.2
        six==1.16.0
      urllib3==1.26.16
botocore==1.31.0
  jmespath==1.0.1
  python-dateutil==2.8.2
    six==1.16.0
  urllib3==1.26.16
build==1.2.2.post1
  colorama==0.4.6
  importlib-metadata==6.7.0
    zipp==3.15.0
  packaging==23.1
  pyproject_hooks==1.0.0
    tomli==2.0.1
  tomli==2.0.1
certifi==2023.5.7
charset-normalizer==3.1.0
click==8.1.4
  colorama==0.4.6
colorama==0.4.6
decorator==5.1.1
dill==0.3.7
distlib==0.3.8
docutils==0.19
elementpath==4.5.0
et-xmlfile==1.1.0
exceptiongroup==1.2.2
filelock==3.13.1
geojson==3.0.1
geomet==1.0.0
  click==8.1.4
    colorama==0.4.6
  six==1.16.0
greenlet==3.0.3
idna==3.4
imagesize==1.4.1
importlib-metadata==6.7.0
  zipp==3.15.0
iniconfig==2.0.0
isort==5.13.2
Jinja2==3.1.2
  MarkupSafe==2.1.3
jmespath==1.0.1
jproperties==2.1.1
  six==1.16.0
jsonpath-ng==1.6.1
  ply==3.11
lxml==4.9.4
MarkupSafe==2.1.3
mccabe==0.7.0
mock==5.0.2
mypy==1.4.1
  mypy-extensions==1.0.0
  tomli==2.0.1
  typing_extensions==4.10.0
mypy-extensions==1.0.0
numpy==1.24.4
openpyxl==3.1.2
  et-xmlfile==1.1.0
packaging==23.1
pip==23.3.2
pipdeptree==2.23.1
  packaging==23.1
  pip==23.3.2
platformdirs==4.1.0
pluggy==1.5.0
ply==3.11
psycopg2==2.9.6
psycopg2-binary==2.9.9
pycryptodome==3.18.0
Pygments==2.15.1
pylint==3.0.3
  astroid==3.0.2
    typing_extensions==4.10.0
  colorama==0.4.6
  dill==0.3.7
  isort==5.13.2
  mccabe==0.7.0
  platformdirs==4.1.0
  tomli==2.0.1
  tomlkit==0.12.3
  typing_extensions==4.10.0
pyparsing==3.0.9
pyproject_hooks==1.0.0
  tomli==2.0.1
pytest==8.3.2
  colorama==0.4.6
  exceptiongroup==1.2.2
  iniconfig==2.0.0
  packaging==23.1
  pluggy==1.5.0
  tomli==2.0.1
python-dateutil==2.8.2
  six==1.16.0
pytz==2023.3
pywin32==306
PyYAML==6.0
ratelimit==2.2.1
requests==2.31.0
  certifi==2023.5.7
  charset-normalizer==3.1.0
  idna==3.4
  urllib3==1.26.16
requests-mock==1.11.0
  requests==2.31.0
    certifi==2023.5.7
    charset-normalizer==3.1.0
    idna==3.4
    urllib3==1.26.16
  six==1.16.0
retrying==1.3.4
  six==1.16.0
s3path==0.4.2
  boto3==1.28.0
    botocore==1.31.0
      jmespath==1.0.1
      python-dateutil==2.8.2
        six==1.16.0
      urllib3==1.26.16
    jmespath==1.0.1
    s3transfer==0.6.1
      botocore==1.31.0
        jmespath==1.0.1
        python-dateutil==2.8.2
          six==1.16.0
        urllib3==1.26.16
  packaging==23.1
  smart-open==6.3.0
s3transfer==0.6.1
  botocore==1.31.0
    jmespath==1.0.1
    python-dateutil==2.8.2
      six==1.16.0
    urllib3==1.26.16
semantic-version==2.10.0
setuptools==65.6.3
shapely==2.0.4
  numpy==1.24.4
six==1.16.0
smart-open==6.3.0
snowballstemmer==2.2.0
Sphinx==6.2.1
  alabaster==0.7.13
  Babel==2.12.1
    pytz==2023.3
  colorama==0.4.6
  docutils==0.19
  imagesize==1.4.1
  importlib-metadata==6.7.0
    zipp==3.15.0
  Jinja2==3.1.2
    MarkupSafe==2.1.3
  packaging==23.1
  Pygments==2.15.1
  requests==2.31.0
    certifi==2023.5.7
    charset-normalizer==3.1.0
    idna==3.4
    urllib3==1.26.16
  snowballstemmer==2.2.0
  sphinxcontrib-applehelp==1.0.4
  sphinxcontrib-devhelp==1.0.2
  sphinxcontrib-htmlhelp==2.0.1
  sphinxcontrib-jsmath==1.0.1
  sphinxcontrib-qthelp==1.0.3
  sphinxcontrib-serializinghtml==1.1.5
sphinxcontrib-applehelp==1.0.4
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
spid==2223.0.4
  elementpath==4.5.0
  jsonpath-ng==1.6.1
    ply==3.11
  lxml==4.9.4
  SQLAlchemy==2.0.29
    greenlet==3.0.3
    typing_extensions==4.10.0
SQLAlchemy==2.0.29
  greenlet==3.0.3
  typing_extensions==4.10.0
sqlcipher3 @ file:///C:/home/nonpciroot/python/python3.8/windows10/sqlcipher3-0.4.8-cp38-cp38-win_amd64.whl#sha256=d80d3321019f2c50e9f8328e36917b87d75c4eddf0c92b2e6fff618d5f746701
tomli==2.0.1
tomlkit==0.12.3
tqdm==4.65.0
  colorama==0.4.6
typing_extensions==4.10.0
urllib3==1.26.16
virtualenv==20.25.0
  distlib==0.3.8
  filelock==3.13.1
  platformdirs==4.1.0
wheel==0.44.0
winshell==0.6
zipp==3.15.0
sdkeens commented 1 month ago

Note that I have also tested the XML file and the XPath with the online XPath tester at https://www.freeformatter.com/xpath-tester.html, and it returns the expected value.

sdkeens commented 1 month ago

I forgot to mention:

  1. This occurs on my computer running Windows 11. I have not tried it on other platforms.
  2. I am using Python 3.8.
brunato commented 1 month ago

Hi, the point is the same one that I described in my last comment here: https://github.com/sissaschool/elementpath/issues/72#issuecomment-2375146627.

So it was a bug in v4.4 that erroneously used the namespaces of the instance (an lxml tree ...) to resolve the prefixes of the XPath expression. The same applies in this case, with the default namespace. The XPath specification explicitly states that the namespace map must be provided in the static context. If it is not, the empty prefix must be mapped to no namespace.

I could have left this automatic behavior in place for lxml but, apart from the concern about getting different results compared to ElementTree, the fact is that lxml itself does not consider the instance namespaces when parsing the path through its xpath() API. To verify this, try calling xml_et_root.xpath(xpath), which returns an empty list.

So the solution is to provide a mapping for the empty namespace:

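import elementpath
from elementpath.xpath3 import XPath3Parser

# Map the empty prefix to the document's default namespace.
namespaces = {'': "http://www.pcigeomatics.com/xmlschema/gdbmetadata/1.0.0"}
xpath = '/GDBFile/Imagery/Channels/Channel[@id=1]/DataType'
# xml_et_root is the parsed root element from the reporter's unit test.
xpath_result = elementpath.select(xml_et_root, xpath,
                                  parser=XPath3Parser,
                                  namespaces=namespaces,
                                  default_collation=elementpath.collations.UNICODE_CODEPOINT_COLLATION)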
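
The same rule can be seen with the standard library alone. The following is a minimal sketch, not the reporter's test: it uses xml.etree.ElementTree instead of elementpath, the XML document is a hypothetical one built only from the element names in the XPath above, and the DataType value "8U" is invented for illustration. ElementTree's find() takes a relative path and needs a quoted attribute value, so the path is adapted accordingly. Passing '' as a key in the namespaces mapping requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

NS_URI = "http://www.pcigeomatics.com/xmlschema/gdbmetadata/1.0.0"

# Hypothetical document: only the element names and the namespace URI
# come from this thread; the "8U" value is made up for the example.
XML = f"""<GDBFile xmlns="{NS_URI}">
  <Imagery>
    <Channels>
      <Channel id="1"><DataType>8U</DataType></Channel>
    </Channels>
  </Imagery>
</GDBFile>"""

root = ET.fromstring(XML)

# Relative path, because ElementTree does not accept absolute paths on elements.
path = "Imagery/Channels/Channel[@id='1']/DataType"

# No namespace map: unprefixed names refer to no namespace, so nothing matches
# even though the document declares a default namespace.
without_map = root.find(path)

# Empty prefix mapped explicitly: unprefixed names are now resolved
# in the document's default namespace and the element is found.
with_map = root.find(path, namespaces={"": NS_URI})

print(without_map)    # None
print(with_map.text)  # 8U
```

In both libraries the namespace declarations inside the document are not used to resolve names in the path expression; only the mapping passed by the caller is.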

Best regards

sdkeens commented 1 month ago

Hi Brunato,

Thanks for the information. I will close this issue.