plone / diazo

Diazo applies a static HTML theme to a dynamic website
http://diazo.org

ESI test fails with lxml 5 due to namespace changes #87

Closed. mauritsvanrees closed this issue 4 months ago.

mauritsvanrees commented 4 months ago

In Plone 6.0 and 6.1 we still use lxml 4. When I update to lxml 5.1.0, one Diazo test fails:

Failure in test testAll (diazo.tests.test_diazo.Test-esi.testAll)
Traceback (most recent call last):
  File "/Users/maurits/.pyenv/versions/3.12.2/lib/python3.12/unittest/case.py", line 58, in testPartExecutor
    yield
  File "/Users/maurits/.pyenv/versions/3.12.2/lib/python3.12/unittest/case.py", line 634, in run
    self._callTestMethod(testMethod)
  File "/Users/maurits/.pyenv/versions/3.12.2/lib/python3.12/unittest/case.py", line 589, in _callTestMethod
    if method() is not None:
  File "/Users/maurits/community/plone-coredev/6.1/src/diazo/src/diazo/tests/test_diazo.py", line 241, in testAll
    assert self.themed_content.xpath(
AssertionError: /Users/maurits/community/plone-coredev/6.1/src/diazo/src/diazo/tests/esi/xpaths.txt: /html/body/include

So it fails on the checks that use the files from the esi test directory.

Maybe this hint in the tests about switching to "XPath default namespaces" could help.

The probably relevant part of the lxml 5.0.0 changelog:

With libxml2 2.10.4 and later (as provided by the lxml 5.0 binary wheels), parsing HTML tags with "prefixes" no longer builds a namespace dictionary in nsmap but considers the prefix:name string the actual tag name. With older libxml2 versions, since 2.9.11, the prefix was removed. Before that, the prefix was parsed as XML prefix. lxml 5.0 does not try to hide this difference but now changes the ElementPath implementation to let element.find("part1:part2") search for the tag part1:part2 in documents parsed as HTML, instead of looking only for part2.
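To see which of these behaviours a given install has, a small check like the one below may help. It is only a sketch: the sample markup and the helpers `local_part` and `iter_local` are hypothetical, not part of Diazo or lxml. It parses an ESI tag with lxml's HTML parser and matches on the part of the tag after any `{namespace}` or `prefix:`, so it should find the element whether libxml2 keeps the prefix (2.10.4 and later), strips it (2.9.11 up to that point), or treats it as an XML prefix.

```python
from lxml import etree

# Hypothetical sample markup; the real diazo esi test files may differ.
SAMPLE = (
    "<html><body>"
    '<esi:include src="/footer"></esi:include>'
    "</body></html>"
)


def local_part(tag):
    """Strip a Clark-notation namespace ({uri}name) and any "prefix:" from a tag."""
    tag = tag.rpartition("}")[2]
    return tag.rpartition(":")[2]


def iter_local(root, local):
    """Yield elements whose tag, ignoring namespace and prefix, equals `local`."""
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(el.tag, str) and local_part(el.tag) == local:
            yield el


root = etree.HTML(SAMPLE)
print("lxml", etree.LXML_VERSION, "libxml2", etree.LIBXML_VERSION)
for el in iter_local(root, "include"):
    # el.tag is "esi:include" with libxml2 >= 2.10.4 and "include" with 2.9.11 up to that point.
    print(el.tag, el.get("src"))
```

Running it under lxml 4 and lxml 5 should print different tag names for the same element, which is the difference the failing test trips over.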

lrowe commented 4 months ago

From that changelog it seems that, when the output document is parsed with the HTML parser, the prefix is no longer stripped but kept as part of the tag name. I'm not sure whether it's possible to escape the : character in an XPath expression, and even if it is, that expression would then stop working with lxml 4. It's probably easiest to just delete the xpaths.txt file, since we test the output anyway.

https://github.com/plone/diazo/blob/83d38a6dd799f9cf534718add1b8dc78e500bd9e/src/diazo/tests/esi/xpaths.txt#L1-L2
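On the question of writing an XPath for the prefixed tag: one possible workaround, sketched here under assumptions and not what this issue settled on (the suggestion above is to delete xpaths.txt), is to avoid the colon in the path entirely and compare node names in a predicate instead. The sample markup and the `INCLUDE_XPATH` expression are hypothetical. With the prefix stripped, `local-name()` is `include`; with libxml2 2.10.4 and later, the element has no namespace and `name()` is the literal string `esi:include`.

```python
from lxml import etree

# Hypothetical sample standing in for a themed Diazo output document.
SAMPLE = (
    "<html><body>"
    '<esi:include src="/footer"></esi:include>'
    "</body></html>"
)

# A plain step such as /html/body/esi:include makes XPath resolve an "esi"
# namespace prefix that the HTML document never declares. Comparing names
# inside a predicate sidesteps prefix resolution:
#   - prefix stripped (older libxml2): local-name() is "include"
#   - prefix kept in the tag (libxml2 >= 2.10.4): name() is "esi:include"
INCLUDE_XPATH = "/html/body/*[local-name()='include' or name()='esi:include']"

root = etree.HTML(SAMPLE)
matches = root.xpath(INCLUDE_XPATH)
for el in matches:
    print(el.tag, el.get("src"))
```

The trade-off is that the expression no longer reads like the simple `/html/body/include` paths in xpaths.txt, which is one argument for simply relying on the output comparison instead.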