plone / plone.protect

HTTP protection utilities for the Plone CMS
https://pypi.org/project/plone.protect/

AttributeError on plone.transformchain #64

Closed hvelarde closed 6 years ago

hvelarde commented 7 years ago

Plone 4.3.11 with plone4.csrffixes 1.1 and plone.protect 3.0.19; there are a lot of messages like this one in the event log:

2017-08-02T11:15:27 ERROR plone.transformchain Unexpected error whilst trying to apply transform chain
Traceback (most recent call last):
  File "/opt/plone/buildout/eggs/plone.transformchain-1.2.0-py2.7.egg/plone/transformchain/transformer.py", line 49, in __call__
    newResult = handler.transformIterable(result, encoding)
  File "/opt/plone/buildout/eggs/plone4.csrffixes-1.1-py2.7.egg/plone4/csrffixes/transform.py", line 108, in transformIterable
    return self.transform(result, encoding)
  File "/opt/plone/buildout/eggs/plone4.csrffixes-1.1-py2.7.egg/plone4/csrffixes/transform.py", line 164, in transform
    result = self.parseTree(result, encoding)
  File "/opt/plone/buildout/eggs/plone.protect-3.0.19-py2.7.egg/plone/protect/auto.py", line 90, in parseTree
    result, pretty_print=False, encoding=encoding)
  File "/opt/plone/buildout/eggs/repoze.xmliter-0.6-py2.7.egg/repoze/xmliter/utils.py", line 32, in getHTMLSerializer
    doctype=doctype,
  File "/opt/plone/buildout/eggs/repoze.xmliter-0.6-py2.7.egg/repoze/xmliter/utils.py", line 19, in getXMLSerializer
    return XMLSerializer(root.getroottree(), serializer, pretty_print, doctype=doctype)
AttributeError: 'NoneType' object has no attribute 'getroottree'

We have no idea what is causing this, and I see no relation to any specific piece of content.

the following hotfixes are also installed:

mauritsvanrees commented 7 years ago

We already catch TypeError and etree.ParseError here. It seems fine to add AttributeError there.

Can you maybe temporarily edit auto.py and catch AttributeError? Then the code will log a warning which includes the URL where this fails. That would be interesting.
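A minimal sketch of that temporary change, assuming Python 3 and the standard library only. `fake_get_html_serializer` and `parse_tree` are hypothetical stand-ins, not the actual code in auto.py; the stand-in serializer only mimics the `NoneType` failure from the traceback, and the real code also catches `etree.ParseError` from lxml, which is left out here:

```python
import logging

logger = logging.getLogger("plone.protect")


class _Root(object):
    """Hypothetical parsed root, just enough to mimic the real call."""

    def getroottree(self):
        return self


def fake_get_html_serializer(result):
    # Stand-in for repoze.xmliter's getHTMLSerializer (assumption): when the
    # body is blank the parser yields no root, so root is None here.
    text = "".join(result)
    root = _Root() if text.strip() else None
    return root.getroottree()  # AttributeError when root is None


def parse_tree(result, url, serializer=fake_get_html_serializer):
    try:
        return serializer(result)
    except (TypeError, AttributeError):
        # Log the URL so the failing view can be identified.
        logger.warning(
            "error parsing dom, failure to add csrf token "
            "to response for url %s", url)
        return None
```

With this in place, a blank body produces a warning that names the URL instead of an unhandled traceback.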

hvelarde commented 7 years ago

Thank you very much, @mauritsvanrees; I did what you said:

2017-08-02T17:26:07 WARNING plone.protect error parsing dom, failure to add csrf token to response for url https://www.cartacapital.com.br/camara-decide-se-temer-pode-ser-julgado-pelo-stf-acompanhe/recent-updates

I'll make a PR adding AttributeError to the exception.

@rodfersou just to confirm the issue is in this view:

https://github.com/collective/collective.liveblog/blob/master/src/collective/liveblog/browser/recent_updates.py

rodfersou commented 7 years ago

@mauritsvanrees I tried to disable plone.protect for this view, but that doesn't stop the exception: https://github.com/collective/collective.liveblog/pull/39/files

My bet is that the parser breaks either when I do a return '' or because this view returns just the <article> tag, which makes for invalid HTML: https://github.com/collective/collective.liveblog/blob/master/src/collective/liveblog/browser/templates/recent_updates.pt

But first of all, I disabled it, so the response should never reach the parser.

Could you please help me understand what is going on here?

mauritsvanrees commented 7 years ago

You cannot completely disable plone.protect, you can only disable the csrf protection check: plone.protect then does not complain or redirect to the confirm-action page. But it still looks through your html for any forms or link tags where it injects an _authenticator value. That is where it goes wrong here.

While searching for the original error, I found a bug report in collective.liches which suggests this may happen when your content has only whitespace characters. Indeed I can confirm this:

>>> from repoze.xmliter.utils import getHTMLSerializer
>>> getHTMLSerializer('    ')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../repoze.xmliter/repoze/xmliter/utils.py", line 32, in getHTMLSerializer
    doctype=doctype,
  File ".../repoze.xmliter/repoze/xmliter/utils.py", line 19, in getXMLSerializer
    return XMLSerializer(root.getroottree(), serializer, pretty_print, doctype=doctype)
AttributeError: 'NoneType' object has no attribute 'getroottree'

Can that be the case for you? A workaround would then be to check this in collective.liveblog and make sure to only return an empty string in that case. Well, then it may raise an XMLSyntaxError... Can you check if that helps at all?

I guess it would be fine to add a check for this in plone.protect: if the original is a string or unicode and is either empty or only contains white space, simply return the original.
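The proposed check could look roughly like this. It is a sketch under assumptions, not the change that was eventually merged: it targets Python 3, where `str` and `bytes` take the place of "string or unicode", and `is_blank_body`, `transform`, and the `parse` callable are hypothetical names:

```python
def is_blank_body(result):
    """Return True for an empty or whitespace-only str/bytes body."""
    return isinstance(result, (str, bytes)) and not result.strip()


def transform(result, parse):
    # `parse` is a hypothetical callable standing in for the HTML parsing
    # step that fails on whitespace-only input.
    if is_blank_body(result):
        # Nothing to inject a token into: hand back the original untouched.
        return result
    return parse(result)
```

Returning the original object unchanged means a whitespace-only response never reaches the serializer.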

rodfersou commented 7 years ago

@mauritsvanrees thank you for your feedback. The problem is exactly what you said; in this case the HTML is: '\n\n \n \n\n\n'.

Fixed it in this PR.

@hvelarde can you please help me to fix the tests on Plone 5?

mauritsvanrees commented 6 years ago

With PR #65 merged last year, and with PR #69 merged just now, I think this bug is gone.