plone / Products.CMFPlone

The core of the Plone content management system
https://plone.org
GNU General Public License v2.0
249 stars 188 forks source link

Inconsistent Lexicon after mass plone.restapi import in 5.2RC5-pending #2905

Open zopyx opened 5 years ago

zopyx commented 5 years ago

We tested 52RC5-pending with a full migration of the UGent site (complete import over plone.restapi).

All events show up with an KeyError obviously due to some inconsistencies in portal_catalog or the Lexicon only.

...under further investigation

Traceback (innermost last):
  Module ZPublisher.WSGIPublisher, line 155, in transaction_pubevents
  Module ZPublisher.WSGIPublisher, line 337, in publish_module
  Module ZPublisher.WSGIPublisher, line 255, in publish
  Module ZPublisher.mapply, line 85, in mapply
  Module ZPublisher.WSGIPublisher, line 61, in call_object
  Module plone.app.event.browser.event_view, line 19, in __call__
  Module Products.Five.browser.pagetemplatefile, line 126, in __call__
  Module Products.Five.browser.pagetemplatefile, line 61, in __call__
  Module zope.pagetemplate.pagetemplate, line 135, in pt_render
  Module Products.PageTemplates.engine, line 88, in __call__
  Module z3c.pt.pagetemplate, line 173, in render
  Module chameleon.zpt.template, line 306, in render
  Module chameleon.template, line 187, in render
  Module 5c56cac85ba81e1afb6490cd9f53c0a8.py, line 454, in render
  Module b039035ce7814c9a5d3183e8daadeb8c.py, line 683, in render_master
  Module b039035ce7814c9a5d3183e8daadeb8c.py, line 1239, in render_content
  Module 5c56cac85ba81e1afb6490cd9f53c0a8.py, line 444, in __fill_content_core
  Module 5c56cac85ba81e1afb6490cd9f53c0a8.py, line 300, in render_content_core
  Module Products.PageTemplates.expression, line 101, in __call__
  Module zope.browserpage.simpleviewclass, line 41, in __call__
  Module Products.Five.browser.pagetemplatefile, line 126, in __call__
  Module Products.Five.browser.pagetemplatefile, line 61, in __call__
  Module zope.pagetemplate.pagetemplate, line 135, in pt_render
  Module Products.PageTemplates.engine, line 88, in __call__
  Module z3c.pt.pagetemplate, line 173, in render
  Module chameleon.zpt.template, line 306, in render
  Module chameleon.template, line 209, in render
  Module chameleon.utils, line 75, in raise_with_traceback
  Module chameleon.template, line 187, in render
  Module a31c90d9c6e12143dffaf4b657d8c4b8.py, line 480, in render
  Module Products.PageTemplates.expression, line 94, in __call__
  Module Products.PageTemplates.expression, line 65, in traverse
  Module zope.traversing.adapters, line 156, in traversePathElement
   - __traceback_info__: (<Products.Five.browser.metaconfigure.SimpleViewClass from /home/ajung/sandboxes/ugent-portaal-plone-4x/eggs/plone.app.event-3.2.4-py3.7.egg/plone/app/event/browser/event_summary.pt object at 0x7f36516a34e0>, 'num_more_occurrences')
  Module zope.traversing.adapters, line 47, in traverse
   - __traceback_info__: (<Products.Five.browser.metaconfigure.SimpleViewClass from /home/ajung/sandboxes/ugent-portaal-plone-4x/eggs/plone.app.event-3.2.4-py3.7.egg/plone/app/event/browser/event_summary.pt object at 0x7f36516a34e0>, 'num_more_occurrences', ())
  Module plone.memoize.view, line 48, in memogetter
  Module plone.app.event.browser.event_summary, line 75, in num_more_occurrences
  Module Products.ZCatalog.ZCatalog, line 555, in getIndexDataForRID
  Module Products.ZCatalog.Catalog, line 469, in getIndexDataForRID
  Module Products.ZCTextIndex.ZCTextIndex, line 220, in getEntryForObject
  Module Products.ZCTextIndex.ZCTextIndex, line 220, in <listcomp>
  Module Products.ZCTextIndex.Lexicon, line 98, in get_word
KeyError: 202924065

 - Expression: " view/num_more_occurrence"
 - Filename:   ... 3.2.4-py3.7.egg/plone/app/event/browser/event_summary.pt
 - Location:   (line 48: col 30)
 - Source:     all_occs view/num_more_occurrences"
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
 - Expression: "context/@@event_summary"
 - Filename:   ... nt-3.2.4-py3.7.egg/plone/app/event/browser/event_view.pt
 - Location:   (line 24: col 41)
 - Source:     ... ry replace="structure context/@@event_summary"/>
                                         ^^^^^^^^^^^^^^^^^^^^^^^
 - Expression: "provider:plone.abovecontentbody"
 - Filename:   ... egg/Products/CMFPlone/browser/templates/main_template.pt
 - Location:   (line 94: col 70)
 - Source:     ... 
                                     ^
 - Expression: "context/main_template/macros/master"
 - Filename:   ... nt-3.2.4-py3.7.egg/plone/app/event/browser/event_view.pt
 - Location:   (line 6: col 23)
 - Source:     ... etal:use-macro="context/main_template/macros/master"
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 - Arguments:  template: <ViewPageTemplateFile - at 0x7f36516a36d8>
               options: {...} (0)
               args: <tuple - at 0x7f36740e6048>
               nothing: <NoneType - at 0x5594bfdcf820>
               modules: <_SecureModuleImporter - at 0x7f366ab7d208>
               request: <WSGIRequest - at 0x7f365130da90>
               view: <SimpleViewClass from /home/ajung/sandboxes/ugent-portaal-plone-4x/eggs/plone.app.event-3.2.4-py3.7.egg/plone/app/event/browser/event_summary.pt event_summary at 0x7f36516a34e0>
               context: <ImplicitAcquisitionWrapper 2018-12-07-opstaan-met-katrien-boone at 0x7f3651e61ca8>
               views: <ViewMapper - at 0x7f36522afeb8>
               here: <ImplicitAcquisitionWrapper 2018-12-07-opstaan-met-katrien-boone at 0x7f3651e61ca8>
               container: <ImplicitAcquisitionWrapper 2018-12-07-opstaan-met-katrien-boone at 0x7f3651e61ca8>
               root: <ImplicitAcquisitionWrapper  at 0x7f3659ed7bd0>
               traverse_subpath: <list - at 0x7f36518041c8>
               user: <ImplicitAcquisitionWrapper - at 0x7f365236a558>
               default: <object - at 0x7f36740c1b30>
               repeat: {...} (0)
               loop: {...} (0)
               wrapped_repeat: <SafeMapping - at 0x7f3653960408>
               target_language: <NoneType - at 0x5594bfdcf820>
               translate: <function translate at 0x7f36514da8c8>
               data: <EventAccessor None at 0x7f36516a3198>
               excludes: <list - at 0x7f3653a30888>
               start_utcoffset: 0.0
               next_occs: <list - at 0x7f3652551748>
zopyx commented 5 years ago

Workaround to reindexing all fulltext indexes:

# post migration fixes for ZCatalog text indexes

import transaction
from zope.component.hooks import setSite
from Testing.makerequest import makerequest

indexes = ('SearchableText', 'Title', 'Description')

site = app.plone_portal
catalog = site.portal_catalog
setSite(site)
makerequest(site)

for lexicon in ('htmltext_lexicon', 'plaintext_lexicon', 'plone_lexicon'):
    print(f'Clearing {lexicon}')
    catalog[lexicon].clear()

for index in indexes:
    print(f'Clearing {index}')
    catalog._catalog.getIndex(index).clear()
    print(f'Reindexing {index}')
    catalog._catalog.reindexIndex(index, site.REQUEST)

print('Committing...')
transaction.commit()
pbauer commented 5 years ago

To me this looks more like a issue with the import rather than the catalog. Unless you can provide a way to reproduce that issue we'll not be able to help you.

zopyx commented 5 years ago

The only interaction with Plone is through plone.restapi by creating new objects. This happened twice with a 100.000 object import..this could be any client. There should not be any inconsistencies..and the mass import basically contains only standard core content objects.

tisto commented 5 years ago

Does that mean that plone.restapi might not properly reindex objects when importing?

zopyx commented 5 years ago

Does that mean that plone.restapi might not properly reindex objects when importing?

Perhaps - perhaps not. It could also be some kind of ZCatalog issues causing a corruption.

The fact is that I can reproduce it with our large migration import. Events are always broken in the same way. Error is triggerd by this num_more_occurences code. No visible corruption detected with standard plone search, live search - only Events are affects.

Perhaps related or not: Events require a notify() call

https://ploneappevent.readthedocs.io/en/latest/development.html?highlight=notify

but I don't see that could be related.

pbauer commented 5 years ago

I now have the same issue where SearchableText contains bytes instead of text. It only happens after migrating the Database to Python 3 using zodbupdate.

>>> from plone import api
>>> portal_catalog = api.portal.get_tool('portal_catalog')
>>> zctextindex = portal_catalog._catalog.indexes['SearchableText']
>>> opkapiindex = zctextindex.index
>>> values = opkapiindex._docwords.values()
>>> [i for i in values]

In Python 2 that is a list of bytes:

['\xe7\x03Ex\xe7\x03Ey\xe7\x03Eh\xe7\x03Ei\xe7\x03Ej\xe7\x03Ek\xe7\x03El\xe7\x03Em\xe7\x03En\xe7\x03Eo\xe7\x03Ep\xe7\x03Eq\xe7\x03Er\xe7\x03Es\xe7\x03Et\xe7\x03Eu\xe7\x03Ej\xe7\x03Ev\xe7\x03Ew\xe7\x03Ez\xe7\x03Em\xe7\x03E{\xe7\x03E|\xe7\x03E}\xe7\x03E~\xe7\x03E\x7f\xe7\x03F\x00\xe7\x03F\x01\xe7\x03F\x02\xe7\x03F\x03\xe7\x03F\x04\xe7\x03F\x05\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\t\xe7\x03F\n\xe7\x03Ej\xe7\x03Ew\xe7\x03F\x0b\xe7\x03F\x0c\xe7\x03Em\xe7\x03F\r\xe7\x03Eo\xe7\x03Ej\xe7\x03F\x0e\xe7\x03F\x0f\xe7\x03F\x10\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\x11\xe7\x03F\x12\xe7\x03Em\xe7\x03F\x13\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03F\x16\xe7\x03F\x17\xe7\x03Ej\xe7\x03F\x08\xe7\x03F\x18\xe7\x03F\x19\xe7\x03F\x1a\xe7\x03Em\xe7\x03F\x0b\xe7\x03F\x1b\xe7\x03F\x1c\xe7\x03Em\xe7\x03F\x1d\xe7\x03F\x1e\xe7\x03F\x1f\xe7\x03F \xe7\x03Em\xe7\x03F!\xe7\x03F"\xe7\x03F#\xe7\x03F$\xe7\x03F%\xe7\x03F\x10\xe7\x03F&\xe7\x03F\'\xe7\x03F(\xe7\x03Em\xe7\x03F\x10\xe7\x03F)\xe7\x03F*\xe7\x03F+\xe7\x03F,\xe7\x03En\xe7\x03Ej\xe7\x03F-\xe7\x03F.\xe7\x03F/\xe7\x03F0\xe7\x03F1\xe7\x03F2\xe7\x03F3\xe7\x03F4\xe7\x03F5\xe7\x03F6\xe7\x03F7\xe7\x03F8\xe7\x03F9\xe7\x03F:\xe7\x03F;\xe7\x03F<\xe7\x03F=\xe7\x03F8\xe7\x03F>\xe7\x03Em\xe7\x03F?\xe7\x03F@\xe7\x03FA\xe7\x03FB\xe7\x03FC\xe7\x03F\x08\xe7\x03FD\xe7\x03F \xe7\x03Eo\xe7\x03FE\xe7\x03FF\xe7\x03FG\xe7\x03F6\xe7\x03FH\xe7\x03F3\xe7\x03Eo\xe7\x03F>\xe7\x03E\x7f\xe7\x03FI\xe7\x03F\x19\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03Ej\xe7\x03F\x18\xe7\x03FJ\xe7\x03FK\xe7\x03FL\xe7\x03F\x00\xe7\x03FM\xe7\x03FN\xe7\x03Em\xe7\x03F@\xe7\x03F\x17\xe7\x03FO\xe7\x03Ej\xe7\x03FP\xe7\x03FQ\xe7\x03FR\xe7\x03FN\xe7\x03Em\xe7\x03F6\xe7\x03FS\xe7\x03FQ\xe7\x03FR\xe7\x03FT\xe7\x03FU\xe7\x03Em\xe7\x03F\x15\xe7\x03FV\xe7\x03F\x1c\xe7\x03Em\xe7\x03Ej\xe7\x03FW\xe7\x03FL\xe7\x03FX\xe7\x03F\x1c\xe7\x03Em\xe7\x03FY\xe7\x03FZ\xe7\x03F[\xe7\x03F\\\xe7\x03Ej\xe7\x03F\x0e',
 '\xe7\x03F^\xe7\x03F]\xe7\x03F]',
 '\xe7\x03F_\xe7\x03F]\xe7\x03F]',
 '\xe7\x03Fa\xe7\x03F`\xe7\x03F`',
 '\xe7\x03F_\xe7\x03F`\xe7\x03F`',
 '\xe7\x03Ff\xe7\x03Fb\xe7\x03Fc\xe7\x03FB\xe7\x03Fd\xe7\x03Fe\xe7\x03F\x05\xe7\x03Fb',
 '\xe7\x03Fg\xe7\x03Fg\xe7\x03Fh']

After migrating it still is a list of bytes:

[b'\xe7\x03Ex\xe7\x03Ey\xe7\x03Eh\xe7\x03Ei\xe7\x03Ej\xe7\x03Ek\xe7\x03El\xe7\x03Em\xe7\x03En\xe7\x03Eo\xe7\x03Ep\xe7\x03Eq\xe7\x03Er\xe7\x03Es\xe7\x03Et\xe7\x03Eu\xe7\x03Ej\xe7\x03Ev\xe7\x03Ew\xe7\x03Ez\xe7\x03Em\xe7\x03E{\xe7\x03E|\xe7\x03E}\xe7\x03E~\xe7\x03E\x7f\xe7\x03F\x00\xe7\x03F\x01\xe7\x03F\x02\xe7\x03F\x03\xe7\x03F\x04\xe7\x03F\x05\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\t\xe7\x03F\n\xe7\x03Ej\xe7\x03Ew\xe7\x03F\x0b\xe7\x03F\x0c\xe7\x03Em\xe7\x03F\r\xe7\x03Eo\xe7\x03Ej\xe7\x03F\x0e\xe7\x03F\x0f\xe7\x03F\x10\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\x11\xe7\x03F\x12\xe7\x03Em\xe7\x03F\x13\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03F\x16\xe7\x03F\x17\xe7\x03Ej\xe7\x03F\x08\xe7\x03F\x18\xe7\x03F\x19\xe7\x03F\x1a\xe7\x03Em\xe7\x03F\x0b\xe7\x03F\x1b\xe7\x03F\x1c\xe7\x03Em\xe7\x03F\x1d\xe7\x03F\x1e\xe7\x03F\x1f\xe7\x03F \xe7\x03Em\xe7\x03F!\xe7\x03F"\xe7\x03F#\xe7\x03F$\xe7\x03F%\xe7\x03F\x10\xe7\x03F&\xe7\x03F\'\xe7\x03F(\xe7\x03Em\xe7\x03F\x10\xe7\x03F)\xe7\x03F*\xe7\x03F+\xe7\x03F,\xe7\x03En\xe7\x03Ej\xe7\x03F-\xe7\x03F.\xe7\x03F/\xe7\x03F0\xe7\x03F1\xe7\x03F2\xe7\x03F3\xe7\x03F4\xe7\x03F5\xe7\x03F6\xe7\x03F7\xe7\x03F8\xe7\x03F9\xe7\x03F:\xe7\x03F;\xe7\x03F<\xe7\x03F=\xe7\x03F8\xe7\x03F>\xe7\x03Em\xe7\x03F?\xe7\x03F@\xe7\x03FA\xe7\x03FB\xe7\x03FC\xe7\x03F\x08\xe7\x03FD\xe7\x03F \xe7\x03Eo\xe7\x03FE\xe7\x03FF\xe7\x03FG\xe7\x03F6\xe7\x03FH\xe7\x03F3\xe7\x03Eo\xe7\x03F>\xe7\x03E\x7f\xe7\x03FI\xe7\x03F\x19\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03Ej\xe7\x03F\x18\xe7\x03FJ\xe7\x03FK\xe7\x03FL\xe7\x03F\x00\xe7\x03FM\xe7\x03FN\xe7\x03Em\xe7\x03F@\xe7\x03F\x17\xe7\x03FO\xe7\x03Ej\xe7\x03FP\xe7\x03FQ\xe7\x03FR\xe7\x03FN\xe7\x03Em\xe7\x03F6\xe7\x03FS\xe7\x03FQ\xe7\x03FR\xe7\x03FT\xe7\x03FU\xe7\x03Em\xe7\x03F\x15\xe7\x03FV\xe7\x03F\x1c\xe7\x03Em\xe7\x03Ej\xe7\x03FW\xe7\x03FL\xe7\x03FX\xe7\x03F\x1c\xe7\x03Em\xe7\x03FY\xe7\x03FZ\xe7\x03F[\xe7\x03F\\\xe7\x03Ej\xe7\x03F\x0e',
 b'\xe7\x03F^\xe7\x03F]\xe7\x03F]',
 b'\xe7\x03F_\xe7\x03F]\xe7\x03F]',
 b'\xe7\x03Fa\xe7\x03F`\xe7\x03F`',
 b'\xe7\x03F_\xe7\x03F`\xe7\x03F`',
 b'\xe7\x03Ff\xe7\x03Fb\xe7\x03Fc\xe7\x03FB\xe7\x03Fd\xe7\x03Fe\xe7\x03F\x05\xe7\x03Fb',
 b'\xe7\x03Fg\xe7\x03Fg\xe7\x03Fh']

Instead it should be a list of text.

I'm currently not sure how to fix this issue during running zodbupdate since the items in question are inside a IOBTree. Maybe @davisagli knows a way.

I used the workaround by @zopyx to clear the lexica and reindex all ZCTextIndex indexes after migrating a Data.fs to Python 3 as described. I used the following code as a upgrade-step:

from plone import api
from Products.ZCTextIndex.interfaces import ILexicon
from Products.ZCTextIndex.interfaces import IZCTextIndex
from zope.globalrequest import getRequest

import logging
import six

log = logging.getLogger(__name__)

def post_py3_migration(self):
    if six.PY2:
        raise RuntimeError('This needs to be run after migrating to Python 3!')

    portal_catalog = api.portal.get_tool('portal_catalog')

    # Fix bytes in opkapiindex
    # See https://github.com/plone/Products.CMFPlone/issues/2905
    zctextindex = portal_catalog._catalog.indexes['SearchableText']
    opkapiindex = zctextindex.index
    values = opkapiindex._docwords.values()
    first_item = values[0]
    if isinstance(first_item, bytes):
        log.info('Rebuilding ZCTextIndexes. First item is a byte:')
        log.info(first_item)
        lexica = [i for i in portal_catalog.values() if ILexicon.providedBy(i)]
        for lexicon in lexica:
            lexicon.clear()

        indexes = [i for i in portal_catalog.Indexes.objectValues()
                   if IZCTextIndex.providedBy(i)]
        for index in indexes:
            try:
                index.clear()
            except AttributeError as e:
                log.info(e)
            log.info('rebuilding {}'.format(index.__name__))
            portal_catalog._catalog.reindexIndex(index.__name__, getRequest())
    else:
        log.info('Not rebuilding ZCTextIndexes. First item is not bytes:')
        log.info(first_item)
pbauer commented 4 years ago

This is also a part of collective.migrationhelpers: https://github.com/collective/collective.migrationhelpers/blob/master/src/collective/migrationhelpers/post_python3_fixes.py#L22