Open zopyx opened 5 years ago
Workaround to reindexing all fulltext indexes:
# post migration fixes for ZCatalog text indexes
import transaction
from zope.component.hooks import setSite
from Testing.makerequest import makerequest
indexes = ('SearchableText', 'Title', 'Description')
site = app.plone_portal
catalog = site.portal_catalog
setSite(site)
makerequest(site)
for lexicon in ('htmltext_lexicon', 'plaintext_lexicon', 'plone_lexicon'):
print(f'Clearing {lexicon}')
catalog[lexicon].clear()
for index in indexes:
print(f'Clearing {index}')
catalog._catalog.getIndex(index).clear()
print(f'Reindexing {index}')
catalog._catalog.reindexIndex(index, site.REQUEST)
print('Committing...')
transaction.commit()
To me this looks more like a issue with the import rather than the catalog. Unless you can provide a way to reproduce that issue we'll not be able to help you.
The only interaction with Plone is through plone.restapi by creating new objects. This happened twice with a 100.000 object import..this could be any client. There should not be any inconsistencies..and the mass import basically contains only standard core content objects.
Does that mean that plone.restapi might not properly reindex objects when importing?
Does that mean that plone.restapi might not properly reindex objects when importing?
Perhaps - perhaps not. It could also be some kind of ZCatalog issues causing a corruption.
The fact is that I can reproduce it with our large migration import. Events are always broken in the same way. Error is triggerd by this num_more_occurences
code. No visible corruption detected with standard plone search, live search - only Events are affects.
Perhaps related or not: Events require a notify()
call
https://ploneappevent.readthedocs.io/en/latest/development.html?highlight=notify
but I don't see that could be related.
I now have the same issue where SearchableText contains bytes instead of text.
It only happens after migrating the Database to Python 3 using zodbupdate
.
>>> from plone import api
>>> portal_catalog = api.portal.get_tool('portal_catalog')
>>> zctextindex = portal_catalog._catalog.indexes['SearchableText']
>>> opkapiindex = zctextindex.index
>>> values = opkapiindex._docwords.values()
>>> [i for i in values]
In Python 2 that is a list of bytes:
['\xe7\x03Ex\xe7\x03Ey\xe7\x03Eh\xe7\x03Ei\xe7\x03Ej\xe7\x03Ek\xe7\x03El\xe7\x03Em\xe7\x03En\xe7\x03Eo\xe7\x03Ep\xe7\x03Eq\xe7\x03Er\xe7\x03Es\xe7\x03Et\xe7\x03Eu\xe7\x03Ej\xe7\x03Ev\xe7\x03Ew\xe7\x03Ez\xe7\x03Em\xe7\x03E{\xe7\x03E|\xe7\x03E}\xe7\x03E~\xe7\x03E\x7f\xe7\x03F\x00\xe7\x03F\x01\xe7\x03F\x02\xe7\x03F\x03\xe7\x03F\x04\xe7\x03F\x05\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\t\xe7\x03F\n\xe7\x03Ej\xe7\x03Ew\xe7\x03F\x0b\xe7\x03F\x0c\xe7\x03Em\xe7\x03F\r\xe7\x03Eo\xe7\x03Ej\xe7\x03F\x0e\xe7\x03F\x0f\xe7\x03F\x10\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\x11\xe7\x03F\x12\xe7\x03Em\xe7\x03F\x13\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03F\x16\xe7\x03F\x17\xe7\x03Ej\xe7\x03F\x08\xe7\x03F\x18\xe7\x03F\x19\xe7\x03F\x1a\xe7\x03Em\xe7\x03F\x0b\xe7\x03F\x1b\xe7\x03F\x1c\xe7\x03Em\xe7\x03F\x1d\xe7\x03F\x1e\xe7\x03F\x1f\xe7\x03F \xe7\x03Em\xe7\x03F!\xe7\x03F"\xe7\x03F#\xe7\x03F$\xe7\x03F%\xe7\x03F\x10\xe7\x03F&\xe7\x03F\'\xe7\x03F(\xe7\x03Em\xe7\x03F\x10\xe7\x03F)\xe7\x03F*\xe7\x03F+\xe7\x03F,\xe7\x03En\xe7\x03Ej\xe7\x03F-\xe7\x03F.\xe7\x03F/\xe7\x03F0\xe7\x03F1\xe7\x03F2\xe7\x03F3\xe7\x03F4\xe7\x03F5\xe7\x03F6\xe7\x03F7\xe7\x03F8\xe7\x03F9\xe7\x03F:\xe7\x03F;\xe7\x03F<\xe7\x03F=\xe7\x03F8\xe7\x03F>\xe7\x03Em\xe7\x03F?\xe7\x03F@\xe7\x03FA\xe7\x03FB\xe7\x03FC\xe7\x03F\x08\xe7\x03FD\xe7\x03F \xe7\x03Eo\xe7\x03FE\xe7\x03FF\xe7\x03FG\xe7\x03F6\xe7\x03FH\xe7\x03F3\xe7\x03Eo\xe7\x03F>\xe7\x03E\x7f\xe7\x03FI\xe7\x03F\x19\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03Ej\xe7\x03F\x18\xe7\x03FJ\xe7\x03FK\xe7\x03FL\xe7\x03F\x00\xe7\x03FM\xe7\x03FN\xe7\x03Em\xe7\x03F@\xe7\x03F\x17\xe7\x03FO\xe7\x03Ej\xe7\x03FP\xe7\x03FQ\xe7\x03FR\xe7\x03FN\xe7\x03Em\xe7\x03F6\xe7\x03FS\xe7\x03FQ\xe7\x03FR\xe7\x03FT\xe7\x03FU\xe7\x03Em\xe7\x03F\x15\xe7\x03FV\xe7\x03F\x1c\xe7\x03Em\xe7\x03Ej\xe7\x03FW\xe7\x03FL\xe7\x03FX\xe7\x03F\x1c\xe7\x03Em\xe7\x03FY\xe7\x03FZ\xe7\x03F[\xe7\x03F\\\xe7\x03Ej\xe7\x03F\x0e',
'\xe7\x03F^\xe7\x03F]\xe7\x03F]',
'\xe7\x03F_\xe7\x03F]\xe7\x03F]',
'\xe7\x03Fa\xe7\x03F`\xe7\x03F`',
'\xe7\x03F_\xe7\x03F`\xe7\x03F`',
'\xe7\x03Ff\xe7\x03Fb\xe7\x03Fc\xe7\x03FB\xe7\x03Fd\xe7\x03Fe\xe7\x03F\x05\xe7\x03Fb',
'\xe7\x03Fg\xe7\x03Fg\xe7\x03Fh']
After migrating it still is a list of bytes:
[b'\xe7\x03Ex\xe7\x03Ey\xe7\x03Eh\xe7\x03Ei\xe7\x03Ej\xe7\x03Ek\xe7\x03El\xe7\x03Em\xe7\x03En\xe7\x03Eo\xe7\x03Ep\xe7\x03Eq\xe7\x03Er\xe7\x03Es\xe7\x03Et\xe7\x03Eu\xe7\x03Ej\xe7\x03Ev\xe7\x03Ew\xe7\x03Ez\xe7\x03Em\xe7\x03E{\xe7\x03E|\xe7\x03E}\xe7\x03E~\xe7\x03E\x7f\xe7\x03F\x00\xe7\x03F\x01\xe7\x03F\x02\xe7\x03F\x03\xe7\x03F\x04\xe7\x03F\x05\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\t\xe7\x03F\n\xe7\x03Ej\xe7\x03Ew\xe7\x03F\x0b\xe7\x03F\x0c\xe7\x03Em\xe7\x03F\r\xe7\x03Eo\xe7\x03Ej\xe7\x03F\x0e\xe7\x03F\x0f\xe7\x03F\x10\xe7\x03F\x06\xe7\x03F\x07\xe7\x03F\x08\xe7\x03F\x11\xe7\x03F\x12\xe7\x03Em\xe7\x03F\x13\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03F\x16\xe7\x03F\x17\xe7\x03Ej\xe7\x03F\x08\xe7\x03F\x18\xe7\x03F\x19\xe7\x03F\x1a\xe7\x03Em\xe7\x03F\x0b\xe7\x03F\x1b\xe7\x03F\x1c\xe7\x03Em\xe7\x03F\x1d\xe7\x03F\x1e\xe7\x03F\x1f\xe7\x03F \xe7\x03Em\xe7\x03F!\xe7\x03F"\xe7\x03F#\xe7\x03F$\xe7\x03F%\xe7\x03F\x10\xe7\x03F&\xe7\x03F\'\xe7\x03F(\xe7\x03Em\xe7\x03F\x10\xe7\x03F)\xe7\x03F*\xe7\x03F+\xe7\x03F,\xe7\x03En\xe7\x03Ej\xe7\x03F-\xe7\x03F.\xe7\x03F/\xe7\x03F0\xe7\x03F1\xe7\x03F2\xe7\x03F3\xe7\x03F4\xe7\x03F5\xe7\x03F6\xe7\x03F7\xe7\x03F8\xe7\x03F9\xe7\x03F:\xe7\x03F;\xe7\x03F<\xe7\x03F=\xe7\x03F8\xe7\x03F>\xe7\x03Em\xe7\x03F?\xe7\x03F@\xe7\x03FA\xe7\x03FB\xe7\x03FC\xe7\x03F\x08\xe7\x03FD\xe7\x03F \xe7\x03Eo\xe7\x03FE\xe7\x03FF\xe7\x03FG\xe7\x03F6\xe7\x03FH\xe7\x03F3\xe7\x03Eo\xe7\x03F>\xe7\x03E\x7f\xe7\x03FI\xe7\x03F\x19\xe7\x03Em\xe7\x03F\x14\xe7\x03F\x15\xe7\x03Ej\xe7\x03F\x18\xe7\x03FJ\xe7\x03FK\xe7\x03FL\xe7\x03F\x00\xe7\x03FM\xe7\x03FN\xe7\x03Em\xe7\x03F@\xe7\x03F\x17\xe7\x03FO\xe7\x03Ej\xe7\x03FP\xe7\x03FQ\xe7\x03FR\xe7\x03FN\xe7\x03Em\xe7\x03F6\xe7\x03FS\xe7\x03FQ\xe7\x03FR\xe7\x03FT\xe7\x03FU\xe7\x03Em\xe7\x03F\x15\xe7\x03FV\xe7\x03F\x1c\xe7\x03Em\xe7\x03Ej\xe7\x03FW\xe7\x03FL\xe7\x03FX\xe7\x03F\x1c\xe7\x03Em\xe7\x03FY\xe7\x03FZ\xe7\x03F[\xe7\x03F\\\xe7\x03Ej\xe7\x03F\x0e',
b'\xe7\x03F^\xe7\x03F]\xe7\x03F]',
b'\xe7\x03F_\xe7\x03F]\xe7\x03F]',
b'\xe7\x03Fa\xe7\x03F`\xe7\x03F`',
b'\xe7\x03F_\xe7\x03F`\xe7\x03F`',
b'\xe7\x03Ff\xe7\x03Fb\xe7\x03Fc\xe7\x03FB\xe7\x03Fd\xe7\x03Fe\xe7\x03F\x05\xe7\x03Fb',
b'\xe7\x03Fg\xe7\x03Fg\xe7\x03Fh']
Instead it should be a list of text.
I'm currently not sure how to fix this issue during running zodbupdate since the items in question are inside a IOBTree
. Maybe @davisagli knows a way.
I used the workaround by @zopyx to clear the lexica and reindex all ZCTextIndex indexes after migrating a Data.fs to Python 3 as described. I used the following code as a upgrade-step:
from plone import api
from Products.ZCTextIndex.interfaces import ILexicon
from Products.ZCTextIndex.interfaces import IZCTextIndex
from zope.globalrequest import getRequest
import logging
import six
log = logging.getLogger(__name__)
def post_py3_migration(self):
if six.PY2:
raise RuntimeError('This needs to be run after migrating to Python 3!')
portal_catalog = api.portal.get_tool('portal_catalog')
# Fix bytes in opkapiindex
# See https://github.com/plone/Products.CMFPlone/issues/2905
zctextindex = portal_catalog._catalog.indexes['SearchableText']
opkapiindex = zctextindex.index
values = opkapiindex._docwords.values()
first_item = values[0]
if isinstance(first_item, bytes):
log.info('Rebuilding ZCTextIndexes. First item is a byte:')
log.info(first_item)
lexica = [i for i in portal_catalog.values() if ILexicon.providedBy(i)]
for lexicon in lexica:
lexicon.clear()
indexes = [i for i in portal_catalog.Indexes.objectValues()
if IZCTextIndex.providedBy(i)]
for index in indexes:
try:
index.clear()
except AttributeError as e:
log.info(e)
log.info('rebuilding {}'.format(index.__name__))
portal_catalog._catalog.reindexIndex(index.__name__, getRequest())
else:
log.info('Not rebuilding ZCTextIndexes. First item is not bytes:')
log.info(first_item)
This is also a part of collective.migrationhelpers: https://github.com/collective/collective.migrationhelpers/blob/master/src/collective/migrationhelpers/post_python3_fixes.py#L22
We tested 52RC5-pending with a full migration of the UGent site (complete import over plone.restapi).
All events show up with an KeyError obviously due to some inconsistencies in portal_catalog or the Lexicon only.
...under further investigation