zopefoundation / Products.ZCatalog

Zope's indexing and search solution.
Other
5 stars 22 forks source link

Optimize plain "not" queries #124

Closed jensens closed 3 years ago

jensens commented 3 years ago

In Plone 6 we do queries returning a result except one item (the current context). A not query on the UID catalog is used to do this. In larger projects with several 10 or 100 thousands object cataloged, this slowed down queries to take several seconds. Without the not everything was fast.

After some profiling we found the time is wasted in this loop https://github.com/zopefoundation/Products.ZCatalog/blob/7f614a6394e236beae6ba69053bfb8c04d3f073a/src/Products/PluginIndexes/unindex.py#L575-L664 primary while factoring IISets here https://github.com/zopefoundation/Products.ZCatalog/blob/master/src/Products/PluginIndexes/unindex.py#L608

Whats happens? The not query is written to return all values except the ones matching. Thus, for a single UID, this creates an result containing all index keys, except one. This is very expensive.

What I have done: If the code detects a simple not query (without any operators) excluding only one or more keys, I shortcut the whole index_query and return the previous result w/o the current catalog id.

Does it help? Yes. We bench-marked our query on a customer database with ~367000 catalogued objects. It cut down the whole requests time from ~1200ms to ~80ms (with 40ms of it been not query time in both).

cc @tisto @mauritsvanrees @cekk

jensens commented 3 years ago

Coverage tests are red on master too: https://coveralls.io/github/zopefoundation/Products.ZCatalog

jensens commented 3 years ago

see also https://github.com/plone/plone.restapi/issues/1252

dataflake commented 3 years ago

I trust your changes are OK, but don't have any ZCatalog Fu.

jensens commented 3 years ago

this branch includes now #125

ale-rt commented 3 years ago

We see that it is only possible to squash and merge the commits: image Is that intended? CC @icemac

icemac commented 2 years ago

@ale-rt Yes this is intended to keep the commit history straight.