plone / Products.CMFPlone

The core of the Plone content management system
https://plone.org
GNU General Public License v2.0

Intermittent error when using "not" when searching with index Subject #3895

Open wesleybl opened 5 months ago

wesleybl commented 5 months ago

BUG/PROBLEM REPORT (OR OTHER COMMON ISSUE)

When we search the Subject index with "not", the result is sometimes wrong: the search occasionally returns an empty list when it should return content.

What I did:

  1. Create a document with the Tag: "Bulletin"
  2. Create a Python script in ZMI with the content:
       return str(context.portal_catalog(Subject={"not": ["Bulletin"]}))
  3. Run the script multiple times.

What I expect to happen:

The search should return all content that does not have the "Bulletin" Tag.

What actually happened:

Sometimes an empty list is returned.

What version of Plone/ Addons I am using:

Plone 6.0.9

wesleybl commented 5 months ago

In fact, the problem occurs even if the Subject is made up of just one word. I updated the description to accommodate this.

wesleybl commented 5 months ago

I debugged this error and came to the following conclusions:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L507

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L508-L510

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L586-L620

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/PluginIndexes/unindex.py#L681-L683

But why does it sometimes work? Let's go:

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/ZCatalog/Catalog.py#L620-L627

https://github.com/plone/Products.CMFPlone/blob/22da3b9e7e33b047fe0823aba66d793f1ac4685e/Products/CMFPlone/CatalogTool.py#L403

https://github.com/zopefoundation/Products.ZCatalog/blob/5df390192393faffe182d858808037f3bd532187/src/Products/ZCatalog/Catalog.py#L620

So this problem can occur in all indexes where not all content has a value, not just with Subject.
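
To illustrate that conclusion, here is a toy model in plain Python. It is not the ZCatalog code: `ToyIndex`, `query_not`, and the document ids are made up for the example. The model assumes that a "not" query with no earlier result set can only see documents that have some value in the index, while a "not" query that runs after another index subtracts from that earlier result. That is a simplification of what the links above show.

```python
class ToyIndex:
    """Toy keyword index: maps a value to the ids of documents that have it.

    Documents with no value at all never appear in the index.
    """

    def __init__(self):
        self._docs_by_key = {}

    def index_doc(self, doc_id, values):
        for value in values:
            self._docs_by_key.setdefault(value, set()).add(doc_id)

    def query_not(self, excluded, resultset=None):
        excluded_docs = set()
        for key in excluded:
            excluded_docs |= self._docs_by_key.get(key, set())
        if resultset is not None:
            # An earlier index already narrowed the candidates:
            # subtracting from them keeps documents without any value.
            return set(resultset) - excluded_docs
        # No earlier result: the index only knows documents it has indexed.
        known_docs = set()
        for docs in self._docs_by_key.values():
            known_docs |= docs
        return known_docs - excluded_docs


subject = ToyIndex()
subject.index_doc(1, ["Bulletin"])  # tagged document
subject.index_doc(2, [])            # untagged: never enters the index
subject.index_doc(3, [])            # untagged: never enters the index

visible_docs = {1, 2, 3}  # stand-in for the result of another index

not_first = subject.query_not(["Bulletin"])
not_last = subject.query_not(["Bulletin"], resultset=visible_docs)
print(sorted(not_first))  # []
print(sorted(not_last))   # [2, 3]
```

In this model the same query gives an empty result or the expected one depending only on whether another index was applied before it.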

How to fix this problem?

I could think about forcing the index allowedRolesAndUsers to always be first. But it is a Plone-specific index, which Products.ZCatalog "does not know".

Could we force indexes with "not" to always be last?
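
A rough sketch of what "always last" could mean, independent of ZCatalog. The function name `not_queries_last` and the sample query are hypothetical, and the sketch assumes a "not" query is recognizable as a dict value containing the key "not", as in the script above.

```python
def not_queries_last(index_names, query):
    """Return index_names with every index queried via "not" moved to the end."""

    def uses_not(name):
        value = query.get(name)
        return isinstance(value, dict) and "not" in value

    # sorted() is stable and False sorts before True, so the relative
    # order inside each group is preserved.
    return sorted(index_names, key=uses_not)


query = {
    "Subject": {"not": ["Bulletin"]},
    "portal_type": "Document",
    "allowedRolesAndUsers": ["Anonymous"],
}
order = not_queries_last(["Subject", "portal_type", "allowedRolesAndUsers"], query)
print(order)  # ['portal_type', 'allowedRolesAndUsers', 'Subject']
```

This only helps when the query contains at least one other index. A query with nothing but a "not" would still have no earlier result to subtract from.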

Or do something like: If the search result is empty and the index contains a "not", search all records first. But how and where to do this?

@mauritsvanrees @davisagli @mamico @jensens I'm mentioning you because you recently worked on Products.ZCatalog. Opinions?

wesleybl commented 5 months ago

> I could think about forcing the index allowedRolesAndUsers to always be first. But it is a Plone-specific index, which Products.ZCatalog "does not know".

In fact, allowedRolesAndUsers exists in Zope too:

https://github.com/zopefoundation/Products.CMFCore/blob/c73cdf4f0fdcca4b9bb95813ade7a374282dd801/src/Products/CMFCore/CatalogTool.py#L208

@dataflake @icemac any thoughts here?

dataflake commented 5 months ago

The index is not in Zope, it's in Products.CMFCore where the only "consumer" is Plone. I am not a ZCatalog expert, sorry.

mamico commented 5 months ago

@wesleybl I think changing the order of the indexes should mitigate the problem, but in the end it won't be the real solution.

However, if you want to experiment, you can do so by monkey-patching the method Products.ZCatalog.Catalog.Catalog._sorted_search_indexes. You can find inspiration here: https://github.com/RedTurtle/redturtle.volto/blob/master/src/redturtle/volto/catalogplan.py
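
A minimal sketch of what such a patch could look like, here moving allowedRolesAndUsers to the front as suggested earlier in the thread. It is not taken from redturtle.volto. `put_index_first` and `FakeCatalog` are hypothetical names, and the sketch assumes the patched method takes `(self, query)` and returns a sequence of index names. It runs against a stand-in class so it can be tried without Zope; the lines for the real class are left as a comment and are untested.

```python
import functools

FIRST_INDEX = "allowedRolesAndUsers"


def put_index_first(original):
    """Wrap a method returning index names so FIRST_INDEX comes first."""

    @functools.wraps(original)
    def wrapper(self, query):
        names = list(original(self, query))
        if FIRST_INDEX in names:
            names.remove(FIRST_INDEX)
            names.insert(0, FIRST_INDEX)
        return names

    return wrapper


class FakeCatalog:
    """Stand-in for the real catalog class, only for this example."""

    def _sorted_search_indexes(self, query):
        return sorted(query)


# Against the real class this would be (untested assumption):
#   from Products.ZCatalog.Catalog import Catalog
#   Catalog._sorted_search_indexes = put_index_first(
#       Catalog._sorted_search_indexes)
FakeCatalog._sorted_search_indexes = put_index_first(
    FakeCatalog._sorted_search_indexes
)

query = {"Subject": {"not": ["Bulletin"]}, "allowedRolesAndUsers": ["Anonymous"]}
patched_order = FakeCatalog()._sorted_search_indexes(query)
print(patched_order)  # ['allowedRolesAndUsers', 'Subject']
```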

In the meantime, I would try opening an issue or a PR (starting with a test that breaks) on Products.ZCatalog.

I also see a similar, though not identical, problem here https://github.com/zopefoundation/Products.ZCatalog/issues/35 and some work done on it, probably not fully completed, here https://github.com/zopefoundation/Products.ZCatalog/pull/74

cc @andbag @d-maurer

wesleybl commented 5 months ago

> I think changing the order of the indexes should mitigate the problem, but in the end it won't be the real solution.

@mamico I think this would solve the problem in a simpler way. At least it would solve the problem for those using Plone or Products.CMFCore, which I believe are the biggest users of Zope.

Any other solution would be more complex and would have to allow returning all objects in the catalog before applying the filter with not.
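
Roughly, that more complex alternative amounts to the following. This is a plain-Python sketch and not a proposal for the actual ZCatalog change: `search_not`, `all_doc_ids`, and `docs_by_key` are hypothetical names standing in for the catalog's full set of record ids and for one index's value-to-documents mapping.

```python
def search_not(all_doc_ids, docs_by_key, excluded_keys):
    """Return ids of all documents that have none of excluded_keys."""
    excluded_docs = set()
    for key in excluded_keys:
        excluded_docs |= docs_by_key.get(key, set())
    # Start from every document in the catalog, so documents that were
    # never indexed under any key are kept as well.
    return set(all_doc_ids) - excluded_docs


all_doc_ids = {1, 2, 3}
docs_by_key = {"Bulletin": {1}}  # documents 2 and 3 have no Subject at all
result = search_not(all_doc_ids, docs_by_key, ["Bulletin"])
print(sorted(result))  # [2, 3]
```

The cost is that the starting set grows with the size of the whole catalog, which is part of why reordering the indexes looks simpler.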