Closed: d-maurer closed this pull request 4 years ago.
The failing Python 2.7 tests are caused by:
...
File "/home/travis/build/zopefoundation/Products.ZCatalog/eggs/soupsieve-2.0-py2.7.egg/soupsieve/css_parser.py", line 3, in <module>
from functools import lru_cache
ImportError: cannot import name lru_cache
Apparently, soupsieve==2.0 is no longer compatible with Python 2.7. The problem has nothing to do with this PR.
@hannosch Having seen that you are assigned to #94, I have added you as a reviewer.
This PR should partially fix #94: in the cases where the stale cache results come from id(catalog) no longer being unique. I believe (but do not have proof yet) that stale cache results can additionally occur in cases where an index modification remains undetected.
... I believe (but do not have proof yet) that stale cache results can additionally occur in cases where an index modification remains undetected.
I have checked and could not detect missed index modifications for the indexes which come with ZCatalog. Nevertheless, I believe the (_counter based) architecture is fragile: any index deriving from UnIndex must ensure a call of _increment_counter in its index_object and unindex_object -- a requirement which is nowhere stated explicitly and could well be overlooked. It would be safer to have the counter maintained in the enclosing Catalog (which is rarely used as a base class and, if it is, its catalog_object and uncatalog_object are likely called anyway).
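To make the fragility concrete, here is a minimal sketch of the obligation a derived index silently inherits. UnIndex and _increment_counter are the real names from Products.PluginIndexes; the subclass and its helper are hypothetical:

```python
from Products.PluginIndexes.unindex import UnIndex


class MyCustomIndex(UnIndex):
    """Hypothetical index that replaces UnIndex's (un)indexing logic."""

    def _store(self, documentId, obj):
        # Stand-in for whatever bookkeeping a custom index performs.
        return True  # pretend something changed

    def index_object(self, documentId, obj, threshold=None):
        changed = self._store(documentId, obj)
        if changed:
            # The undocumented obligation: without this call, request
            # cache entries for this catalog are never invalidated
            # when the index changes.
            self._increment_counter()
        return int(changed)

    def unindex_object(self, documentId):
        # ... custom removal logic would go here ...
        self._increment_counter()
```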
@d-maurer Hanno was not really active in the last year or so. So I'd not count on him reviewing this PR.
This may sound a little pedantic. A test for the solution of the problem would be nice.
Andreas Gabriel wrote at 2020-4-17 13:45 -0700:
This may sound a little pedantic. A test for the solution of the problem would be nice.
All (existing) tests related to request caching still work. As a consequence, one can expect that the alternative implementation does not break things.
The use of a WeakKeyDictionary guarantees that cache entries disappear when the catalog gets garbage collected (and thus the formerly used catalog id potentially loses its uniqueness). A specific test would essentially verify that WeakKeyDictionary works as promised -- and we can assume that this is already verified.
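For the record, the promised behaviour is easy to demonstrate in isolation (a minimal, self-contained sketch; the Catalog stub stands in for the real catalog class):

```python
import gc
from weakref import WeakKeyDictionary


class Catalog(object):
    """Stand-in for the real catalog class."""


cache = WeakKeyDictionary()
catalog = Catalog()
cache[catalog] = {('some', 'query'): ['cached', 'results']}
assert len(cache) == 1

del catalog   # drop the last strong reference
gc.collect()  # make the collection deterministic for the demonstration
assert len(cache) == 0  # the entry vanished together with the catalog
```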
It is not easy at all to construct a test which reliably reproduces the original erroneous behaviour: I would need two catalogs which (at different times) get the same "id" (i.e. memory address) and produce different search results. The memory address of an object is an implementation detail; thus, one can expect significant fiddling around to come up with a test where two non-trivial (catalog) objects get the same memory address, and the test may fail on a different platform or with a different Python version. Maybe you have an idea how to reliably fulfill the requirement?
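The address reuse itself is easy to observe in CPython (a sketch only; whether the addresses actually collide is an implementation detail, which is precisely why a test built on it would be unreliable):

```python
class Catalog(object):
    """Stand-in for a real catalog."""


a = Catalog()
old_id = id(a)
del a  # free the object; CPython may reuse its memory immediately

b = Catalog()
# Frequently True in CPython, but guaranteed nowhere -- a cache keyed
# by id() would now hand b the entries cached for a:
print(id(b) == old_id)
```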
Are there any news on this PR? What needs to be done to get it merged?
Michael Howitz wrote at 2020-7-29 23:24 -0700:
Are there any news on this PR? What needs to be done to get it merged?
No news - still waiting for an approval.
@andbag was the only person who commented and asked for a test. But it is almost impossible to reproduce the original problem via a test: the failure requires that the same (memory) address is used for different catalogs within the same request; in addition, certain "counts" (used to invalidate the cache when the catalog is modified) must be identical.
All existing tests have passed. Thus, as far as the tests go, the new request caching is not worse than the old one. Being based on a WeakKeyDictionary, one can reason (without a test) that the PR resolves the original problem, which was caused by using id(catalog) as an essential part of the cache key.
As I do not feel competent to review this PR, @andbag, could you please do a review?
It seems to be time for a new release of the package; maybe this PR could be merged before that.
Andreas Gabriel wrote at 2020-8-20 23:55 -0700:
@andbag approved this pull request.
Yes, the changes meet the requirements. @d-maurer I was not aware of WeakDictionary before. I have learned something new :). Thank you.
I thank you for your review.
This addresses #94 and "https://community.plone.org/t/potential-memory-corruption-during-migration-plone-4-2-5-2/11655/11".
Scripts developed for Plone are usually run with bin/client1 run, which creates an artificial request. Such a script (performing a (mass) migration Plone 4.2 -> Plone 5.2 for many Plone portals) revealed that the cache keys used for the catalog's request caching are not sufficiently safe -- at least in a setup with an artificial request working on several catalogs in separate transactions. This PR does not store the request caches directly in the request but instead uses an intermediate WeakKeyDictionary keyed directly with the catalog objects (rather than with a key derived from the catalog's id, i.e. its memory address). This ensures that when the catalog object is released (no longer in memory, making its address ambiguous), the associated cached data is discarded.
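For readers who want the shape of the change without opening the diff, here is a minimal sketch of the scheme described above (the attribute and function names are assumptions, not the PR's literal code):

```python
from weakref import WeakKeyDictionary

# Hypothetical attribute under which the request holds all per-catalog
# caches; the real PR may use a different name and layout.
_CACHES_ATTR = '_catalog_request_caches'


def get_request_cache(catalog, request):
    """Return the request-local cache for *catalog*.

    The request holds a single WeakKeyDictionary mapping catalog
    objects to their caches.  Because the catalog itself is the key,
    a recycled memory address can never match a stale entry, and the
    entry disappears as soon as the catalog is garbage collected.
    """
    caches = getattr(request, _CACHES_ATTR, None)
    if caches is None:
        caches = WeakKeyDictionary()
        setattr(request, _CACHES_ATTR, caches)
    return caches.setdefault(catalog, {})
```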