pik-copan / pyunicorn

Unified Complex Network and Recurrence Analysis Toolbox
http://pik-potsdam.de/~donges/pyunicorn/
Other
195 stars 87 forks source link

@cached_var may store wrong values if method has arguments #148

Closed mensch72 closed 7 months ago

mensch72 commented 3 years ago

A method decorated with @cached_var whose return value depends on some method arguments will wrongly return the cached results of the first call even for calls with different argument values!

lenas95 commented 2 years ago

Hi, could you elaborate which method exactly? And what your output was?

Very much appreciated

ntfrgl commented 2 years ago

This seems like a bug in core.network.cache_helper. All uses of @cached_const and @cached_var should be replaced by standard library implementations of memoization, taking care to use only functions available in the lowest Python version supported by pyunicorn. https://docs.python.org/3/library/functools.html

ntfrgl commented 1 year ago

As of 4cf6969c40de39f01f31ea141767ec67cc3d6d31, the minimal supported Python version is 3.8, and @functools.cached_property() from the standard library should replace the aforementioned helper functions.

fkuehlein commented 10 months ago

hi @ntfrgl,

I had started getting into this but it turned out to be a bit more work than I had first thought. With fe23f23 I basically substituted the caching in core.Network with functools memoization methods as you proposed, see commit message for details. Some additional notes on that:

  1. methods decorated with functools.cached_property() are now class properties and thus not callable, which slightly alters their handling and entailed a bit of refactoring. I could resolve that in all cases except for test_network.compare_measures() and test_network.compare_nsi(). I therefore pytest.mark.skip'd the tests that depend on those functions. Any ideas on how to resolve that? Or should we just use lru_cache throughout?

  2. functools.lru_cache(), on the other hand, requires hashable objects as parameters. This would currently only be a problem for Network.{nsi_}interregional_betweenness because they have list arguments. I just left the @cache decorator commented out on those, as for some reason had already been done in f4df6c6. Maybe that's useless and can be dropped, what do you think?

  3. I subsitituted the Network.clear_cache() method and its helper methods accordingly. In that process, I realized that other classes have their own clear_cache() methods, some inheriting from Network, but many also using their own caching system, which is basically just creating and deleting class attributes and setting self._{...}_cached-flags (as in RecurrencePlot.clear_cache()). This could also be substituted with functools, right? Anyway, that might exceed the scope of this issue.

  4. All tests are passing now, except for the skipped ones, but I'll make sure to add some more before merging anything.

Any other objections/additions/comments from your side?

ntfrgl commented 10 months ago

Thank you for making progress on this issue! Your approach mostly follows the recommendations in the official documentation, which suggest that @functools.cached_property() and @functools.lru_cache() could roughly replace @cached_const() and @cached_var(). https://docs.python.org/3/faq/programming.html#faq-cache-method-calls

  1. However, as you noted, a direct replacement conflicts with our existing API. Let me comment in more detail on our present requirements for a purely internal fix without any public API changes, and refine my suggestions accordingly.
    • Even when the affected method has no arguments, what we need is a cache for return values, while the user-facing methods should remain methods. Furthermore, the logging behaviour previously defined in cache_helper() via the msg argument should be retained.
    • The documentation above points out that a @functools.cached_property() cannot detect changes to instance attributes it depends on, which means that the correctness of cache invalidation relies on every possible mutation of instance attributes to also manually delete the cached property. This is a likely source of user errors.
    • More generally, let us note that the core problem in this issue, as well as in the one referenced in (2), is a consistent approach to cache invalidation, for a library in which recomputing network metrics after changing network attributes is a common use case.
    • Overall, I therefore think that the most ergonomic solution would be a caching decorator which unifies the roles of @cached_const() and @cached_var(), which uses @functools.lru_cache() internally with a configurable cache size, which takes care of logging as before, and which assumes that, as explained in the documentation above, the class owning the decorated method has well-defined __eq__() and __hash__() methods. All of our classes with custom caching mechanisms would require a little work to make this assumption valid, but that would then be the canonical solution to the cache invalidation problem. In particular, this will constitute an executable specification of all non-derived mutable properties in the user-facing classes, and will remove the need of manually calling clear_cache() in various places, while still allowing for it.
    • For inspiration, here is a commit I wrote for another package which ensures consistency of the __eq__() and __hash__() methods.
  2. Those method caches were commented out as a workaround in #124. The correct solution would be to use the same caching mechanism as for other methods, but after mapping the arguments bijectively into a hashable type, i.e., to use an uncached wrapper method which calls the implementation body as a private cached worker method with hashable arguments of fixed type. Specifically, since the expected arguments are of type Iterable[int], the easiest choice would be the hashable type Tuple[int,...].
  3. It would certainly reduce future maintenance burden if the caching system was consistent throughout the library, following (1). A straightforward approach here would be to define the clear_cache() method in a mix-in class that is inherited by all classes that use caching. Note, however, that your current implementation of Network.clear_cache() is incorrect, since the return type of dir(self) is List[str].
  4. Since this has been a recurring issue, before merging this PR, I would recommend extending the test suite with cache consistency checks, for as many as possible of the library methods which already have some tests and whose implementations are cached. This would take some effort, but would clearly increase the reliability of the library, and is related to my comment elsewhere. One possible approach would be along the following lines:
    • The vast majority of the existing tests is of the form <provide data, instantiate class, call method, compare output>. One could define an abstract test class, let's say in tests/conftest.py, with a generic test method which receives some specification for these steps, e.g., as objects and lambda functions, and executes them in order. In each of the existing test modules, this abstract test class would then be inherited and instantiated with the specific methods and arguments to recover the existing test suite behaviour.
    • In addition, there would now be new steps <modify instance attributes, un-modify instance attributes, modify method arguments>, such that the abstract test can check whether repeatedly calling a method yields the same result (when the instance attributes and method arguments are unchanged) or not (otherwise).
    • For inspiration, here is a little test class I wrote for another package which dynamically constructs a number of tests that execute the steps <import tutorial, call its main() method, check output>.
fkuehlein commented 10 months ago

Thanks a lot for all your ideas, hints and corrections!

For now, as it will require a good bit of work to implement this accordingly, I will postpone this issue and prioritize the smaller issues. Once those are solved, we can see if we still have time and resources to include a thorough fix of this issue in the release.

fkuehlein commented 8 months ago

Noticed an Issue related to memoization:

Several methods of climate.MutualInfoClimateNetwork and climate.HavlinClimateNetwork use an attribute named self._path_lengths_cached that is never assigned to. Due to the respective pylint warning no-member being globally disabled (see #196) and the respective methods not being covered by tests, this problem has apparently not been run into yet.