thadeusb / flask-cache

Cache extension for Flask
http://packages.python.org/Flask-Cache/
Other
700 stars 185 forks source link

Raise a warning/error when memoizing instance/class methods #118

Open vicentealencar opened 9 years ago

vicentealencar commented 9 years ago

The lib, in its current state, makes it very easy for developers to memoize instance/class methods without realizing that the self/cls parameter will be used to calculate the cache key. This issue is very tricky to spot, since developers don't usually go check whether or not memoization is working properly. However, conceptually, I do agree that it makes sense to use self/cls to calculate the cache key, since the instance's internal state could potentially be used by the given method.

I propose raising an exception with an informative error message every time memoize is used to decorate instance/class methods whose instances don't implement __cacherepr__ (per the current suggestion written in the comments). For the purposes of backwards compatibility, this exception could be swallowed by setting a configuration property to false (e.g.: CACHE_CHECK_INSTANCE_MEMOIZATION)

vicentealencar commented 9 years ago

Coming to think of it, I think using repr for objects that do not provide a custom implementation of __repr__, and therefore use object's implementation, is dangerous and it is currently doing more harm than good.

object's implementation will just print the class name along with the memory reference to that object (e.g.: '<object object at 0x1101140e0>'). This string doesn't provide flask-cache with information regarding the object's internal state. Therefore, mutating one of the object's properties won't change the result of calling repr on that object, which might cause the memoized version of the method to return incorrect results (at least until the cache expires)

E.g.:

class X(object):
  def __init__(self):
    self.x = 0

  @cache.memoize(10)
  def cached_method(self, y):
    return self.x + y

test = X()
test.cached_method(1)
# Returns 1

test.x = 2
test.cached_method(1)
# should return 3, but returns 1

A handful of bugs like that were being introduced to our project's code base. The issue above can be easily fixed by providing the class with a custom implementation of __repr__ that correctly outputs the object's internal state as part of its return value:

class X(object):
  def __repr__(self):
    return "X(%s)" % self.x

  def __init__(self):
    self.x = 0

  @cache.memoize(10)
  def cached_method(self, y):
    return self.x + y

test = X()
test.cached_method(1)
# Returns 1

test.x = 2
test.cached_method(1)
# Returns 3

In order to prevent issues like the one described on the first example from happening again, we've overridden flask-cache's memoize method to raise a ValueError whenever a memoized function parameter uses object's implementation of __repr__:

class SafeCache(Cache):

  def memoize(self, *args, **kwargs):
    unsafe_memoized = super(JusBrasilCache, self).memoize(*args, **kwargs)

    def safe_memoized(func):
      decorated_function = unsafe_memoized(func)
      unsafe_make_key = decorated_function.make_cache_key

      def safe_make_key(f, *args, **kwargs):
        is_cacheable = lambda obj: hasattr(obj, '__repr__') and\
            obj.__repr__() != object.__repr__(obj)

        forbidden_params = filter(
          lambda x: not is_cacheable(x),
          list(args) + kwargs.values())

        if forbidden_params:
          raise ValueError("memoize cannot be used with functions with non-primitive parameters that do not provide a specific implementation of __repr__: %s" % forbidden_params)
        else:
          return unsafe_make_key(f, *args, **kwargs)

      decorated_function.make_cache_key = safe_make_key

      return decorated_function

    return safe_memoized

I believe the above change should be incorporated by flask-cache, in order to prevent others from introducing bugs to their codebases. I can send flask-cache a pull request if we determine the solution above is appropriate.