metomi / isodatetime

📅 ⌚ Python ISO 8601 date time parser and data model/manipulation utilities
GNU Lesser General Public License v3.0

make data classes hashable #162

Closed: oliver-sanders closed this 4 years ago

oliver-sanders commented 4 years ago

Perhaps the biggest architectural limitation on isodatetime is the fact that the data classes (e.g. TimePoint, Duration, TimeRecurrence) are mutable objects.

They should all be immutable and hashable; this would widen the scope for caching and efficiency improvements.

The biggest barrier to achieving this is that some of the methods on TimePoint mutate the TimePoint itself (e.g. to_week_date); the easy solution is to have them return a new object rather than mutating the existing one.
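
For illustration, a minimal sketch of that pattern using a stand-in class (the attribute and method names are made up, not the real TimePoint API): conversion-style methods build and return a new frozen object, so existing instances are never modified and stay safe to hash.

from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Point:  # stand-in for TimePoint
    year: int
    day_of_year: int

    def with_year(self, year):
        # return a modified copy instead of mutating self
        return replace(self, year=year)

p = Point(2000, 100)
q = p.with_year(2001)
assert p == Point(2000, 100)              # the original is untouched
assert hash(p) == hash(Point(2000, 100))  # frozen dataclasses are hashable
assert q.year == 2001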

Once done we can lru_cache TimeRecurrence.get_is_valid which is currently the biggest offender in Cylc.

MetRonnie commented 4 years ago

I'm having a go at this.

I've noticed TimeZone inherits from Duration: https://github.com/metomi/isodatetime/blob/00cdbcb17d9d0295dc8dd5aafecff2c71eb6e0cd/metomi/isodatetime/data.py#L617-L631

While this keeps the class definition short, it's potentially nasty, as it inherits a whole bunch of unnecessary properties (years, months, weeks, days) and methods (to_weeks(), to_days(), get_is_in_weeks(), etc.).

I think I ought to either reimplement methods such as __eq__() for TimeZone, or have a look at making a super/parent class for Duration and TimeZone that holds only the shared methods.
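
As a very rough sketch of that second option (attribute and method names here are illustrative, not the real isodatetime classes), the time-of-day handling and comparisons could live in a shared base class, with Duration adding the calendar-only units on top:

class BaseDuration:
    """Time-of-day arithmetic shared by Duration and TimeZone."""

    __slots__ = ("hours", "minutes")

    def __init__(self, hours=0, minutes=0):
        self.hours = hours
        self.minutes = minutes

    def _cmp_key(self):
        return (self.hours, self.minutes)

    def __eq__(self, other):
        if type(other) is not type(self):
            return NotImplemented
        return self._cmp_key() == other._cmp_key()

    def __hash__(self):
        return hash((type(self), self._cmp_key()))


class Duration(BaseDuration):
    """Adds the calendar units that make no sense for a time zone."""

    __slots__ = ("years", "months", "weeks", "days")

    def __init__(self, years=0, months=0, weeks=0, days=0, **kwargs):
        super().__init__(**kwargs)
        self.years = years
        self.months = months
        self.weeks = weeks
        self.days = days

    def _cmp_key(self):
        return (self.years, self.months, self.weeks, self.days,
                *super()._cmp_key())


class TimeZone(BaseDuration):
    """Only hours/minutes; no to_weeks()/to_days() etc. are inherited."""

    __slots__ = ()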

oliver-sanders commented 4 years ago

Hmmm, it does make sense for TimeZone to be a Duration, and these methods do behave as expected; they do no harm, so I'm not really that worried.

A bit icky but we could potentially patch the methods away:

class TimeZone(Duration):
    # ...

# to_weeks is defined on Duration rather than TimeZone, so it can't be
# del'd from the subclass; it would have to be masked instead, e.g.:
TimeZone.to_weeks = None

Perhaps raise an issue to address this?

MetRonnie commented 4 years ago

@oliver-sanders How do you profile cylc?

oliver-sanders commented 4 years ago

@oliver-sanders How do you profile cylc?

Will reply on Riot, not a quick answer.

MetRonnie commented 4 years ago

Added proof of concept for the Duration class at #165 if you want to take a look. (Tests will be broken until all classes are taken care of)

MetRonnie commented 4 years ago

It looks like @functools.lru_cache isn't good for instance methods, and what we really want is @functools.cached_property. But the doc for it says

This decorator requires that the __dict__ attribute on each instance be a mutable mapping. This means it will not work with some types, such as metaclasses (since the __dict__ attributes on type instances are read-only proxies for the class namespace), and those that specify __slots__ without including __dict__ as one of the defined slots (as such classes don’t provide a __dict__ attribute at all).

We only have __slots__ for all of the data classes. I'm not very familiar with __slots__ and __dict__ - what are the implications of this?
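
To illustrate the constraint being quoted (with a contrived class, not one of ours): a class that defines __slots__ without including __dict__ gives its instances no __dict__ at all, and cached_property needs the instance __dict__ to store its result in, so it fails at lookup time:

from functools import cached_property  # Python >= 3.8

class Slotted:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x

    @cached_property
    def doubled(self):
        return 2 * self.x

s = Slotted(3)
print(hasattr(s, "__dict__"))  # False: nowhere for a cached value to live
try:
    s.doubled
except TypeError as exc:
    print(exc)  # cached_property complains there is no __dict__ to cache into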

oliver-sanders commented 4 years ago

It looks like @functools.lru_cache isn't good for instance methods

I couldn't find a reason why on that page; what's the problem?

This example works for me; we cache methods in cylc-flow using lru_cache:

from functools import lru_cache                                                      

class Foo:

    def __init__(self, a, b): 
        self.a = a 
        self.b = b 

    @lru_cache()
    def method(self, x): 
        return self.a == self.b == x

y = Foo(1, 2)
y.method(1)
y.method(1)
y.method(1)
y.method(2)
print(y.method.cache_info())

$ python testy.py
CacheInfo(hits=2, misses=2, maxsize=128, currsize=2)

MetRonnie commented 4 years ago

Sorry, wrong link. From https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-class-methods-release-object it looks like @lru_cache effectively leaks memory when used on instance methods, because the cache keeps a strong reference to self.

oliver-sanders commented 4 years ago

Dammit! Why the hell didn't they use weak references? 😡 One quick look at the functools code confirmed that ignorance is indeed bliss.

So it's not really a memory leak: the cache stores up to the number of entries you told it to. The LRU in lru_cache stands for "least recently used", which means that older entries get evicted to make room for newer ones, so old "stale" objects will disappear from the cache eventually. It's just a matter of setting a sensible cache size.

However, the better solution would involve weak references (I really don't know why lru_cache doesn't use them, speed?). I knocked this example together pretty quickly; it can housekeep objects provided they are not buried inside data structures:

import weakref

def cache(func):
    # an lru_cache-like memoizer which stores only weak references to its
    # (weak-referenceable) arguments, so cached keys don't keep objects alive
    cache = {}
    hits = 0
    misses = 0

    def _lru_cache(*args):
        nonlocal hits, misses
        cache_args = _weak_tuple(args)
        if cache_args not in cache:
            misses += 1
            cache[cache_args] = func(*args)
        else:
            hits += 1
        _clean()
        return cache[cache_args]

    def _clean():
        # housekeep entries whose weakly referenced arguments have died
        for key in list(cache):
            if _is_dead(key):
                cache.pop(key)

    def _weak_tuple(items):
        # swap each argument for a weak reference where possible
        ret = []
        for item in items:
            try:
                ret.append(weakref.ref(item))
            except TypeError:
                # e.g. ints and strings cannot be weakly referenced
                ret.append(item)
        return tuple(ret)

    def _is_dead(key):
        # True if any weakly referenced argument in this key has died
        for value in key:
            if isinstance(value, weakref.ref) and value() is None:
                return True
        return False

    def _cache_info():
        return f'hits {hits} misses {misses} items {len(cache)}'
    _lru_cache.cache_info = _cache_info

    return _lru_cache

Side Note: The above code uses weak references for the keys but not the values, which results in a similar problem. You could use weak references for both, but then you wouldn't really be caching 😃.

There may be alternative lru_cache implementations out there somewhere. I wouldn't worry too much about it, especially as, from the Cylc perspective, cycles move in one direction temporally, so stale objects will disappear naturally in time.
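
For reference, one ready-made building block (a sketch, not tested against our data classes, and only useful for simple per-object caching): weakref.WeakKeyDictionary drops an entry automatically once the keyed object is garbage collected, which is roughly what the _clean() helper above does by hand.

import gc
import weakref

class Thing:
    # stand-in for a data class; note a __slots__ class would need
    # '__weakref__' in its slots to be weakly referenceable like this
    pass

results = weakref.WeakKeyDictionary()

thing = Thing()
results[thing] = "expensive result"
assert thing in results

del thing    # once the last strong reference goes away...
gc.collect()
assert len(results) == 0  # ...the cache entry disappears with it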

Good spotting @MetRonnie!