Closed oliver-sanders closed 4 years ago
I'm having a go at this.
I've noticed TimeZone
inherits from Duration
: https://github.com/metomi/isodatetime/blob/00cdbcb17d9d0295dc8dd5aafecff2c71eb6e0cd/metomi/isodatetime/data.py#L617-L631
While this means the class is not many lines long, it's potentially nasty as it inherits a whole bunch of unnecessary properties (years
, months
, weeks
, days
) and methods (to_weeks()
, to_days()
, get_is_in_weeks()
, etc).
I think I ought to either reimplement the methods such as __eq__()
for TimeZone
or have a look at making a super/parent class for Duration
and TimeZone
that has the shared methods only
Hmmm, it does make sense for TimeZone
to be a Duration
, and these methods do behave as expected, they do no harm so I'm not really that worried.
A bit icky but we could potentially patch the methods away:
class TimeZone(Duration):
# ...
del TimeZone.to_weeks
Perhaps raise an issue to address this?
@oliver-sanders How do you profile cylc?
@oliver-sanders How do you profile cylc?
Will reply on Riot, not a quick answer.
Added proof of concept for the Duration class at #165 if you want to take a look. (Tests will be broken until all classes are taken care of)
It looks like @functools.lru_cache
isn't good for instance methods, and what we really want is @functools.cached_property
. But the doc for it says
This decorator requires that the
__dict__
attribute on each instance be a mutable mapping. This means it will not work with some types, such as metaclasses (since the__dict__
attributes on type instances are read-only proxies for the class namespace), and those that specify__slots__
without including__dict__
as one of the defined slots (as such classes don’t provide a__dict__
attribute at all).
We only have __slots__
for all of the data classes. I'm not very familiar with __slots__
and __dict__
- what are the implications of this?
It looks like @functools.lru_cache isn't good for instance methods
I couldn't find a reason why on that page, what's the problem?
This example works for me, we cache methods in cylc-flow using lru_cache
:
from functools import lru_cache
class Foo:
def __init__(self, a, b):
self.a = a
self.b = b
@lru_cache()
def method(self, x):
return self.a == self.b == x
y = Foo(1, 2)
y.method(1)
y.method(1)
y.method(1)
y.method(2)
print(y.method.cache_info())
$ python testy.py
CacheInfo(hits=2, misses=2, maxsize=128, currsize=2
Sorry, wrong link. From https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-class-methods-release-object it looks like @lru_cache
leaks memory when used for class methods
Dammit! Why the hell didn't they use weak references 😡, one quick look at the functools code confirmed that ignorance is indeed bliss.
So it's not a really memory leak, it stores up to the number of objects you told it to. The LRU in lru_cache
stands for "least recently used" which means that older caches will get overwritten by newer ones. Old "stale" objects will disappear from the cache eventually. It's just a matter of setting a sensible cache size.
However the better solution would involve weak references, I really don't know why lru_cache
doesn't do this, speed? Knocked this example together pretty quickly, it can housekeep objects provided they are not buried in data structures.
def cache(func):
cache = {}
hits = 0
misses = 0
def _lru_cache(*args):
nonlocal cache, func, hits, misses, _clean, _weak_tuple
cache_args = _weak_tuple(args)
if cache_args not in cache:
misses += 1
cache[cache_args] = func(*args)
else:
hits += 1
_clean()
return cache[cache_args]
def _clean():
nonlocal cache, _is_dead
for key in list(cache):
if _is_dead(key):
cache.pop(key)
def _weak_tuple(items):
ret = []
for item in items:
try:
ret.append(weakref.ref(item))
except TypeError:
ret.append(item)
return tuple(ret)
def _is_dead(item):
for value in item:
if isinstance(value, weakref.ref) and not value():
return True
return False
def _cache_info():
nonlocal cache, hits, misses
return f'hits {hits} misses {misses} items {len(cache)}'
_lru_cache.cache_info = _cache_info
return _lru_cache
Side Note: The above code uses weak references for the keys but not the values which results in a similar problem. You could use weak references to both but then you wouldn't really be caching 😃.
There may be alternative lru_cache
implementations out there somewhere. I wouldn't worry too much about it, especially as, from the Cylc perspective cycles move in one direction temporally so stale objects will disappear naturally in time.
Good spotting @MetRonnie!
Perhaps the biggest architectural limitation on isodatetime is the fact that the data classes (e.g. TimePoint, Duration, TimeRecurrence) are mutable objects.
They should all be immutable and hashable, this would widen the scope for caching and efficiency improvements.
The biggest barrier to achieving this are some of the methods on
TimePoint
which mutate theTimePoint
itself (e.g.to_week_date
), the easy solution is to return a new object rather than mutating the existing one.Once done we can
lru_cache
TimeRecurrence.get_is_valid
which is currently the biggest offender in Cylc.