rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.
Other
206 stars 55 forks source link

Use lru_cache from functools #14

Closed gedeck closed 5 years ago

gedeck commented 5 years ago

Instead of implementing a caching mechanism, it might be useful to consider the @lru_cache decorator from the standard library. I did not check the impact on performance and that it works.

https://docs.python.org/3.3/library/functools.html#functools.lru_cache

Note that you can get information about the cache performance using

replace_wildcard_with_H.cache_info()
adalke commented 5 years ago

mmpdb supports Python 2 and Python 3. Python 2 doesn't have an lru_cache. There is currently no timeline for dropping Python 2 support for mmpdb. RDKit will drop Python 2 support by the end of this year.

gedeck commented 5 years ago

There is a backport of functools for Python 2.7: https://pypi.org/project/functools32/

Considering that the cache gives only a 2% performance improvement, it's probably not worth the effort to add an additional dependency to support 2 and 3.

adalke commented 5 years ago

Well, that's 2% improvement of the overall performance. With 50% cache rate, caching doubles the effective performance of this function.

In principle the LRU should give better re-use than the single-generation cache I wrote. But I think I chose a large enough cache size that it didn't matter.

There are other one-off cache implementations in mmpdb, like the _weld_cache in mmpdblib/analysis_algorithms.py . A close look shows that it's actually two caches merged together.

FWIW, both my cache and lru_cache handle over 3 million calls per second - fast enough that that detail doesn't matter.

I don't think this is an important/useful enough detail to work on now, and I don't like including/requiring another package as part of mmpdb.

gedeck commented 5 years ago

I agree. Let's close it then