gusutabopb commented 4 years ago

Aioinflux used to provide built-in local caching functionality using Redis. However, due to low perceived usage, vendor lock-in (Redis), and the extra complexity it added to Aioinflux, I have decided to remove it.

Hopefully no one besides my past self uses this functionality. In case someone did, or in case someone didn't but may be interested in caching InfluxDB query results, I will add an example of a simple caching layer using pickle. If this affects you, please let me know by commenting below.

codecov-commenter commented 4 years ago

Codecov Report

Merging #33 into master will increase coverage by 0.20%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master      #33      +/-   ##
==========================================
+ Coverage   96.58%   96.78%   +0.20%     
==========================================
  Files           9        9              
  Lines         556      529      -27     
==========================================
- Hits          537      512      -25     
+ Misses         19       17       -2

Impacted Files	Coverage Δ
aioinflux/client.py	`94.71% <100.00%> (+0.41%)`	:arrow_up:
aioinflux/compat.py	`100.00% <100.00%> (ø)`


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 18a5402...a0d0152.
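The percentages in the report follow directly from the line counts: coverage is hits divided by tracked lines. The sketch below reproduces them, assuming the report rounds down to two decimal places (Codecov's default `precision: 2`, `round: down` settings). With ordinary rounding, 512/529 = 96.786% would show as 96.79% rather than 96.78%. `coverage_bp` is a hypothetical helper, not part of Codecov.

```python
def coverage_bp(hits: int, lines: int) -> int:
    """Hypothetical helper: coverage in hundredths of a percent, rounded down."""
    return hits * 10_000 // lines


# (lines, hits) taken from the Coverage Diff table above
base_lines, base_hits = 556, 537  # master
head_lines, head_hits = 529, 512  # #33

base_bp = coverage_bp(base_hits, base_lines)
head_bp = coverage_bp(head_hits, head_lines)
delta = head_bp - base_bp

print(f"master: {base_bp / 100:.2f}%, misses {base_lines - base_hits}")  # master: 96.58%, misses 19
print(f"#33:    {head_bp / 100:.2f}%, misses {head_lines - head_hits}")  # #33:    96.78%, misses 17
print(f"delta:  {delta / 100:+.2f}%")  # delta:  +0.20%
```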

gusutabopb commented 4 years ago

Simple cache

Here's a simple example of a caching layer for InfluxDB/aioinflux (which will also be available in the v0.10.0 docs). It works by caching dataframes as compressed pickle files on disk. It can easily be modified to use your preferred caching strategy, such as a different serialization format, compression, or cache key generation.

See the function docstrings and code comments below for more details.

Uncached code:

    from aioinflux import InfluxDBClient

    c = InfluxDBClient(output='dataframe')
    q = """
        SELECT * FROM executions
        WHERE product_code='BTC_JPY'
        AND time >= '2020-05-22'
        AND time < '2020-05-23'
    """
    # If this query is repeated, it will keep hitting InfluxDB,
    # increasing the load on the instance and using extra bandwidth
    df = await c.query(q)

Caching code:

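    import re
    import hashlib
    import pathlib
    from typing import Tuple

    import pandas as pd
    from aioinflux import InfluxDBClient

    def _hash_query(q: str) -> str:
        """Normalizes and hashes the query to generate a caching key"""
        # Collapse runs of whitespace and lowercase the query so that
        # formatting differences map to the same key
        normalized = re.sub(r"\s+", " ", q).strip().lower().encode()
        return hashlib.sha1(normalized).hexdigest()

    async def fetch(influxdb: InfluxDBClient, q: str) -> Tuple[pd.DataFrame, bool]:
        """Tries to see if the query is cached, else fetches data from the database.

        Returns a tuple containing the query results and a boolean indicating
        whether the data came from the local cache (True) or directly from InfluxDB (False)
        """
        # Cache files are named after the query hash and stored
        # in the current working directory
        p = pathlib.Path(_hash_query(q))
        if p.exists():
            # Cache hit: load the xz-compressed pickle from disk
            return pd.read_pickle(p, compression="xz"), True
        # Cache miss: query InfluxDB and store the result for next time
        df = await influxdb.query(q)
        df.to_pickle(str(p), compression="xz")
        return df, False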

Caching code usage:

    df, cached = await fetch(c, q)
    print(cached)  # False - cache miss

    df, cached = await fetch(c, q)
    print(cached)  # True - cache hit

gusutabopb / aioinflux

Remove caching functionality #33
