koaning / memo

Decorators that logs stats.
https://koaning.github.io/memo/getting-started.html
MIT License
105 stars 9 forks source link
grid json-blobs simulation

Installation

pip install memo

Documentation

The documentation can be found here.

The quickstart guide is found here.

Usage

Here's an example of utility functions provided by our library.

import numpy as np
from memo import memlist, memfile, grid, time_taken

data = []

@memfile(filepath="results.jsonl")
@memlist(data=data)
@time_taken()
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
    proba = np.mean(n_uniq != class_size)
    return {"est_proba": proba}

for settings in grid(class_size=[5, 10, 20, 30], n_sim=[1000, 1_000_000]):
    birthday_experiment(**settings)

The decorators memlist and memfile are making sure that the keyword arugments and dictionary output of the birthday_experiment are logged. The contents of the results.jsonl-file and the data variable looks like this;

{"class_size": 5, "n_sim": 1000, "est_proba": 0.024, "time_taken": 0.0004899501800537109}
{"class_size": 5, "n_sim": 1000000, "est_proba": 0.027178, "time_taken": 0.19407916069030762}
{"class_size": 10, "n_sim": 1000, "est_proba": 0.104, "time_taken": 0.000598907470703125}
{"class_size": 10, "n_sim": 1000000, "est_proba": 0.117062, "time_taken": 0.3751380443572998}
{"class_size": 20, "n_sim": 1000, "est_proba": 0.415, "time_taken": 0.0009679794311523438}
{"class_size": 20, "n_sim": 1000000, "est_proba": 0.411571, "time_taken": 0.7928380966186523}
{"class_size": 30, "n_sim": 1000, "est_proba": 0.703, "time_taken": 0.0018239021301269531}
{"class_size": 30, "n_sim": 1000000, "est_proba": 0.706033, "time_taken": 1.1375510692596436}

The nice thing about being able to log results to a file or to the web is that you can also more easily parallize your jobs! For example now you can use the Runner class to parrallelize the function call with joblib.

import numpy as np
from memo import memlist, memfile, grid, time_taken, Runner

data = []

@memfile(filepath="results.jsonl")
@memlist(data=data)
@time_taken()
def birthday_experiment(class_size, n_sim):
    """Simulates the birthday paradox. Vectorized = Fast!"""
    sims = np.random.randint(1, 365 + 1, (n_sim, class_size))
    sort_sims = np.sort(sims, axis=1)
    n_uniq = (sort_sims[:, 1:] != sort_sims[:, :-1]).sum(axis = 1) + 1
    proba = np.mean(n_uniq != class_size)
    return {"est_proba": proba}

# declare all the settings to loop over
settings = grid(class_size=range(20, 30), n_sim=[100, 10_000, 1_000_000])

# use a runner to run over all the settings
runner = Runner(backend="threading", n_jobs=-1)
runner.run(func=birthday_experiment, settings=settings, progbar=True)

Features

This library also offers decorators to pipe to other sources.

We also offer an option to parallelize function calls using joblib. This is facilitated with a Runner class which supports multiple backends.

Check the API docs here for more information on how these work.