nteract / scrapbook

A library for recording and reading data in notebooks.
https://nteract-scrapbook.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Encoder abstract class and `pickle`/`dill` encoders #91

Open oakaigh opened 11 months ago

oakaigh commented 11 months ago

@willingc @MSeal It would be helpful to provide a base class (interface) for encoders so that people don't get confused when implementing new ones. I would also like a "pickler" as one of the built-in encoders, since I often need to embed open matplotlib figures for rework later. The use case isn't limited to matplotlib figures; it covers any picklable object: users should have the freedom to save and restore any variable of their choice in interactive notebooks.

Implementation details follow

```python
import abc
import base64
import functools
import pickle

import scrapbook as sb
import scrapbook.encoders
import scrapbook.scraps


# encoder class interface
class BaseEncoder(abc.ABC):
    def name(self):
        ...

    def encodable(self, data):
        ...

    def encode(self, scrap: sb.scraps.Scrap, **kwargs):
        ...

    def decode(self, scrap: sb.scraps.Scrap, **kwargs):
        ...


# pickle encoder

# TODO ref https://stackoverflow.com/a/38755760
def pipeline(*funcs):
    # Apply funcs left to right: pipeline(f, g)(x) == g(f(x))
    return lambda x: functools.reduce(lambda acc, func: func(acc), funcs, x)


class PickleEncoder(BaseEncoder):
    ENCODER_NAME = 'pickle'

    def name(self):
        return self.ENCODER_NAME

    def encodable(self, data):
        # TODO
        return True

    def encode(self, scrap: sb.scraps.Scrap, **kwargs):
        _impl = pipeline(
            functools.partial(pickle.dumps, **kwargs),
            # NOTE .decode() makes sure it's a UTF-8 string instead of bytes
            lambda x: base64.b64encode(x).decode()
        )
        return scrap._replace(data=_impl(scrap.data))

    def decode(self, scrap: sb.scraps.Scrap, **kwargs):
        _impl = pipeline(
            base64.b64decode,
            functools.partial(pickle.loads, **kwargs)
        )
        return scrap._replace(data=_impl(scrap.data))
```
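On the `.decode()` NOTE in `encode`: notebook files are JSON, and the standard `json` module refuses raw `bytes`, so the pickled payload has to become text before it can be stored. Here is a small stdlib-only sketch of that constraint. The `{"data": ...}` dict is only an illustration and is not meant to mirror scrapbook's actual output schema.

```python
import base64
import json
import pickle

value = {"threshold": 0.5, "labels": ("a", "b")}
pickled = pickle.dumps(value)  # raw bytes

# Raw bytes cannot be placed in a JSON document.
try:
    json.dumps({"data": pickled})
    bytes_ok = True
except TypeError:
    bytes_ok = False

# Base64 text can, and the round trip restores an equal object.
text = base64.b64encode(pickled).decode()
document = json.dumps({"data": text})
roundtrip = pickle.loads(base64.b64decode(json.loads(document)["data"]))
```

Base64 makes the payload roughly a third larger, which seems an acceptable price for keeping the notebook valid JSON.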

TODO: dill encoder

NOTE: dill has a function, `dill.pickles`, to check whether an object is encodable, so `encodable` would not have to return `True` regardless of the data the way `PickleEncoder` does.

```python
def register():
    sb.encoders.registry.register(PickleEncoder())
```
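Until a dill encoder exists, one possible way to fill in the `encodable` TODO with plain `pickle` is to attempt the serialization and report whether it succeeded. The sketch below also marks the interface methods with `abc.abstractmethod`, so an incomplete encoder fails at instantiation. It runs without scrapbook installed: `Scrap` is a stand-in namedtuple (an assumption about the real class's fields), and `StrictBaseEncoder`, `TrialPickleEncoder` and `IncompleteEncoder` are hypothetical names, not part of scrapbook.

```python
import abc
import base64
import collections
import pickle

# Stand-in for scrapbook's Scrap (assumed to be a namedtuple with these fields).
Scrap = collections.namedtuple("Scrap", ["name", "data", "encoder", "display"])


class StrictBaseEncoder(abc.ABC):
    """Hypothetical interface: subclasses must implement all four methods."""

    @abc.abstractmethod
    def name(self): ...

    @abc.abstractmethod
    def encodable(self, data): ...

    @abc.abstractmethod
    def encode(self, scrap, **kwargs): ...

    @abc.abstractmethod
    def decode(self, scrap, **kwargs): ...


class TrialPickleEncoder(StrictBaseEncoder):
    """Hypothetical pickle encoder whose encodable() tries to pickle the data."""

    def name(self):
        return "pickle"

    def encodable(self, data):
        # Attempt the serialization; any failure means "not encodable".
        try:
            pickle.dumps(data)
        except Exception:
            return False
        return True

    def encode(self, scrap, **kwargs):
        raw = pickle.dumps(scrap.data, **kwargs)
        return scrap._replace(data=base64.b64encode(raw).decode())

    def decode(self, scrap, **kwargs):
        raw = base64.b64decode(scrap.data)
        return scrap._replace(data=pickle.loads(raw, **kwargs))


class IncompleteEncoder(StrictBaseEncoder):
    """Implements only name(), so it cannot be instantiated."""

    def name(self):
        return "incomplete"


try:
    IncompleteEncoder()
    rejected = False
except TypeError:
    rejected = True

encoder = TrialPickleEncoder()
original = Scrap("numbers", {"a": [1, 2, 3]}, "pickle", None)
restored = encoder.decode(encoder.encode(original))
```

The trade-off is that a trial-pickling `encodable` serializes the object twice when followed by `encode`, which may matter for large objects.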


Usage examples

- `notebook.ipynb`
```python
import scrapbook as sb
import scrapbook_ext as sb_ext
# register the encoder(s); currently required as the above implementation is a separate module
sb_ext.register()

import matplotlib.pyplot as plt

import numpy as np
import io
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots(figsize=(5, 3.5))
ax.plot(t, s)

ax.set(xlabel='time (s)', ylabel='voltage (mV)')
ax.grid()

# glue this figure to the notebook
sb.glue("figure:test", fig)
# sb.glue("figure:test", fig, encoder='pickle')
```

- reading the scrap back
```python
import scrapbook as sb
import scrapbook_ext as sb_ext
# register the encoder(s); currently required as the above implementation is a separate module
sb_ext.register()

nb = sb.read_notebook('notebook.ipynb')

# display the figure
nb.scraps['figure:test'].data
```


See also: Example project
https://github.com/oakaigh/scrapbook-ext