Open bnaul opened 5 years ago
You can register new translators easily enough. Notice at the bottom of that file we have a bunch of registration calls.
https://papermill.readthedocs.io/en/latest/extending-overview.html describes how one can register new engines and IO plugins. But I noticed we're missing https://github.com/nteract/papermill/blob/718f39e9012bc2cc14a0706801e2f2e934b0c1b6/papermill/engines.py#L32-L38 for translators. If you wanted to make a PR for that equivalent code in translators you could then do:
from setuptools import setup, find_packages

setup(
    # all the normal setup.py arguments...
    entry_points={"papermill.translators": ["python=translators:PandasPythonTranslator"]},
)
in your project to register the translator.
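For anyone picking up that PR, the loading side might look roughly like the sketch below. This is only an illustration: load_translator_entry_points is a hypothetical helper name, the "papermill.translators" group is the one proposed in the snippet above and not something papermill is confirmed to read today, and it uses the standard library's importlib.metadata, which may differ from what papermill does internally for engines.

```python
from importlib.metadata import entry_points


def load_translator_entry_points(group="papermill.translators"):
    """Return {name: loaded object} for every entry point in `group`.

    Hypothetical helper: the group name is the one proposed in this thread.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons expose a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older Pythons return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}


# With no plugin installed for the group, this is simply an empty dict.
found = load_translator_entry_points("papermill.translators.demo_unused_group")
print(found)
```

Each loaded object would then be handed to whatever registration call the translators module already uses at the bottom of the file.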
@MSeal I started to take a stab at this, but I hadn't realized that we also inject all of the parameter values into the notebook metadata: https://github.com/nteract/papermill/blob/master/papermill/parameterize.py#L104. So presently, no matter what you do in the translator, any non-trivial parameter will fail with:
TypeError: Object of type DataFrame is not JSON serializable
Why exactly is there a need to store this extra copy of the parameters if they appear in the injected-parameters cell as well? I assume I'm just missing something about the execution flow that makes this necessary, but unfortunately it seems like this inherently limits one to only simple JSON-compatible types.
It was intended to give programmatic access to what parameters were set. You can't read what the user input was from the cell itself if there were manipulations. To make this work, we'd likely want to either add a placeholder object for non-JSON fields, or catch the invalid JSON input and skip saving the parameters to the metadata.
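A minimal sketch of the placeholder option, assuming the metadata only needs a JSON-safe record of what was passed. The names json_safe_parameters and FakeFrame are hypothetical and not part of papermill; FakeFrame just stands in for a pd.DataFrame so the example runs without pandas.

```python
import json


def json_safe_parameters(parameters, placeholder="<non-JSON-serializable: {type_name}>"):
    """Return a copy of `parameters` that json.dumps can always encode.

    Values that fail to serialize are replaced by a placeholder string
    naming their type; everything else is kept unchanged.
    """
    safe = {}
    for name, value in parameters.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            safe[name] = placeholder.format(type_name=type(value).__name__)
        else:
            safe[name] = value
    return safe


class FakeFrame:
    """Stand-in for a pd.DataFrame (assumption: any non-JSON object behaves the same)."""


params = {"alpha": 0.5, "names": ["a", "b"], "df": FakeFrame()}
metadata_params = json_safe_parameters(params)
encoded = json.dumps(metadata_params)  # no TypeError now
print(encoded)
```

The other option mentioned above, skipping the metadata copy entirely on failure, would be the same try/except wrapped around the whole dict instead of each value.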
Along the lines of #215, it seems like there are quite a few parameter types that would be desirable to pass as inputs (most notably pd.DataFrames) that are simple enough to translate to/from JSON. Doing the transformation manually every time is a bit of a headache; is there any current method to register a custom translator that would handle this conversion automatically? Monkey-patching translate (https://github.com/nteract/papermill/blob/718f39e9012bc2cc14a0706801e2f2e934b0c1b6/papermill/translators.py#L80-L99) seems like the best option at the moment, but being able to explicitly specify the translator at runtime seems a lot safer.
To be clear, this is not a proposal to automatically convert pandas objects (though I could certainly see an argument for that as well), just to expose a method for users to add their own serialization methods.
dask's method for allowing custom serializers is a nice, straightforward example of something similar: https://distributed.dask.org/en/latest/serialization.html#id3