nteract / testbook

🧪 📗 Unit test your Jupyter Notebooks the right way
https://testbook.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Use pickle instead of JSON for serialization #138

Open coderforlife opened 2 years ago

coderforlife commented 2 years ago

Why can't pickle be used instead of JSON? It supports a much wider range of types, and the advantages of JSON don't really apply here.

As an example, I wrote a couple of extra methods that I monkey-patch onto the notebook client. So far they have worked for a much wider range of values than the current value() method supports:

import ast
import pickle

def get_value(self, expression):
    """
    Gets a value computed with an expression in the notebook. The value must be pickle-able.

    Raises TestbookRuntimeError if there is a problem running the code.
    """
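    # The injected cell evaluates to a bytes object, which the kernel displays as a b'...' literal in text/plain
    output = self.inject(f"import pickle\npickle.dumps({expression})", pop=True).outputs[0]
    # Instead of ast.literal_eval, one could use: output.data['text/plain'][2:-1].encode('latin1').decode('unicode-escape').encode('latin1')
    return pickle.loads(ast.literal_eval(output.data['text/plain']))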

def set_variable(self, varname, value):
    """
    Sets a variable's value in the notebook.
    The varname must be a string containing a valid Python variable name.
    The value can be any value that can be pickled.
    """
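    # !r embeds the pickled bytes in the injected source as a b'...' literal (same text as str(bytes))
    self.inject(f"import pickle\n{varname} = pickle.loads({pickle.dumps(value)!r})", pop=True)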
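The round trip these two methods depend on can be checked locally, without a kernel. In the sketch below the helper names are hypothetical, `exec` stands in for the kernel, and it is assumed that the cell's text/plain output is exactly `repr()` of the pickled bytes (the form `ast.literal_eval` expects). It also exercises the `unicode-escape` alternative mentioned in the comment inside `get_value()`.

```python
import ast
import pickle


def encode_for_display(value):
    """Hypothetical stand-in for what the kernel shows for `pickle.dumps(value)`:
    assumed to be the repr of the bytes, e.g. "b'\\x80\\x04...'"."""
    return repr(pickle.dumps(value))


def decode_from_display(text_plain):
    """Client side of get_value(): parse the bytes literal, then unpickle."""
    return pickle.loads(ast.literal_eval(text_plain))


def decode_without_ast(text_plain):
    """The alternative from the comment: strip b'...' and undo the escapes."""
    raw = text_plain[2:-1].encode("latin1").decode("unicode-escape").encode("latin1")
    return pickle.loads(raw)


def build_set_variable_source(varname, value):
    """Source text that set_variable() would inject into the kernel."""
    return f"import pickle\n{varname} = pickle.loads({pickle.dumps(value)!r})"


# Arbitrary sample value mixing types JSON cannot represent (tuple, set)
value = {"weights": [0.5, 1.5], "shape": (2,), "tags": {"a", "b"}}

shown = encode_for_display(value)
assert decode_from_display(shown) == value
assert decode_without_ast(shown) == value

# exec() plays the role of the kernel receiving the injected cell
namespace = {}
exec(build_set_variable_source("x", value), namespace)
assert namespace["x"] == value
```

Both decoders give the same result; `ast.literal_eval` is the simpler of the two because it already understands every escape that `repr()` can produce.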

You can then even do tb.get_value('_') to get the output of the last executed cell. I have been able to use this for NumPy arrays, pandas DataFrames and Series, and other types that JSON serialization balks at.
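To illustrate the kind of values JSON cannot carry, here is a small comparison that uses only the standard library (no NumPy or pandas). The sample values are arbitrary; each one either has no JSON encoding at all or comes back from JSON with a different type.

```python
import datetime
import json
import pickle
from decimal import Decimal

# Arbitrary sample values; none survives a JSON round trip unchanged
samples = {
    "set": {1, 2, 3},
    "complex": 3 + 4j,
    "bytes": b"\x00\xff",
    "datetime": datetime.datetime(2021, 5, 1, 12, 30),
    "decimal": Decimal("1.10"),
    "tuple": (1, "a"),       # JSON turns it into a list
    "int-keyed": {1: "a"},   # JSON turns the key into the string "1"
}


def json_roundtrip_ok(value):
    """True if value comes back from JSON equal to the original."""
    try:
        return json.loads(json.dumps(value)) == value
    except TypeError:  # "Object of type ... is not JSON serializable"
        return False


def pickle_roundtrip_ok(value):
    """True if value comes back from pickle equal to the original."""
    return pickle.loads(pickle.dumps(value)) == value


for name, value in samples.items():
    # every line prints json: False  pickle: True
    print(f"{name:10} json: {json_roundtrip_ok(value)!s:5}  pickle: {pickle_roundtrip_ok(value)}")
```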

I wouldn't add the get_value() method to your class as-is; instead, I would replace all uses of JSON with pickle. I only wrote separate methods so that I wouldn't have to modify any of the existing ones.

Some changes may be needed in ref(), since it seems to return a reference only for objects that are not JSON-serializable. It seems like it should always return a reference rather than a value (though the TestBookReference object would then need to support more magic methods for some use cases). One complication is functions: pickle stores a function by its module and qualified name rather than its code, so only some functions can be pickled, and even then unpickling them may fail on the other side.
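The function caveat can be reproduced with the standard library alone. In this sketch a throwaway module (the name `hypothetical_kernel_ns` is made up) plays the role of the kernel's namespace; deleting it from `sys.modules` mimics a test process that never defined the function.

```python
import pickle
import sys
import types

# Throwaway module standing in for the notebook kernel's namespace
kernel_ns = types.ModuleType("hypothetical_kernel_ns")
exec("def double(x):\n    return 2 * x\n", kernel_ns.__dict__)
sys.modules[kernel_ns.__name__] = kernel_ns

# pickle records only the module name and the qualified name, not the code
payload = pickle.dumps(kernel_ns.double)
assert b"hypothetical_kernel_ns" in payload and b"double" in payload

# While the module is importable, unpickling finds the very same object
assert pickle.loads(payload) is kernel_ns.double

# Simulate the test process, where that namespace does not exist
del sys.modules[kernel_ns.__name__]
try:
    pickle.loads(payload)
    unpickled_elsewhere = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    unpickled_elsewhere = False

# A lambda has no importable name, so pickling fails outright
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False

print("unpickled elsewhere:", unpickled_elsewhere, "| lambda pickled:", lambda_pickled)
```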

tbenthompson commented 1 year ago

You could go a step further here and use cloudpickle, which can serialize a much wider range of objects, including, for example, classes defined inside the notebook. cloudpickle is a common solution for inter-process communication of arbitrary Python objects.
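A sketch of a serializer that prefers cloudpickle when it is installed. `dumps_for_kernel` is a hypothetical helper, not part of testbook. It relies on cloudpickle output being loadable with the standard `pickle.loads` as long as cloudpickle is importable on the loading side, and falls back to plain pickle otherwise so the sketch still runs without it.

```python
import pickle

try:
    import cloudpickle  # third-party: pip install cloudpickle
except ImportError:  # keep the sketch runnable without it
    cloudpickle = None


def dumps_for_kernel(obj):
    """Hypothetical helper: serialize by value via cloudpickle if available."""
    if cloudpickle is not None:
        return cloudpickle.dumps(obj)
    return pickle.dumps(obj)


# Plain data works with either serializer
assert pickle.loads(dumps_for_kernel({"a": (1, 2)})) == {"a": (1, 2)}

if cloudpickle is not None:
    # A lambda: plain pickle refuses it, cloudpickle ships its code by value
    square = pickle.loads(dumps_for_kernel(lambda x: x * x))
    assert square(4) == 16

    # A locally defined class, similar to a class defined in a notebook cell
    def make_point_class():
        class Point:
            def __init__(self, x, y):
                self.x, self.y = x, y

            def norm2(self):
                return self.x ** 2 + self.y ** 2

        return Point

    Point = make_point_class()
    restored = pickle.loads(dumps_for_kernel(Point(3, 4)))
    assert restored.norm2() == 25
```

Whichever serializer is chosen, the same library (in a compatible version) would need to be available in both the kernel and the test process.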