ploomber / soorgeon

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊
https://ploomber.io
Apache License 2.0

working with unserializable objects #10

Open edublancas opened 2 years ago

edublancas commented 2 years ago

e.g., the notebook uses a db connection:

conn = open_db_connection()

We should not serialize this object, but rather embed this line in every task that uses it. How do we know whether to serialize it or not? The easiest way is to have the user add a "preparation" cell, and then add that cell to any task that uses its variables as inputs.
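A minimal sketch of the "preparation" cell idea, using only static analysis with the standard `ast` module. The helper names (`defined_names`, `used_names`, `embed_preparation`) and the dict-of-sources layout are hypothetical, made up for illustration; this is not soorgeon's actual implementation.

```python
import ast


def defined_names(source):
    """Return the names assigned at the top level of a code snippet."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, (ast.AnnAssign, ast.AugAssign)):
            targets = [node.target]
        else:
            continue
        for target in targets:
            for child in ast.walk(target):
                if isinstance(child, ast.Name):
                    names.add(child.id)
    return names


def used_names(source):
    """Return every name that is read (loaded) anywhere in a code snippet."""
    return {
        node.id
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }


def embed_preparation(prep_source, tasks):
    """Prepend the preparation cell to each task that reads one of its variables.

    `tasks` maps a (hypothetical) task name to its source code.
    """
    prep_vars = defined_names(prep_source)
    result = {}
    for name, source in tasks.items():
        if prep_vars & used_names(source):
            result[name] = prep_source.rstrip() + "\n\n" + source
        else:
            result[name] = source
    return result


tasks = {
    "load": "df = read_table(conn)",  # reads `conn`, so it gets the prep cell
    "clean": "x = 1",                 # does not touch `conn`, left as-is
}
print(embed_preparation("conn = open_db_connection()", tasks)["load"])
```

Since nothing is executed, `open_db_connection` and `read_table` never need to exist; the snippets are only parsed.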

idomic commented 2 years ago

In short, what happens today is that soorgeon throws an error on global variables. We want to:

@edublancas, a clarification on the 2nd point: we don't serialize/reinstantiate the objects to maintain their state, right? (i.e., in the db_conn case we don't really care what happened before.) Ideally, users won't need to take this extra step; it should happen for them.

edublancas commented 2 years ago

I changed the title since it wasn't accurate. This isn't about all types of global variables, but about variables defined in the notebook that do not support serialization.

A typical example is a database connection or a logging object. The problem is that we won't know whether the notebook contains non-serializable objects, because soorgeon only performs static analysis; it never runs the code.
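To illustrate why this needs a runtime check, here is a small sketch that tests whether an object can be pickled. It uses an in-memory `sqlite3` connection as a stand-in for the database connection; `is_picklable` is a hypothetical helper, not part of soorgeon, and it only works because the object actually exists at runtime.

```python
import pickle
import sqlite3


def is_picklable(obj):
    """Return True if `obj` can be serialized with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


# stand-in for `conn = open_db_connection()`
conn = sqlite3.connect(":memory:")

print(is_picklable(conn))      # False: sqlite3 connections cannot be pickled
print(is_picklable({"a": 1}))  # True: plain data structures can

conn.close()
```

A static pass only sees the text `conn = open_db_connection()`, so it cannot tell what type the call returns, which is why a check like this has to wait until the user's code can be run.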

idomic commented 2 years ago

Keeping this for a later stage, until we have a run function that can execute the user's code.