Closed edublancas closed 2 years ago
We'd probably need to get the most flexible one here: cloudpickle/dill etc.
yeah, so part of fixing this involves finding out which is the most flexible
Yes I did read in an article that dill has more options than cloudpickle.
This mentions the formats supported vs pickle vs dill : https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
Was also thinking if we should have both as options.?
Also, it seems that the PicklingError is being thrown here : ploomber/executors/serial.py:161: DAGBuildError
, so I need to catch this error here and raise the above mentioned suggestions to the user?
I think if it's not too much overhead we should support both as options, some users might not be familiar with the other. On the Error, definitely, we need to capture it and have a user friendly message that instructs on next steps/suggest how to fix.
a few comments on how to auto-detect this.
we currently don't have a way of detecting it. soogeon analyzes the code to generate the pipeline but it never runs it, so we don't know if the user will run into this issue until they execute ploomber build
for the first time. So my recommendation is as to add it as a CLI arg:
soorgeon refactor notebook.ipynb --serializer dill/cloudpickle/etc
Note: these should all be optional dependencies. so if a user selects dill but it doesn't have it installed it should throw an error telling them to install it first
Auto-detecting this problem is a much more difficult problem so I'd say let's only add it as a CLI arg for now
Yes this is the approach I have followed (have added a serialize option in CLI ) The only way it can be detected is when executing ploomber build, so was thinking of providing suggestions to the user once it throws a PickleError during the ploomber build.
I think let's skip this verification part for now. because a PickleError could be many things, perhaps the user did not generate the pipeline with soorgeon, so it'll be confusing. thoughts @idomic ?
Ok. So we dont need to provide the link to guide or suggest user to use the serializer, snce there's no way of detecting that in soorgeon ? We just need to mention the serializer option in the docs?
yeah, just mention it in the docs so they know they can use the --serializer
option
Yeah I kind of agree, if someone came across this feature it'd be through the docs and it'll be clear from it that the dependency is needed. If this error happens just with core ploomber, it can even be a malformed code so it can create user confusion.
closing as we merged via #53
Some objects aren't pickable (e.g, jinja templates) so we should capture this errors at runtime and suggest fixes:
sorgeon notebook.ipynb --serializer cloudpickle