Closed: jrapin closed this 2 weeks ago
Sorry, duplicate of #8194
No problem. Thanks for double checking. We're working on pickling for 2.6.0!
@sydney-runkle The `TypeError: cannot pickle 'sqlite3.Connection' object` in #9698 is a different error from the ones I've experienced, but my team encountered it lately as well, and it probably has the same root cause.
Solving this problem seems to have been deprioritized since it's no longer in a milestone; do you expect it will be handled at all? I feel this is a bit concerning given it's a blocker for many pipelines (particularly in research, where we need flexibility and quick iteration in a notebook).
Hi @jrapin,
Thanks for following up. It's not in our priority list for our upcoming v2.8 release, but I'll add this to our v2.9 milestone and see if we can make some headway here in July.
@sydney-runkle thank you for the update :)
Hi folks, I've been debugging this issue on behalf of Prefect, and here's what I've discovered. The repro is very simple:
```python
import cloudpickle
from pydantic import BaseModel

class Model(BaseModel):
    pass

print(cloudpickle.dumps(Model()))
```
In a CPython shell, this works fine. In an IPython shell, you'll end up with any of a number of errors about unpicklable objects (for me it's usually `sqlite3.Connection`).
So it all comes down to `__pydantic_parent_namespace__`, which according to the source code is "used for automatic rebuilding of models". This is a class var on all pydantic model classes, and it captures everything that's in scope at the point a model is defined. In an IPython shell, this includes a whole lot of hidden magic that ends up referencing things like a SQLite3 DB (for the IPython history) and `sys.stdin` (for some `prompt_toolkit` stuff). If we nuke those `__pydantic_parent_namespace__` dictionaries while we're pickling, everything is cool:
```python
import io
from typing import Any

import cloudpickle
from pydantic import BaseModel

class Model(BaseModel):
    pass

class Referrer(BaseModel):
    model: Model
    things: list[Model]

def safe_cloudpickle(obj: Any) -> bytes:
    model_namespaces = {}
    with io.BytesIO() as f:
        pickler = cloudpickle.CloudPickler(f)
        for ModelClass in BaseModel.__subclasses__():
            model_namespaces[ModelClass] = ModelClass.__pydantic_parent_namespace__
            ModelClass.__pydantic_parent_namespace__ = None
        try:
            pickler.dump(obj)
            return f.getvalue()
        finally:
            for ModelClass, namespace in model_namespaces.items():
                ModelClass.__pydantic_parent_namespace__ = namespace

print(safe_cloudpickle(Model()))
print(safe_cloudpickle(Referrer(model=Model(), things=[Model()])))
```
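The capture that bites here isn't pydantic-specific: any code that reads its caller's frame locals will drag along whatever the shell keeps in scope. A stdlib-only sketch of the mechanism (the helper names are hypothetical; pydantic's actual implementation is more involved):

```python
import sys

def capture_parent_namespace():
    # Read the caller's frame locals, similar in spirit to what
    # pydantic does when a model class is defined (simplified).
    frame = sys._getframe(1)
    return dict(frame.f_locals)

def define_model():
    # Anything in scope here gets captured, wanted or not.
    db_connection = object()  # stands in for IPython's sqlite3.Connection
    return capture_parent_namespace()

namespace = define_model()
print("db_connection" in namespace)  # True: the local leaked into the namespace
```

In an interactive shell, those "locals" include the interpreter's own machinery, which is exactly what ends up unpicklable.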
One of the ways I debugged into this was using `gc.get_referents` to navigate down the graph of objects referencing other objects, and this dict on each class seems to be keeping a lot of objects alive (and possibly causing excessive memory use?). Is there ever a time when we know that we're "done" with `__pydantic_parent_namespace__`? Could it ever safely be dropped?
I'm wondering if we can avoid attaching this dict to every model and instead keep it in a module-level cache over in pydantic, so that we can unblock cloudpickling?
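For anyone hunting a similar leak, the `gc.get_referents` walk can be reduced to a small stdlib-only sketch (the breadth-first traversal and depth limit here are my own simplifications, not the exact code I used):

```python
import gc
import pickle

def find_unpicklable(root, max_depth=3):
    """Breadth-first walk over gc.get_referents, collecting objects
    that fail to pickle. A debugging aid, not production code."""
    seen = {id(root)}
    frontier = [root]
    bad = []
    for _ in range(max_depth):
        next_frontier = []
        for obj in frontier:
            for ref in gc.get_referents(obj):
                if id(ref) in seen:
                    continue
                seen.add(id(ref))
                try:
                    pickle.dumps(ref)
                except Exception:
                    bad.append(ref)            # unpicklable: record it...
                    next_frontier.append(ref)  # ...and keep digging into it
        frontier = next_frontier
    return bad

# A generator is unpicklable, so it shows up in the report:
print(len(find_unpicklable({"ok": 1, "gen": (x for x in range(3))})) >= 1)
```

Pointing this at a pydantic model class defined in IPython is what surfaced the `sqlite3.Connection` and `sys.stdin` references for me.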
I was also looking into ways to customize the `__reduce__` function or add a custom reducer for pydantic `BaseModel`s, but the problem is that while it's trivial to clear this `__pydantic_parent_namespace__` during reduction, there's no good way to restore it.
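To illustrate with a stand-in class (not pydantic's actual `BaseModel`), here's a minimal `dispatch_table`-based reducer that simply omits the namespace from the pickled payload. It round-trips fine in-process, but nothing in the payload could restore the attribute, and cloudpickle serializes classes by value, so the cleared state is what would get captured:

```python
import io
import pickle

class FakeModel:
    # Stands in for __pydantic_parent_namespace__ on a model class.
    parent_namespace = {"captured": "stuff"}

def reduce_without_namespace(obj):
    # Reconstruct via the bare constructor; the namespace is
    # deliberately absent from the pickled payload.
    return (FakeModel, ())

class NamespaceDroppingPickler(pickle.Pickler):
    dispatch_table = {FakeModel: reduce_without_namespace}

buf = io.BytesIO()
NamespaceDroppingPickler(buf).dump(FakeModel())
restored = pickle.loads(buf.getvalue())
print(isinstance(restored, FakeModel))  # True, but only because the
# class object still lives in this process with its namespace intact
```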
Hi pydantic team,
Any chance this could be fixed soon? This is a blocker for our team :)
Thanks!
Hey @kingjr, thanks for following up! I'll take another look at this for v2.10.
Please fix this. I am getting this error:
`TypeError("cannot pickle 'pydantic_core._pydantic_core.ValidatorIterator' object")`
@TheRaLabs,
Could you please attach your reproducible example?
@chrisguidry,
Amazing work debugging. Picking up where you left off :). Thanks for the clear issue breakdown!
So, we made some significant changes to how we store `__pydantic_parent_namespace__` in v2.9. Specifically: https://github.com/pydantic/pydantic/pull/10113.
I'm able to repro the issue with the following code on v2.8.2, but not on v2.9.1:
```python
import cloudpickle
from pydantic import BaseModel

class Model(BaseModel):
    pass

print(cloudpickle.dumps(Model()))
```
I'm tempted to close this generic issue (because this should be fixed in most cases), but am happy to address specific use cases (like the `ValidatorIterator` stuff, for which I'd love an MRE) in separate issues. Please, if you're still experiencing issues here against our latest version, open a new issue and ping me with your question!
We're going to continue working to simplify namespace management internally; see https://github.com/pydantic/pydantic/issues/10074
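For downstream libraries that still need to support older pydantic, one could gate the namespace-clearing workaround by version. A hedged sketch of the version check (hypothetical helper, not part of pydantic's API; in practice the version string would come from `importlib.metadata.version("pydantic")`):

```python
# Returns True for pydantic versions before v2.9, where the
# IPython/cloudpickle repro above still fails.
def older_than(version: str, target: tuple[int, int] = (2, 9)) -> bool:
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < target

print(older_than("2.8.2"))  # True: workaround still needed
print(older_than("2.9.1"))  # False: fixed upstream
```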
Wonderful news, thank you!
Initial Checks
Description
Following #6763, instances defined within a function can be pickled as expected, while they could not be beforehand. However, it is still not possible to cloudpickle directly from within an IPython console or from a Jupyter notebook, with 2 different errors.
From an IPython console, the code below will trigger:
and from a Jupyter notebook:
When removing `pydantic.BaseModel` as a base class, cloudpickle works as expected. I'm running the latest versions of each package as shown below, but I also tested pydantic v2.5.0 and cloudpickle v2.2.1 without any difference. Any idea what is causing this? :(
Example Code
Python, Pydantic & OS Version