when-of-python / blog

OLD When of Python Blog
https://whenof.python.nz/blog

blog/pickle-in-a-pickle #1

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Python Feature Opinions | The When of Python Blog

The When-of-Python is at heart a community project. We need people to share their opinions on different Python language features. If you have opinions on any language features we'd love to hear from you.

https://when-of-python.github.io/blog/python-feature_opinions.html

nathanjmcdougall commented 1 year ago

Language feature: pickle

Common Python / Situational Python / Deprecated Python: Situational Python

Opinion: If the serialized file is only ever going to be used internally on a temporary basis, i.e. not for file interchange, and not for long-term storage, then carefully using pickle files may be appropriate. Otherwise avoid if at all possible, and consider JSON instead. If you don't need to serialize data, and you're just looking for a general purpose file format for data storage which has fast read and write times, consider Apache Parquet, or JSON: pickle files are specifically for serialization, not general data storage.

There are some significant drawbacks to pickle files, many of which are not very well known to inexperienced Python users:

Some Python types likewise cannot be directly serialized into JSON. In such cases, a function could be written to manually extract the defining information in the type into JSON, and then another function written to reproduce the type instance from this JSON. This may be impractical in some cases, in which case pickle could be considered.
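As a rough sketch of that manual approach, the pair of functions might look like the following. The `Reading` type and the `reading_to_json` / `reading_from_json` helpers are hypothetical names made up for illustration; the only real APIs used are the standard library's `json`, `dataclasses` and `datetime`.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reading:  # hypothetical type, for illustration only
    sensor: str
    taken_at: datetime  # datetime is not JSON-serializable by default
    values: tuple       # tuples come back from JSON as lists


def reading_to_json(reading: Reading) -> str:
    """Extract the defining information into JSON-friendly types."""
    return json.dumps({
        "sensor": reading.sensor,
        "taken_at": reading.taken_at.isoformat(),
        "values": list(reading.values),
    })


def reading_from_json(text: str) -> Reading:
    """Rebuild the instance from the JSON produced by reading_to_json."""
    data = json.loads(text)
    return Reading(
        sensor=data["sensor"],
        taken_at=datetime.fromisoformat(data["taken_at"]),
        values=tuple(data["values"]),
    )


original = Reading("probe-1", datetime(2022, 8, 1, 12, 30), (1.5, 2.0))
restored = reading_from_json(reading_to_json(original))
```

The resulting file is plain text that other tools and languages can read, and loading it only ever produces dicts, lists, strings and numbers until your own function decides what to build from them.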

Interesting References:

pickle — Python object serialization — Python 3.10.6 documentation (especially the warning regarding the insecurity of the module, and the comparison with JSON).

Don't Pickle Your Data (benfrederickson.com)

Pickle’s nine flaws | Ned Batchelder
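For anyone who hasn't seen why the documentation carries that security warning, here is a deliberately harmless sketch. The `Payload` class is a made-up name; `__reduce__` is the real hook pickle uses to ask an object how it should be rebuilt, and whatever callable it names gets called at load time.

```python
import pickle


class Payload:  # hypothetical class, for illustration only
    def __reduce__(self):
        # Tell pickle: "to rebuild me, call eval('6 * 7')".
        # A hostile pickle could name any importable callable here.
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Loading runs the callable chosen by whoever created the bytes.
# The result is not even a Payload instance.
result = pickle.loads(data)
```

Here the expression is only arithmetic, but the same mechanism can run anything, which is why unpickling data from an untrusted source is unsafe.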

rbtcollins commented 1 year ago

Pickle should be considered deprecated. There is no strong justification for its use, ever, and there are plenty of reasons not to use it. There are safe object serialisation systems like pb, jelly and spread; typed, language-neutral interchange formats like protobufs, capnproto and many more; and of course generic "it's just text" formats like YAML, JSON and TOML.

grantps commented 1 year ago

Hi Robert - all things being equal I feel happier if we can properly deprecate things rather than leave them in a highly-constrained situational category. One goal is to shrink Python after all. Just out of curiosity, do you think pickle will actually get deprecated properly at some point (perhaps over multiple years)?

I'd also like to flesh out the alternatives a bit more, in particular "pb, jelly and spread". I couldn't find any good links (apologies for my poor Google-fu).

ncoghlan commented 1 year ago

As much as I would like to support the notion of pickle being in the "Deprecated Python" category, it doesn't currently have a compelling alternative in the distributed computing case. Even https://github.com/cloudpipe/cloudpickle is still based on regular pickle under the hood.
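A small standard-library sketch of one reason the distributed case is awkward (my reading of it, not a full account): regular pickle stores functions by reference to their module and name rather than by value, so ad-hoc functions such as lambdas can't be shipped to a worker. Tools like cloudpickle exist to extend pickle for that case.

```python
import pickle

# A named, importable function is stored as a reference, not as code.
by_reference = pickle.dumps(len)
same_function = pickle.loads(by_reference)

# A lambda has no importable name, so regular pickle refuses it.
# (The exact exception type has varied between Python versions.)
try:
    pickle.dumps(lambda x: 2 * x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```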

Maybe "Situational Python" needs a "Hyperspecialised Python" subcategory: if you don't already know why you need the feature, consider it part of "Deprecated Python".

ncoghlan commented 1 year ago

"Problematic Python" would also work as a category name: we'd deprecate if we could, but there are valid use cases without compelling alternative solutions.

rbtcollins commented 1 year ago

@ncoghlan What features does pickle have that are relevant even in once-a-year usage? I know of many uses of pickle to save data in the middle of notebooks, for instance, but that seems trivially doable via other unstructured serialization techniques (as k8s did in its early API evolution).

grantps commented 3 months ago

Insecurity and Python pickles seems really interesting