Open eode opened 2 years ago
Also -- for any given extension, would you rather the first-registered PicklerIO
class be the handler, or the last-registered one?
Last-registered means that if the user registers a class for some extension, it becomes the default for that extension. First-registered means that whatever our choice of default is, it stays the default unless the user deregisters all picklers and then registers their own.
Thanks @eode. The feature you are proposing seems very nice.
I'd like to add support for pickle extensions as a separate concept. So, for example,
"foo.pkl.bz2"
is gracefully handled, and "baz.json" is gracefully handled. I would be changing the defaultpickle_method
arguments from"pickle"
to"infer"
-- but the behavior in the majority of circumstances would be the same.
What should we do for "foo.bz2"
? At the moment, it assumes that pickle is used, but with what you propose, this would fail to infer the serialization method, and raise an error, right?
But overall, this issue is a little more touchy than just adding basic JSON support, because it could cause API breakage. I suppose the core of what I'm asking is:
What level of Public-facing API change are you willing to accept?
* I imagine changing the default `pickler_method` to 'infer' is fine.
I wouldn't change it yet. I would set it to some value that then issues a DeprecationWarning
. That way, we can let users know that the argument's default will change in a future release.
* I imagine adding new functions is fine
Yes, this is fine
* I imagine the API of `dump`, `dumps`, `load`, and `loads` should remain substantially the same.
I agree
However, what about repurposing or renaming other functions? Example:
* `get_registered_extensions` should be repurposed or renamed, because its function is no longer fully clear if there are also pickler extensions * for example, `get_registered_extensions()` could: * leave it as-is, but that would make the name somewhat deceptive * return _all_ registered extensions, for both picklers and compressers, but that would require a different data format (because picklers will have more overlap in extension handling, so something like {"pkl": ["pickle", "dill", "cloudpickle"], "bz": ["bz2"]} * be renamed to `get_registered_compresser_extensions` * ..etc
What I would like to do with it is: If a function's output changes, rename it to explicitly break compatibility (rather than implicitly by having different output) If a function's name is no longer reflective of its purpose, or implies more than it does, rename it. All functions that do effectively identical things for compressers and picklers would be named or renamed according to a common theme, like
get_default_compresser_name_mapping
andget_default_pickler_name_mapping
. It might be easier to let the public API for registries be handled through module reference, likecompressers.get_default_extension_map()
andpicklers.get_default_extension_map()
.
I like the idea of keeping the names at the module level for compressers and picklers (by the way, since we've added json support, eventually we probably should also rename picklers to serializers?). I agree that we should also change the name of get_registered_extensions
at the package level. We should probably change this line to
get_registered_extensions as get_registered_compression_extensions
And then add a function that keeps the same name at the package level, saying that importing get_registered_extensions
from compress_pickle
is deprecated, and it should be changed to from compress_pickle.compressers import get_registered_extensions
Anyways, those are my thoughts currently. In the mean time, I'm trying to have as little impact on the API as possible, to err on the side of caution.
In all, I really like your suggestions.
To also answer your other comment, I prefer that the first registered PicklerIO
gets precedence. We could maybe add a function called set_default_extension_handler
or something like that, to help users change the pickler without having to unregister everything.
Thanks @eode. The feature you are proposing seems very nice.
Good to hear.
What should we do for
"foo.bz2"
? At the moment, it assumes that pickle is used, but with what you propose, this would fail to infer the serialization method, and raise an error, right?
- I imagine changing the default
pickler_method
to 'infer' is fine.I wouldn't change it yet. I would set it to some value that then issues a
DeprecationWarning
. That way, we can let users know that the argument's default will change in a future release.
So, both of these might be addressed by making "infer"
the default argument, and "pickler"
the default serializer. Thus, when no serializer is specified/found by extension, pickler would still be used as is currently done. The only difference would be that if someone were using "mydata.json.bz2"
as a pickle data file, then that would instead be interpreted as JSON (and fail to load).
Although on the surface that might seem like an implausible scenario, the number of times I've seen someone's data files that ended up being called "blahblah.xls.csv" or something like that is in fact, nonzero, and greater than one.
IMO, the chance is slight enough and the fix easy enough that it's worth the small potential risk of breakage. However, it's your baby, and I definitely defer to you on this. As it is, I'll assume deprecation warning unless the above changes your mind.
I like the idea of keeping the names at the module level for compressers and picklers (by the way, since we've added json support, eventually we probably should also rename picklers to serializers?)
I agree -- and perhaps leave functions in picklers
that match the original functions/methods, but would create deprecation warnings. That way, if someone imports the picklers
module, it'll still work.
That said, the project is called "compress_pickle", and JSON is kindof just a bonus at this point. :-)
get_registered_extensions as get_registered_compression_extensions
And then add a function that keeps the same name at the package level, saying that importing
get_registered_extensions
fromcompress_pickle
is deprecated, and it should be changed tofrom compress_pickle.compressers import get_registered_extensions
Definitely.
Anyways, those are my thoughts currently. In the mean time, I'm trying to have as little impact on the API as possible, to err on the side of caution.
In all, I really like your suggestions.
Thanks! Good to hear.
To also answer your other comment, I prefer that the first registered
PicklerIO
gets precedence. We could maybe add a function calledset_default_extension_handler
or something like that, to help users change the pickler without having to unregister everything.
Good idea.
Alright, I'll be poking around at this off and on, and I'll get a PR up and running at some point so you can track changes.
Ok. So, looking into this pretty deeply and doing some coding, I'm not sure I'll be able to pull this off without some pretty deep intrusion into the existing codebase. Just a forewarning that it's not likely to be a small PR, and it's somewhat hard to break up into chunks.
Are you still interested in a PR for this even if it has a pretty big impact on the codebase?
Yes, definitely @eode
I'd like to add support for pickle extensions as a separate concept. So, for example,
"foo.pkl.bz2"
is gracefully handled, and "baz.json" is gracefully handled. I would be changing the defaultpickle_method
arguments from"pickle"
to"infer"
-- but the behavior in the majority of circumstances would be the same.But overall, this issue is a little more touchy than just adding basic JSON support, because it could cause API breakage. I suppose the core of what I'm asking is:
What level of Public-facing API change are you willing to accept?
pickler_method
to 'infer' is fine.dump
,dumps
,load
, andloads
should remain substantially the same.However, what about repurposing or renaming other functions? Example:
get_registered_extensions
should be repurposed or renamed, because its function is no longer fully clear if there are also pickler extensionsget_registered_extensions()
could:get_registered_compresser_extensions
What I would like to do with it is:
get_default_compresser_name_mapping
andget_default_pickler_name_mapping
.compressers.get_default_extension_map()
andpicklers.get_default_extension_map()
.Anyways, those are my thoughts currently. In the mean time, I'm trying to have as little impact on the API as possible, to err on the side of caution.