eode commented 2 years ago

Pretty much in the title. This adds JSON as an option for serialization/deserialization.

Possibly unrelated, but noteworthy: If you're interested, I have a few minor text output changes I can either merge with this PR or do in a separate PR, like:

"openinning" -> "opening" in some docstrings,
a missing 'f' for an f-string (resulting in a message like Available values are {list(cls._pickler_registry)} instead of Available values are ['foo', 'bar']
use repr() on quite a few strings that would otherwise be unquoted in error output, such as:
- Original: ValueError: Unregistered extension .lz9. Registered extensions are ['bz', 'bz2', 'gz', ...]
- Modified: ValueError: Unregistered extension '.lz9'. Registered extensions are ['bz', 'bz2', 'gz', ...]

codecov[bot] commented 2 years ago

Codecov Report

Merging #40 (8c3493b) into master (3a1ecd0) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #40   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           22        23    +1     
  Lines          435       455   +20     
=========================================
+ Hits           435       455   +20

Impacted Files	Coverage Δ
compress_pickle/picklers/__init__.py	`100.00% <100.00%> (ø)`
compress_pickle/picklers/json.py	`100.00% <100.00%> (ø)`

eode commented 2 years ago

I added a test that includes all of the basic JSON data types.

lucianopaz commented 2 years ago

This looks really great @eode!! Thanks so much for opening a PR! My only request is that you also add the json pickler here.

Your code seems to pass the CI but a newer version of mypy broke things. I'll fix that so that you can rebase and get your PR merged.

lucianopaz commented 2 years ago

About the typo fixes, they are more than welcome.

lucianopaz commented 2 years ago

@eode, I fixed the mypy issue. Feel free to rebase this PR and I'll merge it in

eode commented 2 years ago

Sorry for the awkward merge rather than a rebase. I pulled from my repo github page, which was irreversible. However, we can rebase upon closing this PR.

This looks really great @eode!! Thanks so much for opening a PR! My only request is that you also add the json pickler here.

Well, @lucianopaz -- thanks for having such a great project! I like your code.

Regarding the changes in Update fixtures to support JSON:

JSON doesn't support binary (bytes) data, but does support str data. That being the case, I've changed the return type of tests.fixtures.random_message() to str instead of bytes. This allows the same tests to be performed as were previously done, just with a different (JSON-compatible) data type.

Unnecessary detail:

I've updated the message string so it still correctly describes the situation
Simply treating the bytes as utf-8 and decoding them wouldn't always work, as the random bytes at the end would occasionally cause invalid unicode.
- So, I've changed the randomness method from os.urandom() to random.choices() of printable chars from the string module.
- Also, (not that it matters much) I have increased the number of random characters a bit to somewhat compensate for the smaller range of randomness in printable characters vs bytes.

Known Issues:

When dumping to an x.json or 'x' file path using JSON, the default behavior is to switch it to x.pkl. While this can be overridden using set_default_extension=False, and that's "good enough," I'd like to make this PR play nicely with inference as well. I am working on that currently.

eode commented 2 years ago

To address the Known Issues area above requires a different notion of extensions than is currently coded for the compress_pickle package. That is, it's not a trivial change. Since this PR is otherwise acceptable (and is good progress), my vote is to go ahead an merge this PR if it looks good to you, and I'll do a separate PR for inferences that interpret extensions for both compression and for pickle/serialization formats.

The result of that PR would be:

For both compression/decompression and serialization/deserialization, the priority of configuration will remain as User Config, Inference, Defaults
BasePicklerIO subclasses will be given their own extension handling/recognition
The result will be able to correctly read and write filenames like foo.json.bz2, foo.pkl.bz2, foo.bz2 (assuming it used the default pickler), foo.json (assuming it's not compressed).
Currently, the default behavior is to write an incorrect extension for JSON content ('pkl') unless set_default_extension is specified as False, and inference fails on reading a .json file.

However, as long as a user specifies that they want JSON in the call to dump/load, it works fine.

lucianopaz / compress_pickle

Add JSON support #40

Codecov Report