Open hmaarrfk opened 8 months ago
Nice thanks for all the effort you put into this!
Mathematically the distinction between 0d array and scalar seems weird, but I guess from a type system perspective it makes sense. And in any case numpy's eccentricities are not in our control.
> This strange behavior leads to the test suite using `encode_scalars_inplace` as an attempt to work around the warning that "scalars cannot be reliably encoded", then turns around to using the strange "automatic downcast" behavior to recover the original structure.
The test suite isn't trying to work around the warning; it's testing the function `encode_scalars_inplace` itself. It's a hacky function, but a necessary one, because some scalars cannot be serialized correctly (which ones depends on the Python version). This is described in #18. The function is a workaround of course, but part of the documented API:
> So if you really want to encode numpy scalars, you’ll have to do the conversion beforehand. For that purpose you can use `encode_scalars_inplace`, which mutates a nested data structure (in place!) to replace any numpy scalars by their representation. If you serialize this result, it can subsequently be loaded without further adaptations.
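As a rough sketch of what such an in-place conversion does (an illustrative approximation, not pyjson_tricks' actual implementation; the `__ndarray__`/`dtype`/`shape` payload shape is an assumption):

```python
import numpy as np

def encode_scalars_inplace_sketch(obj):
    """Walk a nested dict/list structure and replace numpy scalars
    (np.generic instances) with a plain-JSON-friendly dict, mutating
    the containers in place. Illustrative sketch only."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return obj
    for key, value in list(items):
        if isinstance(value, np.generic):
            obj[key] = {"__ndarray__": value.item(),
                        "dtype": str(value.dtype),
                        "shape": []}
        else:
            encode_scalars_inplace_sketch(value)
    return obj

data = {"x": np.float64(2.0), "nested": [np.int32(3)]}
encode_scalars_inplace_sketch(data)
print(data["x"])
```

After this pass the structure contains only plain Python types and can be handed to any JSON encoder.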
The easiest way to see this behaviour is to add `float64` to your new test `test_scalar_roundtrip`.
But it seems good to support 0-dimensional arrays, and with a few adjustments I think it'll work great. Thanks again!
I'm really not sure how to reconcile the fact that the default `json` encoder will simply encode `float64` objects as `float`. The issue seems to stem from the fact that numpy's `float64` scalar is a subclass of the Python `float` type:
```python
>>> import numpy as np
>>> np.float64.mro()
[numpy.float64,
 numpy.floating,
 numpy.inexact,
 numpy.number,
 numpy.generic,
 float,
 object]
```
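The subclass relationship can be checked directly, and contrasted with `float32`, which does not subclass `float`:

```python
import numpy as np

# float is in np.float64's MRO, so isinstance checks against float succeed:
assert issubclass(np.float64, float)
assert isinstance(np.float64(1.5), float)

# float32, by contrast, is not a float subclass:
assert not issubclass(np.float32, float)
```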
Eventually the `json` encoder will simply check `isinstance(o, float)`, and since that returns `True`, the encoder will just encode it:
https://github.com/python/cpython/blob/b0fb074d5983f07517cec76a37268f13c986d314/Lib/json/encoder.py#L426
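A small demonstration of the consequence, using only the stdlib `json` module: because `float64` passes the `isinstance(o, float)` check, the `default` hook is never consulted; for `float32`, which is not a `float` subclass, it is:

```python
import json
import numpy as np

# np.float64 subclasses float: json's float branch wins, default is ignored.
out64 = json.dumps(np.float64(1.5), default=lambda o: {"handled": True})
print(out64)  # -> 1.5

# np.float32 does not subclass float: default IS called.
out32 = json.dumps(np.float32(1.5), default=lambda o: {"handled": True})
print(out32)  # -> {"handled": true}
```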
I think that for backward compatibility reasons, numpy will also refuse to change the MRO.
We could extend the `TricksEncoder` to simply use `default` first, before the `isinstance(o, float)` check is called.
I implemented this in this commit: https://github.com/mverleg/pyjson_tricks/pull/99/commits/70bacca8c379fba2c770b17c3f706ff4045daf46

It copies a lot of code from cpython; at the very least we should split this off into a separate file to keep correct attribution to their license. I changed 3 lines that had `if isinstance(obj, float)`.
It seems that there are some edge cases with serializing scalars and numpy arrays with 0 dimensions. This has always confused me, so for precision, I quote numpy's documentation.
So I added the following test to the file `tests/test_np.py` to see if things serialize correctly.

After round-tripping, we aren't preserving the "0-dimension" and it is being downcast to a numpy scalar.
This strange behavior leads to the test suite using `encode_scalars_inplace` as an attempt to work around the warning that "scalars cannot be reliably encoded", then turns around to using the strange "automatic downcast" behavior to recover the original structure: https://github.com/mverleg/pyjson_tricks/blob/master/tests/test_np.py#L170

However, this would be incorrect if the user mixes in "0-dimensional" numpy arrays.
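The scalar-vs-0-d-array distinction behind that downcast can be illustrated directly (a minimal example, not the test referenced above):

```python
import numpy as np

zero_d = np.array(1.5)     # 0-dimensional ndarray
scalar = np.float64(1.5)   # numpy scalar

assert zero_d.ndim == 0 and zero_d.shape == ()
assert isinstance(scalar, np.generic)
assert not isinstance(zero_d, np.generic)  # an ndarray, not a scalar

# Indexing a 0-d array with an empty tuple performs the "downcast":
assert type(zero_d[()]) is np.float64
```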
Now, I originally came here to try to address a pretty specific use case of ours: I am trying to serialize numpy `datetime64` scalars. I tried to augment them to 0-dimensional arrays; however, this breaks two assumptions made in pyjson_tricks:
1. `numpy.datetime64` objects are not the `numpy.generic`s used in the encoder
2. `numpy.datetime64` constructors cannot be obtained with the `getattr` used in the decoder, since they have units (annoying, I know, dates are complicated)

My proposal unfortunately requires breaking anybody using `replace_scalars_inplace`, such as the test suite.

Other references