quinnj / JSON3.jl

Trouble writing a UInt128 and reading it back again #173

Open JockLawrie opened 3 years ago

JockLawrie commented 3 years ago

Hi there,

This may be user error, but I'm having trouble round-tripping a UInt128 (MWE below). The problem is in the return leg, which reads a Float64 instead of a UInt128. As a workaround I can use parse instead of JSON3.read, but that assumes I know the type in advance. I could write the value together with its type info, then parse it back. Any other ideas?

Cheers, Jock

using JSON3

x  = rand(UInt128)
s  = JSON3.write(x)
x2 = JSON3.read(s)
@assert x == x2  # error

x3 = parse(UInt128, s)
@assert x == x3  # ok

quinnj commented 3 years ago

Yeah, this is a general problem with any language trying to provide a "closest approximation" to the JSON standard, where objects aren't explicitly typed. We support reading/writing UInt128, because you might have a struct like:

struct Foo
    x::UInt128
end

and then we know, when reading a Foo, that the field should be parsed directly as a UInt128. But with a plain JSON3.read(x), we parse into "standard JSON types": null becomes nothing, true/false become Bool, strings become String, and for numbers we try to detect whether they are Int64 or Float64. Trying to be more clever than Int64-or-Float64 would complicate a lot of logic in the "untyped reading" case (most language implementations only support Float64).
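To make the precision loss concrete, here is a minimal sketch that uses only Base Julia. It imitates an "Int64 first, then Float64" fallback on the decimal digits of a value too wide for Int64. This is an illustration of the idea, not JSON3's actual parsing code.

```julia
# A value just above 2^64: too large for Int64, and not exactly
# representable as a Float64 (which has a 53-bit significand).
x = UInt128(2)^64 + 1
s = string(x)                      # the digits a JSON number would carry

# Step 1 of the imitated fallback: Int64 parsing overflows.
@assert tryparse(Int64, s) === nothing

# Step 2: Float64 parsing succeeds, but rounds to the nearest double.
f = parse(Float64, s)
@assert f == 2.0^64
@assert UInt128(f) != x            # the low bit has been lost

# Parsing with the known target type recovers the value exactly.
@assert parse(UInt128, s) == x
```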

Anyway, probably not a very satisfying answer, but if you do need to serialize/deserialize UInt128, it's probably best to use an explicit wrapper type, like:

struct UINT128
    x::UInt128
end

then it will be serialized like { "x": 12344556566 }, and can be deserialized by doing JSON3.read(json, UINT128).
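A minimal round-trip sketch of the wrapper approach follows. The explicit StructTypes declaration is an assumption: depending on the JSON3/StructTypes versions installed it may not be required, but declaring it should be harmless.

```julia
using JSON3, StructTypes

struct UINT128
    x::UInt128
end

# Assumption: declare the wrapper as a plain struct for StructTypes.
StructTypes.StructType(::Type{UINT128}) = StructTypes.Struct()

x    = rand(UInt128)
json = JSON3.write(UINT128(x))       # a JSON object with a single "x" field
back = JSON3.read(json, UINT128)     # typed read: the field is parsed as UInt128

@assert back.x == x
```

Because the target type is passed to JSON3.read, the number is parsed straight into the field's declared type instead of going through the untyped Int64/Float64 detection.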

quinnj commented 3 years ago

If you have a larger context where this is causing problems and can share, I'm happy to help brainstorm on different solutions that would help here.

JockLawrie commented 3 years ago

Thanks for responding so quickly, again.

The context is here, where I have a need to identify, train and validate thousands of statistical models. In de/serializing the Model type (defined in src/models.jl), which contains a UUID, I realized I could get more general de/serializing code, though not completely general because my code is a little messy (suggestions welcome!).

In particular, these jsonify/unjsonify functions seem to do the trick for a wide range of structs, including the Model struct defined in this repo. Note that these functions know nothing about the Model type. On the other hand some of my code is not type stable (see below), and uses runtime dispatch.

For the write leg it's just jsonify(x::MyType). The read leg only requires this constructor: MyType(json_string::String) = unjsonify(json_string, MyType). This looks a bit like the StructTypes interface, as does the code at src/serialize.jl, but StructTypes.jl confuses me when dealing with nested data structures.

Coming back to the UInt128 example, for now I've gone with this:

format_for_json(x::Real) = sizeof(x) > 8 ? string(x) : x  # Serialize. Not type stable but it works.
format_for_T(v::String,  T::Type{<:Real}) = parse(T, v)   # Deserialize

It's fine and I'm happy with it, but any suggestions to make this more elegant/general or to better leverage existing packages would be great.
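For reference, a round trip through these two helpers might look like the sketch below. It uses only Base Julia and repeats the definitions so that it runs on its own. In practice the resulting String would be written as a JSON string and read back as one.

```julia
# Helpers repeated from above so the sketch is self-contained.
format_for_json(x::Real) = sizeof(x) > 8 ? string(x) : x   # wide values become strings
format_for_T(v::String, T::Type{<:Real}) = parse(T, v)     # strings become T again

x = rand(UInt128)
v = format_for_json(x)               # 16 bytes wide, so this is a String
@assert v isa String
@assert format_for_T(v, UInt128) == x

# Values of 8 bytes or fewer pass through unchanged.
@assert format_for_json(42) === 42
@assert format_for_json(1.5) === 1.5
```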

And thanks for another great data package.

quinnj commented 3 years ago

JSON3.jl can handle serializing/deserializing UUID automatically, so I don't think you need to convert it to anything else. Just let JSON3.write write it, then call JSON3.read(json, UUID) to read it back in.
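A minimal sketch of that UUID round trip, assuming the UUIDs standard library is loaded alongside JSON3:

```julia
using JSON3, UUIDs

id   = uuid4()
json = JSON3.write(id)               # the UUID is written as a JSON string
back = JSON3.read(json, UUID)        # typed read gives back a UUID

@assert back == id
```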