Closed lexspoon closed 2 weeks ago
Do not use custom serializers for sum and product types unless you need exceptional performance.
A much safer and more efficient approach is to keep a data model that is as close to the JSON representation as possible, for easier automated derivation of codecs and Chimney/Ducktape-based transformation to your target data model.
Ah, okay. This is a confusing thing to read, given that the phrase "custom codec" appears 8 times on the home page, with repeated statements that custom codecs are a feature that should attract you to Jsoniter-scala. Perhaps it is worth updating the home page to say that you shouldn't implement custom codecs, and that it's just an implementation detail that they're even possible?
I find in practice that you can automatically convert 95% of your case classes to JSON, but that occasionally you want to do something custom. The MetricData and TypeRef examples in this PR are real examples that I ran into while attempting to use Jsoniter-scala at work. I cannot adjust the case classes to match the automatic codecs for those two types as far as I know. For TypeRef, I would have to use a String and not have the marker class. For MetricData, I don't think it's possible at all. These are exceptional cases, but in a large code base, the exceptions do happen once in a while.
A practical toolkit for encoding case classes to JSON will generally need some kind of escape hatch for custom codecs. I'm surprised that's not interesting for Jsoniter-scala, given that the original Jsoniter has it.
I can help with finding the safest and most efficient solutions for your challenges.
Please open an issue for each of them with expected JSON samples and existing data structures.
The PR has two examples in the test cases, so please take a look. I'm thinking of MetricData and TypeRef.
In general, I think it will be hard to avoid wanting to ever write a custom decoder. Moreover, I'm not sure why it would be unwelcome to make this process easier. The framework would still have all its other advantages, plus now one more.
The custom codec for TypeRef can be written manually without an extra wrapping decoder:
implicit val codecOfTypeRef: JsonValueCodec[TypeRef] = new JsonValueCodec[TypeRef] {
  override def decodeValue(in: JsonReader, default: TypeRef): TypeRef = new TypeRef(in.readString(null))
  override def encodeValue(x: TypeRef, out: JsonWriter): Unit = out.writeVal(x.name)
  override def nullValue: TypeRef = null
}
or derived automatically:
implicit val codecOfTypeRef: JsonValueCodec[TypeRef] =
JsonCodecMaker.make(CodecMakerConfig.withInlineOneValueClasses(true))
The codec for MetricData can be auto-derived too; you just need to add a custom codec for Any values:
implicit val codecOfAny: JsonValueCodec[Any] = new JsonValueCodec[Any] {
  override def decodeValue(in: JsonReader, default: Any): Any = {
    val t = in.nextToken()
    if (t == 't' || t == 'f') {
      in.rollbackToken()
      in.readBoolean()
    } else if (t >= '0' && t <= '9' || t == '-') {
      in.rollbackToken()
      in.readDouble()
    } else if (t == '\"') {
      in.rollbackToken()
      in.readString(null)
    } else {
      in.readNullOrError(default, "expected boolean, numeric, string, or null values")
    }
  }

  override def encodeValue(x: Any, out: JsonWriter): Unit =
    x match {
      case b: Boolean => out.writeVal(b)
      case d: Double => out.writeVal(d)
      case s: String => out.writeVal(s)
      case _ => out.writeNull()
    }

  override def nullValue: Any = null.asInstanceOf[Any]
}
implicit val codecOfMetricData: JsonValueCodec[MetricData] = JsonCodecMaker.make[MetricData]
The better option would be using some sum type instead of Any. It could be modeled with a sealed trait and case classes that extend it, or in Scala 3 you can use the new enums or union types.
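Following that suggestion, here is a minimal sketch of what such a sum type might look like. The names MetricValue, BoolValue, etc. are hypothetical, not from the PR:

```scala
// Hypothetical sum type replacing Any in MetricData's values.
sealed trait MetricValue
final case class BoolValue(value: Boolean) extends MetricValue
final case class NumberValue(value: Double) extends MetricValue
final case class StringValue(value: String) extends MetricValue
case object NullValue extends MetricValue

object MetricValues {
  // Naive illustration of exhaustive handling; NOT a real JSON encoder
  // (it does no string escaping).
  def render(v: MetricValue): String = v match {
    case BoolValue(b)   => b.toString
    case NumberValue(d) => d.toString
    case StringValue(s) => "\"" + s + "\""
    case NullValue      => "null"
  }
}
```

The compiler now checks the match exhaustively, whereas with `Any` a forgotten case silently falls through to `writeNull`. Note that jsoniter-scala's default derivation for sealed traits adds a discriminator field, so keeping the plain boolean/number/string JSON representation would presumably still need a handwritten codec, but one the type system helps verify.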
I agree that it can be done, and that looks like a clean solution using the JsonReader API. I wrote something similar to start with. It took a lot of time and some false starts, but I eventually got it to work. I don't think I found readNullOrError, so your version has one fewer "else if", which is a tidy improvement.
The version on this page solves a slightly simpler problem, though. I believe this version won't decode input like {"data": [[1,2], [3,4]]}, will it? To make it a full apples-to-apples comparison, consider trying a decoder for [[1,2], [3,4]].
Either way, I'd encourage you to write the same decoder using JsonStructuredReader rather than JsonReader. I'm wondering if you would agree that, with the help of the wrapper, writing a decoder becomes easier.
Here are some problems the wrapper solves compared to the above code:

- … (allowed) or … (not allowed).

In the fuller version that also decodes the arrays, there are issues with commas and brackets, and with allocating builders to accumulate the data. These go away in JsonStructuredReader as well.
None of these issues prevent a decoder from being implemented, and I know that custom decoders aren't the framework's main focus. It just seems better if the framework can make this case easier, saving the developer's mental bandwidth for other things. And this is just one example, by the way: I just checked out of curiosity, and the main codebase I work in right now has about 70 custom Spray decoders.
Re a sum type, I agree that it's cleaner that way. Also, there's a similar trade-off for the JSON encoding. In both cases, if the data can be large, you might want to have a more compact encoding even though it's not as clean.
I'm experimenting with Jsoniter-scala, and it has gone well in general.
One thing that can be difficult, though, is custom deserializers. The original Jsoniter has a very nice Iterator API; its web site goes through some of the advantages, and I am finding them to be true in practice. Writing a custom deserializer with the JsonReader interface is tedious and error-prone. It's especially worrisome that, using the JsonReader interface, I'm not completely sure I'm validating the input as correct JSON.
UPDATE: I thought about it some more and have a prototype to propose. What do the maintainers think?
The general idea in this prototype is to have one way to decode each kind of thing that JSON supports. For primitive types, a method is provided to read and return the value. For arrays and objects, the caller provides callbacks for reading the nested elements.
With this API, I think it is possible to ensure that all decoded objects came from a syntactically valid JSON input stream. Also, this API just looks really convenient to use compared to JsonReader. JsonReader can still exist as an internal API, and it can be directly used by codecs that the macro expands to, but this API looks a lot better for custom codecs written by hand.
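To make the shape concrete, here is a toy sketch of what such a callback-style API could look like, with a deliberately simplified implementation over a raw string. The trait name and methods are assumptions for illustration, not the actual prototype, and it covers only the number/array subset needed for the [[1,2],[3,4]] example:

```scala
// Hypothetical callback-style reader: primitives return values directly,
// and arrays take a callback per element, so commas, brackets, and builder
// allocation are handled once, inside readArray, rather than in every codec.
trait StructuredReader {
  def readDouble(): Double
  def readArray[A](element: StructuredReader => A): List[A]
}

// Toy implementation over a string; handles only numbers and arrays.
final class ToyReader(input: String) extends StructuredReader {
  private var pos = 0

  private def skipWs(): Unit =
    while (pos < input.length && input.charAt(pos).isWhitespace) pos += 1

  private def expect(c: Char): Unit = {
    skipWs()
    require(pos < input.length && input.charAt(pos) == c, s"expected '$c' at $pos")
    pos += 1
  }

  def readDouble(): Double = {
    skipWs()
    val start = pos
    while (pos < input.length && "+-.eE0123456789".indexOf(input.charAt(pos)) >= 0) pos += 1
    input.substring(start, pos).toDouble
  }

  def readArray[A](element: StructuredReader => A): List[A] = {
    expect('[')
    val buf = List.newBuilder[A]
    skipWs()
    if (pos < input.length && input.charAt(pos) != ']') {
      buf += element(this)
      skipWs()
      while (pos < input.length && input.charAt(pos) == ',') {
        pos += 1
        buf += element(this)
        skipWs()
      }
    }
    expect(']')
    buf.result()
  }
}
```

With this shape, decoding the nested array from the earlier example collapses to a single nested callback, `new ToyReader("[[1,2],[3,4]]").readArray(_.readArray(_.readDouble()))`, with no manual token or builder management in user code.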