Open jam01 opened 1 week ago
@jam01 Hi, Jose! Thanks for opening the discussion!
Yes, it would be great to have your visitor implementation published.
If you would like to contribute it here, the jsoniter-scala-circe module could serve as an example of how to organise the sub-project.
@jam01 Please see the latest changes in the related branch: https://github.com/plokhotnyuk/jsoniter-scala/tree/upickle-visitor
Next steps:
- scalatest/scalacheck for better coverage on all Scala platforms (JVM, Scala.js, Scala Native)
- upickleJsoniter benchmarks in the jsoniter-scala-benchmarkJVM and jsoniter-scala-benchmarkJS modules

Here are results of benchmarks for parsing and serialization of GeoJSON messages on i7-11700 using JDK-21:
[info] Benchmark Mode Cnt Score Error Units
[info] GeoJSONReading.borer thrpt 5 37439.639 ± 213.404 ops/s
[info] GeoJSONReading.borer:gc.alloc.rate thrpt 5 1894.613 ± 10.929 MB/sec
[info] GeoJSONReading.borer:gc.alloc.rate.norm thrpt 5 53072.156 ± 0.001 B/op
[info] GeoJSONReading.borer:gc.count thrpt 5 13.000 counts
[info] GeoJSONReading.borer:gc.time thrpt 5 8.000 ms
[info] GeoJSONReading.circe thrpt 5 16274.767 ± 116.223 ops/s
[info] GeoJSONReading.circe:gc.alloc.rate thrpt 5 7009.305 ± 49.569 MB/sec
[info] GeoJSONReading.circe:gc.alloc.rate.norm thrpt 5 451712.728 ± 112.268 B/op
[info] GeoJSONReading.circe:gc.count thrpt 5 46.000 counts
[info] GeoJSONReading.circe:gc.time thrpt 5 15.000 ms
[info] GeoJSONReading.circeJsoniter thrpt 5 22225.693 ± 206.976 ops/s
[info] GeoJSONReading.circeJsoniter:gc.alloc.rate thrpt 5 7343.901 ± 68.614 MB/sec
[info] GeoJSONReading.circeJsoniter:gc.alloc.rate.norm thrpt 5 346536.263 ± 0.004 B/op
[info] GeoJSONReading.circeJsoniter:gc.count thrpt 5 48.000 counts
[info] GeoJSONReading.circeJsoniter:gc.time thrpt 5 15.000 ms
[info] GeoJSONReading.jacksonScala thrpt 5 13417.721 ± 159.514 ops/s
[info] GeoJSONReading.jacksonScala:gc.alloc.rate thrpt 5 3812.536 ± 45.129 MB/sec
[info] GeoJSONReading.jacksonScala:gc.alloc.rate.norm thrpt 5 297991.701 ± 4.769 B/op
[info] GeoJSONReading.jacksonScala:gc.count thrpt 5 25.000 counts
[info] GeoJSONReading.jacksonScala:gc.time thrpt 5 8.000 ms
[info] GeoJSONReading.jsoniterScala thrpt 5 80401.024 ± 460.327 ops/s
[info] GeoJSONReading.jsoniterScala:gc.alloc.rate thrpt 5 1963.173 ± 11.156 MB/sec
[info] GeoJSONReading.jsoniterScala:gc.alloc.rate.norm thrpt 5 25608.073 ± 0.001 B/op
[info] GeoJSONReading.jsoniterScala:gc.count thrpt 5 13.000 counts
[info] GeoJSONReading.jsoniterScala:gc.time thrpt 5 6.000 ms
[info] GeoJSONReading.playJson thrpt 5 6575.015 ± 107.294 ops/s
[info] GeoJSONReading.playJson:gc.alloc.rate thrpt 5 4025.743 ± 65.638 MB/sec
[info] GeoJSONReading.playJson:gc.alloc.rate.norm thrpt 5 642155.855 ± 25.546 B/op
[info] GeoJSONReading.playJson:gc.count thrpt 5 27.000 counts
[info] GeoJSONReading.playJson:gc.time thrpt 5 11.000 ms
[info] GeoJSONReading.playJsonJsoniter thrpt 5 23309.493 ± 319.014 ops/s
[info] GeoJSONReading.playJsonJsoniter:gc.alloc.rate thrpt 5 7459.233 ± 102.446 MB/sec
[info] GeoJSONReading.playJsonJsoniter:gc.alloc.rate.norm thrpt 5 335616.933 ± 5.873 B/op
[info] GeoJSONReading.playJsonJsoniter:gc.count thrpt 5 48.000 counts
[info] GeoJSONReading.playJsonJsoniter:gc.time thrpt 5 14.000 ms
[info] GeoJSONReading.smithy4sJson thrpt 5 38075.338 ± 536.969 ops/s
[info] GeoJSONReading.smithy4sJson:gc.alloc.rate thrpt 5 2900.770 ± 41.059 MB/sec
[info] GeoJSONReading.smithy4sJson:gc.alloc.rate.norm thrpt 5 79904.153 ± 0.003 B/op
[info] GeoJSONReading.smithy4sJson:gc.count thrpt 5 19.000 counts
[info] GeoJSONReading.smithy4sJson:gc.time thrpt 5 7.000 ms
[info] GeoJSONReading.sprayJson thrpt 5 14543.322 ± 215.176 ops/s
[info] GeoJSONReading.sprayJson:gc.alloc.rate thrpt 5 6469.027 ± 96.496 MB/sec
[info] GeoJSONReading.sprayJson:gc.alloc.rate.norm thrpt 5 466520.403 ± 0.017 B/op
[info] GeoJSONReading.sprayJson:gc.count thrpt 5 43.000 counts
[info] GeoJSONReading.sprayJson:gc.time thrpt 5 12.000 ms
[info] GeoJSONReading.uPickle thrpt 5 15598.641 ± 366.882 ops/s
[info] GeoJSONReading.uPickle:gc.alloc.rate thrpt 5 3265.721 ± 76.655 MB/sec
[info] GeoJSONReading.uPickle:gc.alloc.rate.norm thrpt 5 219567.847 ± 4.562 B/op
[info] GeoJSONReading.uPickle:gc.count thrpt 5 21.000 counts
[info] GeoJSONReading.uPickle:gc.time thrpt 5 5.000 ms
[info] GeoJSONReading.uPickleJsoniter thrpt 5 37324.097 ± 265.722 ops/s
[info] GeoJSONReading.uPickleJsoniter:gc.alloc.rate thrpt 5 2663.730 ± 17.144 MB/sec
[info] GeoJSONReading.uPickleJsoniter:gc.alloc.rate.norm thrpt 5 74856.157 ± 0.004 B/op
[info] GeoJSONReading.uPickleJsoniter:gc.count thrpt 5 18.000 counts
[info] GeoJSONReading.uPickleJsoniter:gc.time thrpt 5 5.000 ms
[info] GeoJSONReading.weePickle thrpt 5 16989.353 ± 159.588 ops/s
[info] GeoJSONReading.weePickle:gc.alloc.rate thrpt 5 3347.992 ± 32.243 MB/sec
[info] GeoJSONReading.weePickle:gc.alloc.rate.norm thrpt 5 206680.344 ± 0.011 B/op
[info] GeoJSONReading.weePickle:gc.count thrpt 5 22.000 counts
[info] GeoJSONReading.weePickle:gc.time thrpt 5 6.000 ms
[info] GeoJSONReading.zioJson thrpt 5 8321.304 ± 59.731 ops/s
[info] GeoJSONReading.zioJson:gc.alloc.rate thrpt 5 3045.912 ± 21.390 MB/sec
[info] GeoJSONReading.zioJson:gc.alloc.rate.norm thrpt 5 383888.720 ± 0.161 B/op
[info] GeoJSONReading.zioJson:gc.count thrpt 5 20.000 counts
[info] GeoJSONReading.zioJson:gc.time thrpt 5 7.000 ms
[info] GeoJSONWriting.borer thrpt 5 20812.620 ± 941.843 ops/s
[info] GeoJSONWriting.borer:gc.alloc.rate thrpt 5 2457.086 ± 111.317 MB/sec
[info] GeoJSONWriting.borer:gc.alloc.rate.norm thrpt 5 123812.580 ± 37.005 B/op
[info] GeoJSONWriting.borer:gc.count thrpt 5 16.000 counts
[info] GeoJSONWriting.borer:gc.time thrpt 5 5.000 ms
[info] GeoJSONWriting.circe thrpt 5 16843.402 ± 165.091 ops/s
[info] GeoJSONWriting.circe:gc.alloc.rate thrpt 5 4215.090 ± 41.449 MB/sec
[info] GeoJSONWriting.circe:gc.alloc.rate.norm thrpt 5 262462.363 ± 49.158 B/op
[info] GeoJSONWriting.circe:gc.count thrpt 5 28.000 counts
[info] GeoJSONWriting.circe:gc.time thrpt 5 9.000 ms
[info] GeoJSONWriting.circeJsoniter thrpt 5 23513.288 ± 90.930 ops/s
[info] GeoJSONWriting.circeJsoniter:gc.alloc.rate thrpt 5 2701.804 ± 10.727 MB/sec
[info] GeoJSONWriting.circeJsoniter:gc.alloc.rate.norm thrpt 5 120511.890 ± 3.356 B/op
[info] GeoJSONWriting.circeJsoniter:gc.count thrpt 5 18.000 counts
[info] GeoJSONWriting.circeJsoniter:gc.time thrpt 5 8.000 ms
[info] GeoJSONWriting.jacksonScala thrpt 5 15061.343 ± 166.967 ops/s
[info] GeoJSONWriting.jacksonScala:gc.alloc.rate thrpt 5 2512.910 ± 27.818 MB/sec
[info] GeoJSONWriting.jacksonScala:gc.alloc.rate.norm thrpt 5 174981.692 ± 1.102 B/op
[info] GeoJSONWriting.jacksonScala:gc.count thrpt 5 17.000 counts
[info] GeoJSONWriting.jacksonScala:gc.time thrpt 5 9.000 ms
[info] GeoJSONWriting.jsoniterScala thrpt 5 38929.883 ± 499.637 ops/s
[info] GeoJSONWriting.jsoniterScala:gc.alloc.rate thrpt 5 428.805 ± 5.513 MB/sec
[info] GeoJSONWriting.jsoniterScala:gc.alloc.rate.norm thrpt 5 11552.149 ± 0.004 B/op
[info] GeoJSONWriting.jsoniterScala:gc.count thrpt 5 3.000 counts
[info] GeoJSONWriting.jsoniterScala:gc.time thrpt 5 4.000 ms
[info] GeoJSONWriting.jsoniterScalaPrealloc thrpt 5 39833.183 ± 903.517 ops/s
[info] GeoJSONWriting.jsoniterScalaPrealloc:gc.alloc.rate thrpt 5 1.829 ± 0.041 MB/sec
[info] GeoJSONWriting.jsoniterScalaPrealloc:gc.alloc.rate.norm thrpt 5 48.146 ± 0.004 B/op
[info] GeoJSONWriting.jsoniterScalaPrealloc:gc.count thrpt 5 ≈ 0 counts
[info] GeoJSONWriting.playJson thrpt 5 4839.541 ± 38.001 ops/s
[info] GeoJSONWriting.playJson:gc.alloc.rate thrpt 5 3963.696 ± 31.054 MB/sec
[info] GeoJSONWriting.playJson:gc.alloc.rate.norm thrpt 5 858967.023 ± 1.941 B/op
[info] GeoJSONWriting.playJson:gc.count thrpt 5 26.000 counts
[info] GeoJSONWriting.playJson:gc.time thrpt 5 9.000 ms
[info] GeoJSONWriting.playJsonJsoniter thrpt 5 9885.058 ± 84.203 ops/s
[info] GeoJSONWriting.playJsonJsoniter:gc.alloc.rate thrpt 5 3692.292 ± 30.171 MB/sec
[info] GeoJSONWriting.playJsonJsoniter:gc.alloc.rate.norm thrpt 5 391730.586 ± 137.937 B/op
[info] GeoJSONWriting.playJsonJsoniter:gc.count thrpt 5 24.000 counts
[info] GeoJSONWriting.playJsonJsoniter:gc.time thrpt 5 11.000 ms
[info] GeoJSONWriting.smithy4sJson thrpt 5 34031.095 ± 394.677 ops/s
[info] GeoJSONWriting.smithy4sJson:gc.alloc.rate thrpt 5 2076.440 ± 24.127 MB/sec
[info] GeoJSONWriting.smithy4sJson:gc.alloc.rate.norm thrpt 5 63992.172 ± 0.002 B/op
[info] GeoJSONWriting.smithy4sJson:gc.count thrpt 5 14.000 counts
[info] GeoJSONWriting.smithy4sJson:gc.time thrpt 5 8.000 ms
[info] GeoJSONWriting.sprayJson thrpt 5 8402.920 ± 355.600 ops/s
[info] GeoJSONWriting.sprayJson:gc.alloc.rate thrpt 5 4645.094 ± 196.506 MB/sec
[info] GeoJSONWriting.sprayJson:gc.alloc.rate.norm thrpt 5 579751.641 ± 154.358 B/op
[info] GeoJSONWriting.sprayJson:gc.count thrpt 5 31.000 counts
[info] GeoJSONWriting.sprayJson:gc.time thrpt 5 14.000 ms
[info] GeoJSONWriting.uPickle thrpt 5 19705.941 ± 101.944 ops/s
[info] GeoJSONWriting.uPickle:gc.alloc.rate thrpt 5 2290.380 ± 11.659 MB/sec
[info] GeoJSONWriting.uPickle:gc.alloc.rate.norm thrpt 5 121896.298 ± 0.012 B/op
[info] GeoJSONWriting.uPickle:gc.count thrpt 5 15.000 counts
[info] GeoJSONWriting.uPickle:gc.time thrpt 5 7.000 ms
[info] GeoJSONWriting.uPickleJsoniter thrpt 5 31789.433 ± 537.276 ops/s
[info] GeoJSONWriting.uPickleJsoniter:gc.alloc.rate thrpt 5 1555.910 ± 25.549 MB/sec
[info] GeoJSONWriting.uPickleJsoniter:gc.alloc.rate.norm thrpt 5 51336.184 ± 0.003 B/op
[info] GeoJSONWriting.uPickleJsoniter:gc.count thrpt 5 11.000 counts
[info] GeoJSONWriting.uPickleJsoniter:gc.time thrpt 5 8.000 ms
[info] GeoJSONWriting.weePickle thrpt 5 14920.927 ± 965.532 ops/s
[info] GeoJSONWriting.weePickle:gc.alloc.rate thrpt 5 2504.639 ± 161.508 MB/sec
[info] GeoJSONWriting.weePickle:gc.alloc.rate.norm thrpt 5 176045.783 ± 47.756 B/op
[info] GeoJSONWriting.weePickle:gc.count thrpt 5 16.000 counts
[info] GeoJSONWriting.weePickle:gc.time thrpt 5 6.000 ms
[info] GeoJSONWriting.zioJson thrpt 5 12763.454 ± 640.770 ops/s
[info] GeoJSONWriting.zioJson:gc.alloc.rate thrpt 5 3249.948 ± 163.562 MB/sec
[info] GeoJSONWriting.zioJson:gc.alloc.rate.norm thrpt 5 267050.538 ± 39.522 B/op
[info] GeoJSONWriting.zioJson:gc.count thrpt 5 22.000 counts
[info] GeoJSONWriting.zioJson:gc.time thrpt 5 8.000 ms
Yeah the changes make sense!
I thought a mark reset was needed, but wasn't 100% sure. I'm also curious why the CI build needed ujson.Arr to be cast as ujson.Value; I tried the same scalac flags locally and it still worked without the cast.
As for the numbers, I think the really interesting comparison would be against the ujson reading benchmarks.
Here are results of benchmarks for the same JSON sample parsed/serialized to/from ujson.Value (instead of GeoJSON.GeoJSON) using the same environment:
[info] GeoJSONReading.uJson thrpt 5 16496.703 ± 110.056 ops/s
[info] GeoJSONReading.uJson:gc.alloc.rate thrpt 5 4368.762 ± 29.105 MB/sec
[info] GeoJSONReading.uJson:gc.alloc.rate.norm thrpt 5 277735.761 ± 5.104 B/op
[info] GeoJSONReading.uJson:gc.count thrpt 5 28.000 counts
[info] GeoJSONReading.uJson:gc.time thrpt 5 8.000 ms
[info] GeoJSONReading.uJsonJsoniter thrpt 5 39282.768 ± 162.793 ops/s
[info] GeoJSONReading.uJsonJsoniter:gc.alloc.rate thrpt 5 3827.026 ± 15.917 MB/sec
[info] GeoJSONReading.uJsonJsoniter:gc.alloc.rate.norm thrpt 5 102176.149 ± 0.001 B/op
[info] GeoJSONReading.uJsonJsoniter:gc.count thrpt 5 25.000 counts
[info] GeoJSONReading.uJsonJsoniter:gc.time thrpt 5 7.000 ms
[info] GeoJSONWriting.uJson thrpt 5 17210.123 ± 91.679 ops/s
[info] GeoJSONWriting.uJson:gc.alloc.rate thrpt 5 1358.623 ± 7.286 MB/sec
[info] GeoJSONWriting.uJson:gc.alloc.rate.norm thrpt 5 82792.340 ± 0.011 B/op
[info] GeoJSONWriting.uJson:gc.count thrpt 5 9.000 counts
[info] GeoJSONWriting.uJson:gc.time thrpt 5 8.000 ms
[info] GeoJSONWriting.uJsonJsoniter thrpt 5 29396.857 ± 466.131 ops/s
[info] GeoJSONWriting.uJsonJsoniter:gc.alloc.rate thrpt 5 343.761 ± 5.434 MB/sec
[info] GeoJSONWriting.uJsonJsoniter:gc.alloc.rate.norm thrpt 5 12264.198 ± 0.007 B/op
[info] GeoJSONWriting.uJsonJsoniter:gc.count thrpt 5 2.000 counts
[info] GeoJSONWriting.uJsonJsoniter:gc.time thrpt 5 2.000 ms
Huh that's a big gap
Unfortunately, those results look so good only because the current implementation uses in.readDouble(). But that is broken: it silently rounds every number that can be rounded to a non-infinite double value. We need to create good unit tests before running benchmarks ;)
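To make the rounding problem concrete, here is a minimal sketch that uses only the Scala standard library. It does not call jsoniter-scala: `String.toDouble` is used as a stand-in for a double-based read of the number token, which is an assumption, and the object and value names are hypothetical.

```scala
object DoubleRoundingDemo {
  // 2^53 + 1 is the smallest positive integer that a Double cannot represent exactly.
  val token: String = "9007199254740993"

  // Stand-in (assumption) for a double-based read of the number token.
  val viaDouble: Double = token.toDouble

  // Exact value, kept without the double precision constraint.
  val exact: BigInt = BigInt(token)

  // True when going through Double changed the value.
  val lostPrecision: Boolean = BigDecimal(viaDouble) != BigDecimal(exact)

  def main(args: Array[String]): Unit =
    println(s"token=$token viaDouble=${BigDecimal(viaDouble)} lostPrecision=$lostPrecision")
}
```

A unit test along these lines, fed through the real codec, would likely have caught the issue before the benchmarks were run.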
On the other hand, ujson.Value has only double as its in-memory representation for all numbers, so parsing them could be simplified to v.visitFloat64(in.readDouble(), -1).
If you are interested only in parsing/serialization to/from ujson.Value, then you will probably be satisfied by the following simplified implementation:
import com.github.plokhotnyuk.jsoniter_scala.core.{JsonReader, JsonValueCodec, JsonWriter}
import upickle.core.{ArrVisitor, ObjVisitor, StringVisitor, Transformer, Visitor}

final class Decoder[J](maxDepth: Int = 32)
                      (implicit v: Visitor[?, J]) extends JsonValueCodec[J] {
  override def nullValue: J = null.asInstanceOf[J]

  override def encodeValue(x: J, out: JsonWriter): Unit =
    throw new UnsupportedOperationException("Codec only supports decoding")

  override def decodeValue(in: JsonReader, default: J): J =
    decode(in, maxDepth, v)

  private[this] def decode[Z](in: JsonReader, depth: Int, v: Visitor[?, Z]): Z = {
    val b = in.nextToken()
    if (b == '"') {
      in.rollbackToken()
      v.visitString(in.readString(null), -1)
    } else if (b == 'f' || b == 't') {
      in.rollbackToken()
      if (in.readBoolean()) v.visitTrue(-1)
      else v.visitFalse(-1)
    } else if (b >= '0' && b <= '9' || b == '-') {
      in.rollbackToken()
      v.visitFloat64(in.readDouble(), -1)
    } else if (b == '[') {
      val depthM1 = depth - 1
      if (depthM1 < 0) in.decodeError("depth limit exceeded")
      val isEmpty = in.isNextToken(']')
      val arrV = v.visitArray(if (isEmpty) 0 else -1, -1).narrow
      if (!isEmpty) {
        in.rollbackToken()
        while ({
          arrV.visitValue(decode(in, depthM1, arrV.subVisitor), -1)
          in.isNextToken(',')
        }) ()
        if (!in.isCurrentToken(']')) in.arrayEndOrCommaError()
      }
      arrV.visitEnd(-1)
    } else if (b == '{') {
      val depthM1 = depth - 1
      if (depthM1 < 0) in.decodeError("depth limit exceeded")
      val isEmpty = in.isNextToken('}')
      val objV = v.visitObject(if (isEmpty) 0 else -1, jsonableKeys = true, -1).narrow
      if (!isEmpty) {
        in.rollbackToken()
        while ({
          val key = in.readKeyAsString()
          objV.visitKeyValue(objV.visitKey(-1).visitString(key, -1))
          objV.visitValue(decode(in, depthM1, objV.subVisitor), -1)
          in.isNextToken(',')
        }) ()
        if (!in.isCurrentToken('}')) in.objectEndOrCommaError()
      }
      objV.visitEnd(-1)
    } else in.readNullOrError(v.visitNull(-1), "expected JSON value")
  }
}

final class Encoder[I](implicit t: Transformer[I]) extends JsonValueCodec[I] {
  override def decodeValue(in: JsonReader, default: I): I =
    throw new UnsupportedOperationException("Codec only supports encoding")

  override def encodeValue(x: I, out: JsonWriter): Unit = {
    val visitor = new Visitor[Any, JsonWriter] {
      private[this] val objVisitor = new ObjVisitor[Any, JsonWriter] {
        override def visitKey(index: Int): Visitor[?, ?] = StringVisitor

        override def visitKeyValue(v: Any): Unit = out.writeKey(v.toString)

        override def subVisitor: Visitor[?, ?] = visitor

        override def visitValue(v: Any, index: Int): Unit = ()

        override def visitEnd(index: Int): JsonWriter = {
          out.writeObjectEnd()
          out
        }
      }
      private[this] val arrVisitor = new ArrVisitor[Any, JsonWriter] {
        override def subVisitor: Visitor[?, ?] = visitor

        override def visitValue(v: Any, index: Int): Unit = ()

        override def visitEnd(index: Int): JsonWriter = {
          out.writeArrayEnd()
          out
        }
      }

      override def visitNull(index: Int): JsonWriter = {
        out.writeNull()
        out
      }

      override def visitFalse(index: Int): JsonWriter = {
        out.writeVal(false)
        out
      }

      override def visitTrue(index: Int): JsonWriter = {
        out.writeVal(true)
        out
      }

      override def visitInt64(i: Long, index: Int): JsonWriter = {
        out.writeVal(i)
        out
      }

      override def visitFloat64(d: Double, index: Int): JsonWriter = {
        out.writeVal(d)
        out
      }

      override def visitFloat64String(s: String, index: Int): JsonWriter = {
        out.writeNonEscapedAsciiVal(s)
        out
      }

      override def visitFloat64StringParts(s: CharSequence, decIndex: Int, expIndex: Int, index: Int): JsonWriter = {
        out.writeNonEscapedAsciiVal(s.toString)
        out
      }

      override def visitString(s: CharSequence, index: Int): JsonWriter = {
        out.writeVal(s.toString)
        out
      }

      override def visitBinary(bytes: Array[Byte], offset: Int, len: Int, index: Int): JsonWriter = {
        val trimmed =
          if (offset == 0 && bytes.length == len) bytes
          else bytes.slice(offset, offset + len)
        out.writeBase64Val(trimmed, doPadding = true)
        out
      }

      override def visitArray(length: Int, index: Int): ArrVisitor[Any, JsonWriter] = {
        out.writeArrayStart()
        arrVisitor
      }

      override def visitObject(length: Int, jsonableKeys: Boolean, index: Int): ObjVisitor[Any, JsonWriter] = {
        out.writeObjectStart()
        objVisitor
      }

      override def visitFloat32(d: Float, index: Int): JsonWriter = {
        out.writeVal(d)
        out
      }

      override def visitInt32(i: Int, index: Int): JsonWriter = {
        out.writeVal(i)
        out
      }

      override def visitUInt64(i: Long, index: Int): JsonWriter = {
        out.writeVal(i)
        out
      }

      override def visitChar(s: Char, index: Int): JsonWriter = {
        out.writeVal(s)
        out
      }

      override def visitExt(tag: Byte, bytes: Array[Byte], offset: Int, len: Int, index: Int): JsonWriter =
        visitBinary(bytes, offset, len, index)
    }
    t.transform(x, visitor)
  }

  override def nullValue: I = null.asInstanceOf[I]
}
Right!
I implemented the reading of discrete int, long, float and double because that seemed more generic, plus in the other project (for which I ended up creating json-schema and this codec) I'm hoping to support integers and decimals without the double precision constraint (so no ujson). Ideally I'd support Decimal128 actually, but upickle does not facilitate that at the moment.
I do suppose the next performance gain would be for the reader to support parseNum (as you suggested) where the result is anything between int and BigDecimal.
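As a rough illustration of a result that is "anything between int and BigDecimal", the sketch below picks the narrowest type that holds a number literal without loss. It uses only the Scala standard library. The `Num` hierarchy and `NarrowestNum.parse` are hypothetical names for this example, not an existing jsoniter-scala or upickle API, and going through `BigDecimal` first is chosen for clarity rather than speed (a real reader would presumably decide while scanning the digits).

```scala
// Hypothetical result type: not part of jsoniter-scala or upickle.
sealed trait Num
final case class IntNum(value: Int) extends Num
final case class LongNum(value: Long) extends Num
final case class DoubleNum(value: Double) extends Num
final case class BigNum(value: BigDecimal) extends Num

object NarrowestNum {
  // Picks the narrowest type that can hold the literal without loss.
  def parse(literal: String): Num = {
    val bd = BigDecimal(literal) // exact decimal value of the literal
    if (bd.isValidInt) IntNum(bd.toInt)
    else if (bd.isValidLong) LongNum(bd.toLong)
    else if (bd.isDecimalDouble) DoubleNum(bd.toDouble) // decimal form of the double equals the literal
    else BigNum(bd)
  }
}
```

With this sketch, `NarrowestNum.parse("9007199254740993")` stays a `LongNum` instead of being rounded, and a literal with more significant digits than a double can carry falls through to `BigNum`. Note that whole-valued literals such as `1.0` are classified as `IntNum` here, which may or may not be what a schema-aware codec wants.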
Hey @plokhotnyuk I intended to open a Discussions question, but it's seemingly not enabled.
I implemented a codec for upickle's Visitor. I wanted to see if you had interest in adopting it; if so, I'd gladly contribute it.
See: Codec JsonWriter and En/Decoder