mdedetrich / scalajson

ScalaJSON - JSON for Scala, currently contains minimal AST
BSD 3-Clause "New" or "Revised" License
55 stars 10 forks source link

Change Map to a VectorMap implementation #20

Open mdedetrich opened 7 years ago

mdedetrich commented 7 years ago

Not 100% decided on this, here are the tradeoffs (and finally to remove confusion).

In javascript/ruby (and other mutable by default languages), the HashMap structure preserves ordering on insertion. In Scala, the immutable Map preserves insertion ordering on creation (via the usage of .newBuilder or by direct constructor) however if you use the various immutable copy with new value methods (i.e. +/-) then there is no guarantee of ordering. This means that in typical Scala use, we do maintain ordering because very few people in practise update + copy the immutable map in a way thats noticable in regards to JSON serialization.

Using immutable Map

Pros

Using VectorMap (see https://github.com/mdedetrich/scalajson/pull/19)

Pros

Using mutable LinkedHashMap

Pros

@Ichoran your thoughts?

Ichoran commented 7 years ago

I think it really depends how much slower VectorMap is than a normal immutable map. In general you might have a lot of JSON objects, so anything that is too heavyweight should probably be rejected. What does Circe do?

mdedetrich commented 7 years ago

@Ichoran Circe uses a normal map according to @travisbrown. Argonaut, which is scalaz's version of Circe, uses this implementation (see https://github.com/argonaut-io/argonaut/blob/master/argonaut/shared/src/main/scala/argonaut/JsonObject.scala). I will add some benchmarks and do some more work to see what the tradeoffs are. With specialization of classes the slowdown in typical cases shouldn't be that high

travisbrown commented 7 years ago

For what it's worth I've been working on a branch in circe where I use java.util.LinkedHashMap by default and switch to the Map + IndexedSeq representation when someone uses add or remove, and it seems to speed things up significantly.

mdedetrich commented 7 years ago

@travisbrown Mind linking the branch? Also how do you deal with concurrency/threading issues (or I assume you just hide it internally and its not exposed). This approach also uses the Map + IndexedSeq approach, however I use specializing for classes up to N to speed it up for typical uses cases (i.e. objects smaller than n)

travisbrown commented 7 years ago

It's still a mess and I haven't pushed it yet. But yeah, the LinkedHashMap is never exposed—it's just used in the cases where you're building up a map from a bunch of key-value pairs all at once.

gmethvin commented 7 years ago

We use LinkedHashMap in play-json but only expose Map. Thus it is possible to use a more efficient Map implementation if you don't care about ordering.

If this library intends to be the API actually used by end-users, it would be nice if it provided that flexibility.

mdedetrich commented 7 years ago

@gmethvin I think the goal is going to be that we will see how VectorMap works out. If its very competitive with Map then this implementation is whats going to be used, else we will do similar to what PlayJson does.

mdedetrich commented 7 years ago

@Ichoran I have made a seperate repo for the vector map (now called LinkedMap) implementation at https://github.com/mdedetrich/linked-map. With some very rudimentary benchmarks, it appears that LinkedMap is a fair bit slower than a standard Map (see https://github.com/mdedetrich/linked-map/issues/2 for some workarounds)