playframework / play-json

The Play JSON library
Apache License 2.0

Denial of service when parsing JSON object with keys that have the same hash code #186

Closed plokhotnyuk closed 6 years ago

plokhotnyuk commented 6 years ago

Play JSON Version (2.5.x / etc)

2.7.0-M1

API (Scala / Java / Neither / Both)

Scala 2.12.7

Operating System (Ubuntu 15.10 / MacOS 10.10 / Windows 10)

Ubuntu 16.04

JDK (Oracle 1.8.0_72, OpenJDK 1.8.x, Azul Zing)

Oracle JDK 11

Library Dependencies

none

Expected Behavior

Sub-linear decrease in throughput as the number of JSON object fields (with keys that have the same hash code) increases

Actual Behavior

Sub-quadratic decrease in throughput as the number of JSON object fields (with keys that have the same hash code) increases

On contemporary CPUs, parsing such a JSON object (a sequence of 100000 fields like those below, ~1.4 MB in total) can take more than 100 seconds:

{
"!!sjyehe":"",
"!!sjyeiF":"",
"!!sjyej'":"",
"!!sjyfIe":"",
"!!sjyfJF":"",
...
}
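
For illustration, a well-known way to mass-produce colliding `java.lang.String` keys exploits the fact that `"Aa"` and `"BB"` have the same `hashCode` (2112): concatenating n such pairs yields 2^n distinct strings that all share one hash code. This is a sketch of the general technique, not the generator used for the `"!!sjy..."` keys above:

```scala
object CollidingKeys {
  // "Aa".hashCode == "BB".hashCode == 2112, and String.hashCode is a
  // polynomial over the characters, so any concatenation of n such
  // equal-hash, equal-length pairs yields 2^n distinct strings that
  // all share a single hash code.
  def generate(n: Int): Seq[String] =
    (1 to n).foldLeft(Seq("")) { (acc, _) =>
      acc.flatMap(s => Seq(s + "Aa", s + "BB"))
    }

  def main(args: Array[String]): Unit = {
    val keys = generate(4) // 16 distinct keys, one shared hash code
    assert(keys.distinct.size == 16)
    assert(keys.map(_.hashCode).distinct.size == 1)
  }
}
```

Any hash map that buckets purely on `String.hashCode` degrades to linear probing within the single bucket these keys land in, which is what turns parsing into a quadratic-time operation.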

Below are the results of a benchmark (where size is the number of such fields) for different JSON parsers, including Play JSON:

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                                    (size)   Mode  Cnt         Score        Error  Units
[info] ExtractFieldsBenchmark.readAVSystemGenCodec       1  thrpt    7   4725178.602 ±  34049.473  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec      10  thrpt    7   1271820.062 ±  25118.614  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec     100  thrpt    7    189164.567 ±   4201.388  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec    1000  thrpt    7     23129.424 ±    356.359  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec   10000  thrpt    7      2284.385 ±     55.511  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec  100000  thrpt    7       211.086 ±      6.497  ops/s
[info] ExtractFieldsBenchmark.readCirce                  1  thrpt    7   2043364.236 ±  56358.608  ops/s
[info] ExtractFieldsBenchmark.readCirce                 10  thrpt    7    741540.359 ±  22339.895  ops/s
[info] ExtractFieldsBenchmark.readCirce                100  thrpt    7     56800.441 ±   3749.585  ops/s
[info] ExtractFieldsBenchmark.readCirce               1000  thrpt    7      3473.587 ±     95.707  ops/s
[info] ExtractFieldsBenchmark.readCirce              10000  thrpt    7       342.581 ±     14.951  ops/s
[info] ExtractFieldsBenchmark.readCirce             100000  thrpt    7        24.715 ±      0.943  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava            1  thrpt    7  11536401.201 ± 442678.821  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava           10  thrpt    7   4659488.034 ± 166357.128  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava          100  thrpt    7    620679.771 ±   7458.313  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava         1000  thrpt    7     70241.242 ±    277.802  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava        10000  thrpt    7      7218.554 ±     18.239  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava       100000  thrpt    7       724.267 ±      2.500  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala           1  thrpt    7   2556948.316 ±  75347.543  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala          10  thrpt    7    960522.252 ±  39031.789  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala         100  thrpt    7    138061.630 ±    421.574  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala        1000  thrpt    7     13797.967 ±    110.474  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala       10000  thrpt    7       658.542 ±     24.533  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala      100000  thrpt    7        75.087 ±      0.767  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala          1  thrpt    7  13702459.570 ± 610987.737  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala         10  thrpt    7   3885899.591 ± 125525.752  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala        100  thrpt    7    493155.701 ±   4141.920  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala       1000  thrpt    7     50850.741 ±    188.826  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala      10000  thrpt    7      4897.487 ±     36.497  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala     100000  thrpt    7       491.353 ±      2.692  ops/s
[info] ExtractFieldsBenchmark.readPlayJson               1  thrpt    7    901833.619 ±  48352.210  ops/s
[info] ExtractFieldsBenchmark.readPlayJson              10  thrpt    7    238305.942 ±   8858.682  ops/s
[info] ExtractFieldsBenchmark.readPlayJson             100  thrpt    7      5813.393 ±    262.240  ops/s
[info] ExtractFieldsBenchmark.readPlayJson            1000  thrpt    7        61.414 ±      0.725  ops/s
[info] ExtractFieldsBenchmark.readPlayJson           10000  thrpt    7         0.395 ±      0.078  ops/s
[info] ExtractFieldsBenchmark.readPlayJson          100000  thrpt    7         0.010 ±      0.001  ops/s
[info] ExtractFieldsBenchmark.readUPickle                1  thrpt    7   3618186.431 ±  78356.527  ops/s
[info] ExtractFieldsBenchmark.readUPickle               10  thrpt    7   1111681.901 ±  46416.024  ops/s
[info] ExtractFieldsBenchmark.readUPickle              100  thrpt    7    153742.834 ±   5925.282  ops/s
[info] ExtractFieldsBenchmark.readUPickle             1000  thrpt    7     16085.153 ±    373.830  ops/s
[info] ExtractFieldsBenchmark.readUPickle            10000  thrpt    7      1596.814 ±     53.993  ops/s
[info] ExtractFieldsBenchmark.readUPickle           100000  thrpt    7       159.317 ±      5.089  ops/s

Reproducible Test Case

To run the benchmarks on your JDK:

  1. Install the latest version of sbt, or verify that it is already installed properly:

    sbt about
  2. Clone jsoniter-scala repo:

    git clone https://github.com/plokhotnyuk/jsoniter-scala.git
  3. Enter the cloned directory and check out the specific commit:

    cd jsoniter-scala
    git checkout bb4837d
  4. Run the benchmarks, passing the path to your JDK via the -jvm parameter:

    sbt -no-colors 'jsoniter-scala-benchmark/jmh:run -jvm /usr/lib/jvm/jdk-11/bin/java -wi 3 -i 7 .*ExtractFieldsBench.*'
jroper commented 6 years ago

Here's the same issue on Jackson:

https://github.com/FasterXML/jackson-databind/issues/37

According to that, it was fixed by something that JDK7/JDK8 did to its HashMap implementation, but it doesn't say what. We're not using a JDK HashMap, we're using Scala's immutable HashMap which I think is a trie map that uses the hash code.

So potentially we could switch to a JDK HashMap and this would fix it, but I suspect that will adversely impact performance when generating the JSON AST, since the JDK HashMap doesn't support efficient persistent add operations: for every field added we would have to copy the map.

Other than that, we'll have to research what the JDK did to fix this, and either port that to Scala's map or implement our own trie map that provides the fix. If it can be done simply by providing a custom hash code function, then we might also be able to achieve it by wrapping the keys, though this would generate more garbage.
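
The key-wrapping idea might look something like the following. This is a hypothetical sketch, not play-json code; the important point is that the wrapper must re-hash the full string content with a different hash function, since any hash derived from `String.hashCode` alone would inherit the very collisions we are trying to escape:

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical wrapper for JsObject keys. Re-hashes the character
// content with MurmurHash3 instead of using String.hashCode, so keys
// crafted to collide under String.hashCode spread across buckets again.
// (An attacker who targets MurmurHash3's fixed seed could still craft
// collisions; a per-process random seed would harden this further.)
final case class HashedKey(key: String) {
  override def hashCode: Int = MurmurHash3.stringHash(key)
}
```

The downside jroper notes applies: every parsed field allocates an extra wrapper object, and every lookup by plain `String` would need to wrap the key first.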

jroper commented 6 years ago

Ok, here's the supposed JDK fix:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010238.html

The thing is, I had a look at the fixes proposed, and I can't find any evidence that those fixes exist in the JDK today.

plokhotnyuk commented 6 years ago

Here is how it works in the latest JDK: https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/f5386f0d6182d877e847ed15d197a98bf8002523/src/java.base/share/classes/java/util/HashMap.java#L145-L148

jroper commented 6 years ago

Ah, found it: Doug Lea proposed an alternative, much more elegant solution:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-June/010425.html

This is in the JDK today and documented in the HashMap javadocs:

Note that using many keys with the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.

Also from the HashMap source code:

    /*
     * Implementation notes.
     *
     * This map usually acts as a binned (bucketed) hash table, but
     * when bins get too large, they are transformed into bins of
     * TreeNodes, each structured similarly to those in
     * java.util.TreeMap. Most methods try to use normal bins, but
     * relay to TreeNode methods when applicable (simply by checking
     * instanceof a node).  Bins of TreeNodes may be traversed and
     * used like any others, but additionally support faster lookup
     * when overpopulated. However, since the vast majority of bins in
     * normal use are not overpopulated, checking for existence of
     * tree bins may be delayed in the course of table methods.
     *
     * Tree bins (i.e., bins whose elements are all TreeNodes) are
     * ordered primarily by hashCode, but in the case of ties, if two
     * elements are of the same "class C implements Comparable<C>",
     * type then their compareTo method is used for ordering. (We
     * conservatively check generic types via reflection to validate
     * this -- see method comparableClassFor).  The added complexity
     * of tree bins is worthwhile in providing worst-case O(log n)
     * operations when keys either have distinct hashes or are
     * orderable, Thus, performance degrades gracefully under
     * accidental or malicious usages in which hashCode() methods
     * return values that are poorly distributed, as well as those in
     * which many keys share a hashCode, so long as they are also
     * Comparable. (If neither of these apply, we may waste about a
     * factor of two in time and space compared to taking no
     * precautions. But the only known cases stem from poor user
     * programming practices that are already so slow that this makes
     * little difference.)
     *
     * Because TreeNodes are about twice the size of regular nodes, we
     * use them only when bins contain enough nodes to warrant use
     * (see TREEIFY_THRESHOLD). And when they become too small (due to
     * removal or resizing) they are converted back to plain bins.

I think we can solve this in a somewhat similar manner in play-json's JsObject. We have an advantage here in that we are always working with String keys, which are always Comparable, so there is no need for any reflection hacks. We can also make some assumptions about the sizes that will be encountered in the real world: most JSON structures won't have more than 100 fields.

So my suggestion is that we use the default Scala scala.collection.immutable.Map (a hash trie) for objects with 100 fields or fewer, and switch to a scala.collection.immutable.TreeMap for objects with more than 100 fields. This won't achieve the expected behaviour above (a sub-linear decrease in throughput); rather it will achieve an n log n decrease, but that degrades a lot more gracefully than quadratic, and it is the same degradation seen in the JDK HashMap implementation. It does mean that in the rare cases where JSON objects do have 100+ fields, they will also see a performance hit (perhaps we can tweak that number), but lookups become logarithmic rather than constant time, which is not too bad.
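
A minimal sketch of that proposal, with illustrative names rather than play-json's actual API, and String values standing in for JsValue. A TreeMap orders entries by the String keys themselves, so lookups stay O(log n) no matter how the keys' hash codes collide:

```scala
import scala.collection.immutable.TreeMap

// Hypothetical field-map builder for a JsObject-like structure:
// a hash trie (the default immutable Map) below the threshold,
// a TreeMap (collision-immune, O(log n) lookups) above it.
object FieldMap {
  val Threshold = 100 // tunable, per the suggestion above

  def apply(fields: Seq[(String, String)]): Map[String, String] =
    if (fields.size <= Threshold) fields.toMap
    else TreeMap(fields: _*)
}
```

Under collision attack the large-object path degrades to n log n total parsing cost instead of quadratic, at the price of logarithmic (rather than effectively constant) lookups for legitimate 100+ field objects.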

jroper commented 6 years ago

@plokhotnyuk sorry I was in the middle of researching and writing my comment when you posted yours :)

jroper commented 6 years ago

Issue raised to get this fixed in Scala:

https://github.com/scala/bug/issues/11203