Improve toString Performance: Use StringBuilderWriter for toString methods

Simulant87 commented 9 months ago

Implementation for https://github.com/stleary/JSON-java/issues/863

Using a StringBuilder wrapped in a new StringBuilderWriter class to improve the performance for the default toString methods.

using the not synchronised StringBuilder instead of the StringBuffer results in a ~37% faster toString method, which is mostly used for serialisation, so this improvement is valuable for any service communication receiving JSON from this library.

I validated the performance with this project: https://github.com/fabienrenaud/java-json-benchmark ./run ser --libs ORGJSON

Here is the result for the benchmark on my local machine (thoughput measured in operations per second, higher is better):

currently latest library version 20240205:

Benchmark               Mode  Cnt       Score      Error  Units
Serialization.orgjson  thrpt   20  741310,103 ▒ 6250,938  ops/s

Improved implementation from this PR:

Benchmark               Mode  Cnt        Score      Error  Units
Serialization.orgjson  thrpt   20  1019863,958 ▒ 2538,416  ops/s

stleary commented 8 months ago

@Simulant87 The benchmarks look good, but I tried testing against a 30m JSON file. The improvement was less than 2% between the latest master branch code (943 ms) and this PR (927 ms). I don't think that is enough to justify accepting this PR. If you want to run the test case yourself, I used the file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset

Simulant87 commented 8 months ago

@stleary Thank you for having a look at my PR and sharing your source file for your performance test. I repeated the performance test with your input file and wrote this "poor mans performance test" (see below). It was just faster to implement than a correct JMH setup and it should still give an approximate feelingabout the difference. I executed it on master and my branch with the following result:

master branch: 84.533ms for 200 iterations of toString. 422ms per toString on average. my branch: 54.706ms for 200 iterations of toString. 273ms per toString on average.

the improvement is comparable to the java-json-benchmark I used before with ~35% improvement. I would be interested in the execution result in your environment.

package org.json;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;

public class Main {

    public static void main(String[] args) throws FileNotFoundException {
        // TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
        InputStream inputStream = new FileInputStream("/path/to/train-v1.1.json");
        JSONTokener tokener = new JSONTokener(inputStream);
        JSONObject object = new JSONObject(tokener);

        String output = null;
        // JVM warm up: 100 iterations of toString
        for (int i = 0; i < 100; i++) {
            output = object.toString();
            System.out.println(output.charAt(0));
        }

        // measured run: 200 iterations of toString
        long start = System.currentTimeMillis();
        int iterations = 200;
        for (int i = 0; i < iterations; i++) {
            output = object.toString();
            // access the to string output, so it does not get optimised away
            System.out.println(output.charAt(0));
        }
        long end = System.currentTimeMillis();

        long totalTime = end - start;
        System.out.println(totalTime + "ms for " + iterations + " iterations of toString. " + totalTime / iterations + "ms per toString on average.");
    }
}

stleary commented 8 months ago

@Simulant87 I think that measuring how fast toString() runs in isolation does not provide useful information for users' applications. The test I am starting with is: "What is the performance improvement when parsing a moderately large/complex JSON doc?".

@Test
public void foo() {
    final InputStream resourceAsStream = JSONObjectTest.class.getClassLoader().getResourceAsStream(
            "train-v1.1.json");
    JSONTokener tokener = new JSONTokener(resourceAsStream);
    long startTime = System.currentTimeMillis();
    JSONObject jsonObject = new JSONObject(tokener);
    long endTime = System.currentTimeMillis();
    System.out.println("Time to execute in ms: " + Long.toString(endTime-startTime));
}

Simulant87 commented 8 months ago

@stleary You are right, in my performance improvement I only focused on the toString() method, with a noticeable improvement. My use case it that my application creates JSONObjects as internal model and serialises them as response to a REST request. The time to create JSONObject will be the same, but the time to write the response will be improved, as there the toString() method is used (at least in my context).

Simulant87 commented 8 months ago

@stleary I also took a look into the deserialisation parsing a JSONObject from String. I was able to pull of a ~30% improvement by replacing switch-case statements with if-else statements and don't use the back() method for the JSONObject parsing.

Performance checks I did with java-json-benchmark ./run deser --libs ORGJSON

Here is the result for the benchmark on my local machine (thoughput measured in operations per second, higher is better): currently latest library version 20240205:

Benchmark                 Mode  Cnt       Score       Error  Units
Deserialization.orgjson  thrpt   20  747134,406 ▒ 14445,770  ops/s

improvement on this branch:

Benchmark               Mode  Cnt        Score       Error  Units
Serialization.orgjson  thrpt   20  1062998,974 ▒ 12164,706  ops/s

Also doing my simple performance test on your large JSON input file shows an improvement, although it is not that large as on the benchmark above:

on master branch: 76.403ms for 200 iterations of parsing JSONObject. 382ms per parsing on average.

Improvement on my branch: 70.433ms for 200 iterations of parsing JSONObject. 352ms per parsing on average.

    public static void main(String[] args) throws FileNotFoundException {
        // TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
        InputStream inputStream = new FileInputStream("/path/to/train-v1.1.json");
        JSONTokener tokener = new JSONTokener(inputStream);
        JSONObject object = new JSONObject(tokener);

        // JVM warm up: 100 iterations of toString
        for (int i = 0; i < 100; i++) {
            inputStream = new FileInputStream("/path/to/train-v1.1.json");
            tokener = new JSONTokener(inputStream);
            object = new JSONObject(tokener);
        }

        // measured run: 200 iterations of toString
        long start = System.currentTimeMillis();
        int iterations = 200;
        for (int i = 0; i < iterations; i++) {
            inputStream = new FileInputStream("/path/to/train-v1.1.json");
            tokener = new JSONTokener(inputStream);
            object = new JSONObject(tokener);
        }
        long end = System.currentTimeMillis();

        long totalTime = end - start;
        System.out.println(totalTime + "ms for " + iterations + " iterations of parsing JSONObject. " + totalTime / iterations + "ms per parsing on average.");
    }

stleary commented 8 months ago

What problem does this code solve? This is a performance optimization. StringBuilderWriter (uses a StringBuilder) replaces the built-in class StringWriter (uses a StringBuffer). Large JSON docs that contain many strings can be stringified in about half the time with this enhancement.

Does the code still compile with Java6? Yes

Risks Low

Changes to the API? No

Will this require a new release? No

Should the documentation be updated? No

Does it break the unit tests? No, new unit tests were added.

Was any code refactored in this commit? No

Review status APPROVED

Starting 3-day comment window

stleary / JSON-java

Improve toString Performance: Use StringBuilderWriter for toString methods #867