Closed Simulant87 closed 8 months ago
@Simulant87 The benchmarks look good, but I tried testing against a 30m JSON file. The improvement was less than 2% between the latest master branch code (943 ms) and this PR (927 ms). I don't think that is enough to justify accepting this PR. If you want to run the test case yourself, I used the file train-v1.1.json
from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
@stleary Thank you for having a look at my PR and sharing your source file for your performance test. I repeated the performance test with your input file and wrote this "poor mans performance test" (see below). It was just faster to implement than a correct JMH setup and it should still give an approximate feelingabout the difference. I executed it on master and my branch with the following result:
master branch: 84.533ms for 200 iterations of toString. 422ms per toString on average. my branch: 54.706ms for 200 iterations of toString. 273ms per toString on average.
the improvement is comparable to the java-json-benchmark I used before with ~35% improvement. I would be interested in the execution result in your environment.
package org.json;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
public class Main {
public static void main(String[] args) throws FileNotFoundException {
// TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
InputStream inputStream = new FileInputStream("/path/to/train-v1.1.json");
JSONTokener tokener = new JSONTokener(inputStream);
JSONObject object = new JSONObject(tokener);
String output = null;
// JVM warm up: 100 iterations of toString
for (int i = 0; i < 100; i++) {
output = object.toString();
System.out.println(output.charAt(0));
}
// measured run: 200 iterations of toString
long start = System.currentTimeMillis();
int iterations = 200;
for (int i = 0; i < iterations; i++) {
output = object.toString();
// access the to string output, so it does not get optimised away
System.out.println(output.charAt(0));
}
long end = System.currentTimeMillis();
long totalTime = end - start;
System.out.println(totalTime + "ms for " + iterations + " iterations of toString. " + totalTime / iterations + "ms per toString on average.");
}
}
@Simulant87 I think that measuring how fast toString()
runs in isolation does not provide useful information for users' applications. The test I am starting with is: "What is the performance improvement when parsing a moderately large/complex JSON doc?".
@Test
public void foo() {
final InputStream resourceAsStream = JSONObjectTest.class.getClassLoader().getResourceAsStream(
"train-v1.1.json");
JSONTokener tokener = new JSONTokener(resourceAsStream);
long startTime = System.currentTimeMillis();
JSONObject jsonObject = new JSONObject(tokener);
long endTime = System.currentTimeMillis();
System.out.println("Time to execute in ms: " + Long.toString(endTime-startTime));
}
@stleary You are right, in my performance improvement I only focused on the toString()
method, with a noticeable improvement. My use case it that my application creates JSONObjects as internal model and serialises them as response to a REST request. The time to create JSONObject will be the same, but the time to write the response will be improved, as there the toString()
method is used (at least in my context).
@stleary I also took a look into the deserialisation parsing a JSONObject from String. I was able to pull of a ~30% improvement by replacing switch-case
statements with if-else
statements and don't use the back()
method for the JSONObject parsing.
Performance checks I did with java-json-benchmark
./run deser --libs ORGJSON
Here is the result for the benchmark on my local machine (thoughput measured in operations per second, higher is better): currently latest library version 20240205:
Benchmark Mode Cnt Score Error Units
Deserialization.orgjson thrpt 20 747134,406 ▒ 14445,770 ops/s
improvement on this branch:
Benchmark Mode Cnt Score Error Units
Serialization.orgjson thrpt 20 1062998,974 ▒ 12164,706 ops/s
Also doing my simple performance test on your large JSON input file shows an improvement, although it is not that large as on the benchmark above:
on master branch: 76.403ms for 200 iterations of parsing JSONObject. 382ms per parsing on average.
Improvement on my branch: 70.433ms for 200 iterations of parsing JSONObject. 352ms per parsing on average.
public static void main(String[] args) throws FileNotFoundException {
// TODO: update path to 30mb input file train-v1.1.json from https://www.kaggle.com/datasets/stanfordu/stanford-question-answering-dataset
InputStream inputStream = new FileInputStream("/path/to/train-v1.1.json");
JSONTokener tokener = new JSONTokener(inputStream);
JSONObject object = new JSONObject(tokener);
// JVM warm up: 100 iterations of toString
for (int i = 0; i < 100; i++) {
inputStream = new FileInputStream("/path/to/train-v1.1.json");
tokener = new JSONTokener(inputStream);
object = new JSONObject(tokener);
}
// measured run: 200 iterations of toString
long start = System.currentTimeMillis();
int iterations = 200;
for (int i = 0; i < iterations; i++) {
inputStream = new FileInputStream("/path/to/train-v1.1.json");
tokener = new JSONTokener(inputStream);
object = new JSONObject(tokener);
}
long end = System.currentTimeMillis();
long totalTime = end - start;
System.out.println(totalTime + "ms for " + iterations + " iterations of parsing JSONObject. " + totalTime / iterations + "ms per parsing on average.");
}
What problem does this code solve? This is a performance optimization. StringBuilderWriter (uses a StringBuilder) replaces the built-in class StringWriter (uses a StringBuffer). Large JSON docs that contain many strings can be stringified in about half the time with this enhancement.
Does the code still compile with Java6? Yes
Risks Low
Changes to the API? No
Will this require a new release? No
Should the documentation be updated? No
Does it break the unit tests? No, new unit tests were added.
Was any code refactored in this commit? No
Review status APPROVED
Starting 3-day comment window
Implementation for https://github.com/stleary/JSON-java/issues/863
Using a
StringBuilder
wrapped in a newStringBuilderWriter
class to improve the performance for the defaulttoString
methods.using the not synchronised
StringBuilder
instead of theStringBuffer
results in a ~37% fastertoString
method, which is mostly used for serialisation, so this improvement is valuable for any service communication receiving JSON from this library.I validated the performance with this project: https://github.com/fabienrenaud/java-json-benchmark
./run ser --libs ORGJSON
Here is the result for the benchmark on my local machine (thoughput measured in operations per second, higher is better):
currently latest library version 20240205:
Improved implementation from this PR: