Open kilink opened 4 months ago
This suggestion sounds reasonable, but is this more of a theoretical issue, or something you identified as a performance issue when using Gson?
I assume this change would mainly help when writing large strings at the beginning of the JSON data. In the other cases it would not change much:
- `ensureCapacity` might have similar effects as the implicit resizing by `append`
- `ensureCapacity` might have no effect since the buffer is already large enough
- `]` or `}` would still cause resizing

But I am not completely sure. It seems the main advantages with your proposed changes would be:
It would probably also make sense to check not only for `StringWriter` but also for `com.google.gson.internal.Streams.AppendableWriter`, in case the user directly provided a `StringBuilder` or `StringBuffer` to `Gson.toJson(Object, Appendable)`.
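A rough sketch of what such a check could look like for a plain `Appendable`, using only JDK calls. The class `AppendableReserve` and its `reserve` method are hypothetical names for illustration; how Gson would reach the `Appendable` wrapped by `AppendableWriter` is not shown here and would need its own change.

```java
final class AppendableReserve {
    /**
     * Hypothetical helper: reserves room for extraChars more characters when the
     * target is a StringBuilder or StringBuffer. Returns true if capacity was
     * requested, false for any other Appendable (which is left untouched).
     */
    static boolean reserve(Appendable target, int extraChars) {
        if (target instanceof StringBuilder) {
            StringBuilder sb = (StringBuilder) target;
            sb.ensureCapacity(sb.length() + extraChars);
            return true;
        }
        if (target instanceof StringBuffer) {
            StringBuffer sb = (StringBuffer) target;
            sb.ensureCapacity(sb.length() + extraChars);
            return true;
        }
        return false;
    }
}
```

`StringBuilder` and `StringBuffer` need separate branches because their common superclass is not public.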
Gson version
Gson 2.10.1
Java / Android version
Happening on JDK21, but presumably is an issue for any JDK, as the relevant code hasn't changed in 10+ years.
Description
Serializing a String to JSON using `toJson` can result in excessive copying of the internal StringBuffer array for certain inputs, in particular for Strings with a length >= 33. This all boils down to how the internal StringBuffer used by StringWriter is initialized, and its resizing behavior: it starts with a capacity of 16, then tries to use double the previous capacity + 2 (or the size of the String being written if it is larger than that). If the capacity is increased only to accommodate the size of the String being written, then the final call to write the closing double quote will trigger another resize.
Expected behavior
Avoid excessive allocations / copying as much as possible. JsonWriter's string method could check if the Writer is a StringWriter, and call ensureCapacity on its StringBuffer, to avoid the excessive resizing.
Obviously the above really only works for Strings that don't require escaping, but that may even be sufficient for most scenarios.
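A minimal sketch of the idea, assuming the string needs no escaping and is written as opening quote, content, closing quote. `ReserveSketch`, `reserve` and `writeQuoted` are hypothetical names for illustration, not existing Gson code; only `StringWriter.getBuffer()` and `StringBuffer.ensureCapacity(int)` are real JDK calls.

```java
import java.io.StringWriter;
import java.io.Writer;

final class ReserveSketch {
    /** Hypothetical helper: pre-sizes the backing buffer when the Writer is a StringWriter. */
    static void reserve(Writer out, int extraChars) {
        if (out instanceof StringWriter) {
            StringBuffer buf = ((StringWriter) out).getBuffer();
            buf.ensureCapacity(buf.length() + extraChars);
        }
    }

    /**
     * Writes value surrounded by double quotes (no escaping is attempted here)
     * and returns the capacity of the backing buffer afterwards.
     */
    static int writeQuoted(StringWriter out, String value) {
        reserve(out, value.length() + 2); // content plus both quotes
        out.write('"');
        out.write(value, 0, value.length());
        out.write('"');
        return out.getBuffer().capacity();
    }
}
```

With a fresh `StringWriter` and a 34-character value, the buffer is resized once up front and the closing quote no longer triggers a further resize. If escaping is needed, the reserved size is only a lower bound and later resizes can still happen.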
Actual behavior
Reproduction steps
It's easier to look at a trivial case where the input being serialized does not require any escaping:
The above resulted in three resizes of the internal array, the last of which was the most excessive (35 -> 72 for a single character).
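The growth can also be observed with JDK classes alone. The following is a minimal sketch, assuming an unescaped string is written as three separate writes (opening quote, content, closing quote); `CapacityTrace` and its methods are illustrative names and do not come from Gson.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CapacityTrace {
    /** Builds a string consisting of count copies of c. */
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    /** Records the buffer capacity before writing and after each of the three writes. */
    static List<Integer> trace(String value) {
        StringWriter out = new StringWriter(); // backing StringBuffer starts at capacity 16
        List<Integer> capacities = new ArrayList<>();
        capacities.add(out.getBuffer().capacity());
        out.write('"');                        // opening quote
        capacities.add(out.getBuffer().capacity());
        out.write(value, 0, value.length());   // content, assumed to need no escaping
        capacities.add(out.getBuffer().capacity());
        out.write('"');                        // closing quote
        capacities.add(out.getBuffer().capacity());
        return capacities;
    }

    public static void main(String[] args) {
        // For a 34-character input this prints [16, 16, 35, 72]
        System.out.println(trace(repeat('a', 34)));
    }
}
```

For the 34-character input the content write grows the buffer to exactly the required 35 characters, so the single closing quote forces the jump to 72.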