While looking at the performance of a large Rails application, I saw that we were creating a new CallTarget on each call to String#%. The generic specialization compiles the sprintf expression from scratch on each call, which creates a new call target every time.
The sprintf code can be invoked in a few different ways, but the one that stood out to me was String#%. Rails creates a request ID for each request to make it easier to track logs and the like. The ActiveSupport code that creates the UUID uses a static format string, but the sprintf nodes are already megamorphic by the time this code is called. I saw them go megamorphic just from loading the URI library.
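To make the "static format string" case concrete, here is a minimal sketch of a call site of that shape. It is not the actual ActiveSupport code; `UUID_FORMAT` and `request_id` are hypothetical names chosen for illustration. The point is only that the receiver of `%` is the same string on every call.

```ruby
require "securerandom"

# Hypothetical constant and helper, for illustration only (not ActiveSupport's code).
UUID_FORMAT = "%08x-%04x-%04x-%04x-%04x%08x"

def request_id
  # 16 random bytes split into 32-, 16-, 16-, 16-, 16- and 32-bit integers.
  ary = SecureRandom.random_bytes(16).unpack("NnnnnN")
  ary[2] = (ary[2] & 0x0fff) | 0x4000 # set the version field to 4
  ary[3] = (ary[3] & 0x3fff) | 0x8000 # set the RFC 4122 variant bits
  # The format string never changes at this call site.
  UUID_FORMAT % ary
end

puts request_id
```

A call site like this only ever sees one format string, so it should be able to stay monomorphic if it is not sharing sprintf nodes that other callers have already polluted.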
I suspect format strings are mostly static and splitting would make most call sites monomorphic. This simple change demonstrably splits with the following example:
def foo(format)
  format % [123]
end

loop do
  foo "%#{rand(3)}d"
  foo "%s"
end
For call sites with more than 3 format strings, we could add a global cache, like we do for regular expressions. That could cut down on the creation of unique call targets, but it's out of scope for this PR, and I don't have any evidence of it being a real-world problem at the moment.
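As a rough illustration of the global cache idea, here is a toy model in plain Ruby. It is not how the interpreter would implement it: `FormatCache` is a hypothetical class, and "compiling" a format string is stood in for by building a lambda. It only shows that keying on the format string means each distinct string is compiled once, no matter how many call sites use it.

```ruby
# Toy model only: FormatCache is a hypothetical name, and the lambda built in
# #compile stands in for the real work of compiling a format string.
class FormatCache
  attr_reader :compilations

  def initialize
    @cache = {}
    @compilations = 0
  end

  # Looks up the compiled form by format string, compiling it on a miss.
  def format(format_string, args)
    compiled = @cache[format_string] ||= compile(format_string)
    compiled.call(args)
  end

  private

  def compile(format_string)
    @compilations += 1
    frozen = format_string.dup.freeze
    ->(args) { frozen % args }
  end
end

cache = FormatCache.new
100.times do
  cache.format("%d", [123])
  cache.format("%s", ["abc"])
end
puts cache.compilations
```

A real cache would also need a bound or eviction policy so that dynamically generated format strings cannot grow it without limit; this sketch leaves that out.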