Open eregon opened 2 years ago
Link to Ruby feature request for byte arrays
I have taken another look at this.
To repro, clone prawn, bundle
and then bundle exec rspec spec/prawn/images/png_spec.rb
.
To profile: TRUFFLERUBYOPT=--cpusampler=flamegraph bundle exec rspec spec/prawn/images/png_spec.rb
The whole test suite takes 80.61s locally which seems reasonable with truffleruby-dev
.
That specific spec takes 37.44 seconds, so it's not solved yet.
Taking a look at the profile, byteslice
is no longer part of it but now there is these TruffleRuby.synchronized
calls which were added to make StringIO thread-safe and might be some overhead.
I also tried on TruffleRuby native 22.3 which doesn't have the TruffleRuby.synchronized
but has TruffleString and that runs the spec in 33.62 seconds, so the synchronization doesn't seem the main overhead here.
CRuby 3.1 takes <1 second for that spec.
I think the perf problem is due to that code using StringIO#putc
and that in TruffleRuby (for master at least) just delegates to write
which then goes into the pos < str.bytesize
case and uses Primitive.string_splice
and that creates two lazy Concat strings which is very expensive due to lots of allocations for single bytes like that.
I think what we want is to use setbyte
or equivalent logic for such a case, so go to the MutableTruffleString and then can be efficient and avoid allocations.
There are also StringIO#write
calls though in that method so we also need to handle that better, i.e., Primitive.string_splice
needs to avoid doing lazy Concat strings when it gets a MutableTruffleString and writes a string of N bytes with byte_count_to_replace=N.
It seems best to fix Primitive.string_splice
first as that should also cover putc
without needing special logic in putc.
From https://github.com/prawnpdf/prawn/pull/1246#issuecomment-1040197295:
From the flamegraph it seems most of time (31.2% using
Search
) is spent inPrawn::Images::PNG#split_image_data
and specifically inStringIO#read
andString#byteslice
called from there. The main cost is probably scanning the substring to compute the correct coderange (7-bit or valid for BINARY strings). This should improve when TruffleRuby has adopted TruffleString (WIP). It also indicates that in this case it might be better to compute the coderange lazily, since it's one of the rare cases where the coderange doesn't seem needed (forcolor.write data.read(color_bytes)
), cc @djoooooe.From a wider perspective it's an unfortunate case of Ruby not having a real "byte array" structure, and String being a relatively poor fit for it.