oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/
Other
2.98k stars 179 forks source link

Add sass-embedded to REUSE_AS_BINARY_ON_TRUFFLERUBY #3565

Closed ntkme closed 1 month ago

ntkme commented 1 month ago

I'm the maintainer of sass-embedded gem. This gem has "native extension" that is a Dart VM binary executable, which is completely independent and not linked to libruby. On the platform "ruby", there is a Rakefile based extension that downloads the correct binary for the current platform, and it is extremely slow on TruffleRuby comparing to MRI (likely due to Rakefile extension launching a child rake process etc). Therefore, it's better to just use the pre-compiled gems that bundles the binary to speed up the installation.

oracle-contributor-agreement[bot] commented 1 month ago

Thank you for your pull request and welcome to our community! To contribute, please sign the Oracle Contributor Agreement (OCA). The following contributors of this PR have not signed the OCA:

To sign the OCA, please create an Oracle account and sign the OCA in Oracle's Contributor Agreement Application.

When signing the OCA, please provide your GitHub username. After signing the OCA and getting an OCA approval from Oracle, this PR will be automatically updated.

If you are an Oracle employee, please make sure that you are a member of the main Oracle GitHub organization, and your membership in this organization is public.

eregon commented 1 month ago

Thank you for the PR! Did you submit the OCA? If so I'll ping people to process it.

On the platform "ruby", there is a Rakefile based extension that downloads the correct binary for the current platform, and it is extremely slow on TruffleRuby comparing to MRI (likely due to Rakefile extension launching a child rake process etc).

Do you have a log of that or some timings? I found https://github.com/sass-contrib/sass-embedded-host-ruby/actions/runs/8978997934/job/24660309742, 27s doesn't seem too bad for the install step (jruby takes approximately the same time), but that's on Native and on JVM it's 85s which is a lot. In general I think it's fine to only test on Native (truffleruby) and not JVM (truffleruby+graalvm) in CI, because the latter is typically much slower in CI and there should be no difference anyway.

Locally gem install sass-embedded takes 29s (first run, also building google-protobuf) & 22s (second run) which is not that fast (for comparison json takes 5s). With your PR, it takes 4s for gem i -V sass-embedded locally for me, a nice gain.

I wonder what takes so long, https://github.com/sass-contrib/sass-embedded-host-ruby/blob/main/ext/sass/Rakefile looks like a couple downloads which shouldn't be that slow. And a single rake subprocess from what I can see. That seems worth investigating & profiling, cc @andrykonchin but either way this PR seems good to merge.

oracle-contributor-agreement[bot] commented 1 month ago

Thank you for signing the OCA.

eregon commented 1 month ago

And a single rake subprocess from what I can see. That seems worth investigating & profiling, cc @andrykonchin but either way this PR seems good to merge.

I tried cloning https://github.com/sass-contrib/sass-embedded-host-ruby and cd ext/sass and then time rake. That takes 3s on CRuby but 20s on truffleruby-dev (Native). git clean -Xf . between cleans the state. I tried time ruby --cpusampler -S rake but that gave:

Zlib::BufError: buffer error
exception.c:29:in `rb_exc_raise'
exception.c:107:in `rb_ensure'
mutex.c:39:in `rb_mutex_synchronize'
/home/eregon/code/sass-embedded-host-ruby/ext/sass/Rakefile:191:in `fetch'
/home/eregon/code/sass-embedded-host-ruby/ext/sass/Rakefile:89:in `block in <top (required)>'
/home/eregon/code/sass-embedded-host-ruby/ext/sass/Rakefile:8:in `block in <top (required)>'

which looks like an issue of interrupting C ext code to get a backtrace.

On a second try it worked though, and took about 20s:

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Thread[#1,main,5,main]
 Name                                                                               ||             Total Time    ||              Self Time    || Location             
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash#[]=                                                                           ||             2250ms   5.8% ||             2140ms   5.5% || (core)~1:0
 Marshal::StringState#consume_byte                                                  ||             1930ms   5.0% ||             1930ms   5.0% || resource:/truffleruby/core/marshal.rb~1423-1428:38993-39169
 String#<=>                                                                         ||             1630ms   4.2% ||             1630ms   4.2% || resource:/truffleruby/core/string.rb~1354-1376:39630-40399
 Array#map                                                                          ||             3050ms   7.8% ||             1380ms   3.5% || (core)~1:0
 NameTuple#initialize                                                               ||             1150ms   3.0% ||             1140ms   2.9% || ../../../../.rubies/truffleruby-dev/lib/mri/rubygems/name_tuple.rb~8-17:170-395
 Array#<<                                                                           ||              700ms   1.8% ||              700ms   1.8% || (core)~1:0
 Truffle::RegexpOperations.match_in_region                                          ||              720ms   1.8% ||              680ms   1.7% || resource:/truffleruby/core/truffle/regexp_operations.rb~76-84:2324-2787
 BasicObject#__id__                                                                 ||              630ms   1.6% ||              630ms   1.6% || (core)~1:0
 Marshal::State#construct                                                           ||            10130ms  26.0% ||              580ms   1.5% || resource:/truffleruby/core/marshal.rb~640-741:18778-21700
 Marshal::State#construct_integer                                                   ||             1460ms   3.7% ||              470ms   1.2% || resource:/truffleruby/core/marshal.rb~884-931:24658-26179
 Thread#value                                                                       ||              460ms   1.2% ||              460ms   1.2% || (core)~1:0
 Module#const_defined?                                                              ||              440ms   1.1% ||              440ms   1.1% || resource:/truffleruby/core/module.rb~111-113:4154-4262
 Integer#times                                                                      ||            10060ms  25.8% ||              420ms   1.1% || resource:/truffleruby/core/integer.rb~144-153:4421-4565
 Array#bottom_up_merge                                                              ||             1720ms   4.4% ||              400ms   1.0% || resource:/truffleruby/core/array.rb~1462-1481:33779-34262
 Truffle::Interop.execute                                                           ||              290ms   0.7% ||              290ms   0.7% || (core)~1:0
 Marshal::State#add_object                                                          ||             2910ms   7.5% ||              210ms   0.5% || resource:/truffleruby/core/marshal.rb~617-621:18348-18454
 block in SpecFetcher#tuples_for                                                    ||              170ms   0.4% ||              170ms   0.4% || ../../../../.rubies/truffleruby-dev/lib/mri/rubygems/spec_fetcher.rb~256:6404-6420
 Truffle::Type.check_funcall                                                        ||              390ms   1.0% ||              170ms   0.4% || resource:/truffleruby/core/type.rb~289-306:9734-10246
 String#byteslice                                                                   ||              160ms   0.4% ||              160ms   0.4% || resource:/truffleruby/core/string.rb~44-71:2090-3073
 Array#map!                                                                         ||              170ms   0.4% ||              130ms   0.3% || (core)~1:0
 Truffle::Splitter.add_empty                                                        ||              130ms   0.3% ||              120ms   0.3% || resource:/truffleruby/core/splitter.rb~212-214:7206-7314
 Module#const_get                                                                   ||              120ms   0.3% ||              120ms   0.3% || resource:/truffleruby/core/module.rb~115-151:4267-5369
 Enumerable::SortedElement#initialize                                               ||              120ms   0.3% ||              120ms   0.3% || resource:/truffleruby/core/enumerable.rb~482-484:12795-12868
 block in Enumerable#sort_by                                                        ||              420ms   1.1% ||              120ms   0.3% || resource:/truffleruby/core/enumerable.rb~500-502:13200-13254

Looks like Marshal is taking a bunch of time here. But I'm not sure how much we can trust that profiling output (notably it will likely not account well for time spent in native code, and guest safepoint bias).

eregon commented 1 month ago

Here is a profile with async-profiler on JVM on 24.0.1: profile.zip (using ruby '--vm.agentpath:/home/eregon/Downloads/async-profiler-3.0-linux-x64/lib/libasyncProfiler.so=start,event=cpu,file=profile.html,include=*org/truffleruby*' --vm.XX:+UnlockDiagnosticVMOptions --vm.XX:+DebugNonSafepoints -S rake) Interestingly it also took about 20s seconds. I wonder if we have some weird network timeout or so. But from looking at the profile very quickly it looks like we spend a bunch of time in Ruby code and not much in native code, so maybe Marshal is indeed a bottleneck here, it'd be good to confirm with some manual timings.

ntkme commented 1 month ago

I took some time and found this line is where most of the time was spent, and why Marshal shows up in CPU profiling:

    specs_and_sources, _errors = Gem::SpecFetcher.fetcher.spec_for_dependency(dependency, false)

https://github.com/sass-contrib/sass-embedded-host-ruby/blob/6cd3c23933e2c76e07683c19e7157f5cc4e3c64a/ext/sass/Rakefile#L215

The context here is that, let's say user is installing platform ruby sass-embedded 1.77.0 on aarch64-linux-gnu, the Rakefile will download the rubygems index from rubygems.org and find 1.77.0-aarch64-linux-gnu precompiled gem from the index, then download the gem and extract the pre-built extension from it - this is to avoid rely on external command like tar, unzip, etc used in local development environment, and to make it work with whatever gem sources of users' choice (if we do third party download it may be blocked by some enterprise firewall).

eregon commented 1 month ago

Ah BTW I have been thinking it might worth adding an option to append custom gems to that list. And that could then be used via TRUFFLERUBYOPT. @andrykonchin Could you do that in a separate PR?

eregon commented 1 month ago

@ntkme Thank you for looking into it. Do you know why gem install sass-embedded together with your change doesn't occur this overhead? It still downloads sass-embedded-1.77.0-aarch64-linux-gnu.gem in the end. So I guess it's the part about figuring out the gem for the current platform which is really slow. Surprising as I'd expect there is little data to unmarshal. But OTOH marshaled gemspecs contain the list of files which can be big.

ntkme commented 1 month ago

Do you know why gem install sass-embedded together with your change doesn't occur this overhead? Surprising as I'd expect there is little data to unmarshal.

I checked this and it's due to gem install sass-embedded fetches index differently than Gem::SpecFetcher.fetcher.spec_for_dependency:

This is how gem install sass-embedded fetches the index:

HEAD https://index.rubygems.org/
200 OK
GET https://index.rubygems.org/info/sass-embedded
200 OK
GET https://index.rubygems.org/quick/Marshal.4.8/sass-embedded-1.77.0-arm64-darwin.gemspec.rz
200 OK
GET https://index.rubygems.org/info/google-protobuf
200 OK
GET https://index.rubygems.org/info/rake
200 OK
Downloading gem sass-embedded-1.77.0-arm64-darwin.gem
GET https://index.rubygems.org/gems/sass-embedded-1.77.0-arm64-darwin.gem
Fetching sass-embedded-1.77.0-arm64-darwin.gem
200 OK

This is how Gem::SpecFetcher.fetcher.spec_for_dependency fetches the index:

GET https://rubygems.org/specs.4.8.gz
304 Not Modified

It turns out spec_for_dependency is fetching the complete index, rather than a partial index, which explains why it's way slower. Let me see if I can figure out gem install works internally.

ntkme commented 1 month ago

I was able to bring it down to roughly 4s with https://github.com/sass-contrib/sass-embedded-host-ruby/commit/7f10921fe36fbb4ae74e434afa256988a3348b14

Still, this means installing native gem directly will take 4s, and install platform ruby will take roughly 2 x 4 = 8s.

eregon commented 1 month ago

Good catch and that explains it clearly, fetching all specs is quite expensive and something that IIRC shouldn't happen anymore with recent RubyGems & Bundler. It might be a RubyGems bug that Gem::SpecFetcher.fetcher.spec_for_dependency fetches the full index (but maybe that API makes it difficult to use index.rubygems.org?).

eregon commented 1 month ago

In hindsight I should probably have tried --cpusampler=flamegraph, that might have shown the full index is used (but I had little time to investigate this today).

In any case, we'll merge this PR, thank you for the investigation and PR!

ntkme commented 1 month ago

maybe that API makes it difficult to use index.rubygems.org?

Yes, the two APIs work differently:

As the documentation says:

# A Resolver::Specification contains a subset of the information
# contained in a Gem::Specification.  Only the information necessary for
# dependency resolution in the resolver is included.
andrykonchin commented 4 weeks ago

A new CLI option --reuse-precompiled-gems was added in 00dbcfa05b335ff61a4183ac157544ebed72b315