oracle / truffleruby

A high performance implementation of the Ruby programming language, built on GraalVM.
https://www.graalvm.org/ruby/
Other
2.98k stars 180 forks source link

Excessive splitting with `Method#to_proc` #3527

Closed nirvdrum closed 2 months ago

nirvdrum commented 2 months ago

I'm trying to track down a production issue where the Array#<< method keeps splitting. There's no deoptimization. If I trace splitting, the log just fills up with something like the following (I removed a few interspersed splits from other methods, but most of these appear back-to-back).

Splitting Log ``` [engine] split 151279-0e0eb1c8-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151280-6e2f07bf-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151281-68d12f22-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151282-66c436b7-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151283-1d3f4b56-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151284-0bbba5b1-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151285-6af223db-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151286-2dc4d1bc-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151287-4a464aa9-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151288-1c86e87d-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151289-6791eccd-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151290-6cd398b2-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split failed Not enough budget. 4275836 > 4275833 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151291-14388483-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split failed Not enough budget. 4275844 > 4275841 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151292-3f565379-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split failed Not enough budget. 4275852 > 4275849 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151293-574927da-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split failed Not enough budget. 4275860 > 4275857 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151294-66f775cf-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split failed Not enough budget. 4275868 > 4275865 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151295-6602aece-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] [poly-event] Polymorphic event! Source: ArrayAppendOneNodeGen@4d341c64 at no source section Array#<< [engine] [poly-event] One caller! Analysing parent. Array#<< [engine] [poly-event] Early return: false callCount: 2, numberOfKnownCallNodes: 0 Array#<< [engine] [poly-event] Return: false Array#<< [engine] split 151296-47a67e7f-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151297-44dac5c2-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151298-3c16677b-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151299-5bcfbcc8-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151300-46d62217-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151301-54301bda-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151302-5b129398-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151303-7d9d5dce-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151304-0a2c0ef0-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151305-2db2173e-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151306-3619b2fc-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151307-494a8d66-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151308-3f1d040c-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151309-022bef36-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151310-79de33dd-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151311-64e2a03d-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151312-3308f5c9-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151313-6e07eec1-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151314-66cf6807-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151315-175dcfab-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151316-630fd634-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151317-32af5d72-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151318-2f3d8470-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151319-56e436ab-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 [engine] split 151320-4bc7e435-1 Array#<< |AST 10|Calls/Thres 19825/ 3|CallsAndLoop/Thres 19825/ 1000|SourceSection (core)~1:0 ```

I've added logging to RubyRootNode#cloneUninitialized and with that, I've tracked the source of the split to Truffle::Splitter.add_substring. In particular, Truffle::Splitter.split accepts a block to work with the split results, but if no block is provided, it returns an array of the split components. To avoid code duplication, it does the following if a block is not provided:

result = []
block = orig_block || result.method(:<<).to_proc

That's the source of the Array#<< splits. The string splitting code is quite large, so cutting that out we're left with:

def splits(&block)
  [].method(:<<).to_proc.call(1)
end

loop { splits }

If you run jt ruby --engine.TraceSplitting split_issue.rb, you'll see a consecutive run of splits for Array#<<:

[engine] split 505-52ccf197-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0
[engine] split 506-2b06681c-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0
[engine] split 507-20f6b6e0-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0
[engine] split 508-67cd84f9-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0
[engine] split 509-52b891de-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0
[engine] split 510-4ae6451d-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0
[engine] split 511-54530644-1     Array#<<                                                    |AST   10|Tier 1|Calls/Thres       3/    1|CallsAndLoop/Thres       3/  400|SourceSection (core)~1:0```

The splitting will stop once the dispatch cache limit in MethodNodes.ToProcNode#toProcCachedSingleContext is hit. In my production environment, however, the splitting never stops. I haven't yet determined why that is and my attempts at finding a small reproduction haven't produced the non-stop splitting. Unfortunately, due to security restrictions in the production environment I can only use jdb as the debugger and that's making for a very slow process.

The primary attribution stems back to a code loop that writes to a file once per second. It looks like:

tempfilename = File.join(Dir.tmpdir, "monitor_#{Process.pid}")
tempfile = File.open(tempfilename, "w", encoding: marshaled_stats.encoding)

The File.open call will call Truffle::IOOperations.normalize_options, which in turn calls String#split. The loop body is not dynamically declaring classes or methods, so I'm not sure why the splits won't converge.

We can work around this by changing the way Truffle::Splitter works, but I think it's worth looking at the larger issue.

eregon commented 2 months ago

Thank you for the report, that repro is enough to show there is unexpected splitting there, it should split at most once since there is a single caller of splits.

eregon commented 2 months ago

I investigated this. What we need is a global cache from InternalMethod to (Method#to_proc) CallTarget, like we already have for Symbol#to_proc. We should simply add a CallTarget field in InternalMethod for that. Once we have that we should have a single specialization in Method#to_proc and no more inline caching, that specialization just checking if the CallTarget field is non-null and creating a RubyProc from it, or creating the CallTarget and store it behind a TruffleBoundary.

(we could also force split + inline cache that, but the minimal performance gain doesn't seem worth the footprint cost)

eregon commented 2 months ago

Regarding Truffle::Splitter.split we should remove the .to_proc there because it seems unnecessary (and rename block to callable) and we should always_split it, as well as its caller String#split so calls to the block/callable are fast and not some IndirectCallNode because Truffle::Splitter.split is too big to split heuristically.

EDIT: @nirvdrum will do this

andrykonchin commented 2 months ago

Fixed in dc120e729c55bb93e4a9e4657cd22d3a61fd3325/#3529 and 95c8d7f5a7b7015cf8ef4042b418b8cb6a187aec/#3540