
Truffle crashes when trying to load large set in RAM #2635

Closed tiredenzo closed 2 years ago

tiredenzo commented 2 years ago

Dear TruffleRuby team,

While trying to load a large CSV file into a Hash with TruffleRuby, I encountered some issues:

The CSV file weighs about 39 GB and has 700 million lines. The machine I run on has 240 GB of RAM. TruffleRuby is installed via RVM: truffleruby 22.0.0.2, like ruby 3.0.2, GraalVM CE Native [x86_64-linux].

The script runs and loads until around line 129M, with RAM usage at 66 GB, when it crashes with the following error:

Fatal error: java.lang.OutOfMemoryError: Could not allocate an aligned heap chunk

Full log here: https://gist.github.com/tiredenzo/088f754a716adfada26144ca3b1baff2

The issue is that TruffleRuby is not able to use the machine's full 240 GB of RAM and crashes at around 66 GB of load.

Please note that the same scripts work with ruby-3: the data loads completely and takes about 140-150 GB of RAM. On a side note, I also tried "sharding" the data across 10k Hashes to see if it worked better; the program behavior was the same.

To reproduce, run one of the following:

- Faster (directly in RAM): `ruby test_no_csv.rb`
- Slower (uses the disk to generate the CSV before loading it): `ruby generate_test_csv.rb && ruby load_csv.rb`

# test_no_csv.rb

hash = {}
value = (0...40).map { ('a'..'z').to_a[rand(26)] }.join

700_000_000.times do |i|
  puts i if i % 1_000_000 == 0

  key = 1_000_000 + i
  hash[key] = "#{value}#{key}"
end

# generate_test_csv.rb

output_file = File.open('test.csv', 'w')
value = (0...40).map { ('a'..'z').to_a[rand(26)] }.join

700_000_000.times do |i|
  key = 1_000_000 + i

  output_file.write("#{key},#{value}#{key}\n")
end

output_file.close

# load_csv.rb

hash = {}

File.foreach('test.csv').with_index do |line, line_num|
  puts line_num if line_num % 1_000_000 == 0

  line = line.strip
  id, path = line.split(',')
  id = id.to_i

  hash[id] = path
end

Thank you

eregon commented 2 years ago

I made a script to make it clearer what is going on, please see https://gist.github.com/eregon/93523d087c283ac88d13a75845e5785e

So TruffleRuby doesn't use the full RAM by default, because the underlying VM has a default max heap which is smaller than the full RAM. However, it is possible to make it use all RAM with e.g. --vm.Xmx240g.

Could you try with --vm.Xmx240g, and also try both --native (the default) and --jvm? RVM doesn't support the --jvm mode, so for that use `ruby-build truffleruby+graalvm-22.0.0.2 ~/.rubies/truffleruby+graalvm-22.0.0.2`. I think the --jvm mode is going to be significantly better here due to the better GC.

tiredenzo commented 2 years ago

Hi, thanks for your reply

I installed TruffleRuby on a clean setup with ruby-build as you asked, and added your script to mine; it is indeed better ;)

- For `--native` with `--vm.Xmx240g` it still crashed:

  heap: 40574 MB used, 43427 MB committed, ? max, RSS: 39452 MB, allocated: ?
  Fatal error: java.lang.OutOfMemoryError: Could not allocate an aligned heap chunk

  https://gist.github.com/tiredenzo/9eb69eb064280f46a3a5639958940f9e

- For `--jvm` with `--vm.Xmx240g` sadly it did not work, but it went much further. It was also blazingly fast compared to `--native`, comparable to a couple of times the speed I reached in Go for this task.

.rubies/truffleruby+graalvm-22.0.0.2/bin/ruby --vm.Xmx240g --jvm test_no_csv.rb

It runs until line 402M:

heap: 119760 MB used, 192928 MB committed, 245760 MB max, RSS: 162246 MB, allocated: ?

truffleruby: an internal exception escaped out of the interpreter, please report it to https://github.com/oracle/truffleruby/issues.

-2147483604 (java.lang.NegativeArraySizeException)
    from org.truffleruby.core.hash.library.BucketsHashStore.resize(BucketsHashStore.java:128)
    from org.truffleruby.core.hash.library.BucketsHashStore.set(BucketsHashStore.java:264)
    from org.truffleruby.core.hash.library.BucketsHashStoreGen$HashStoreLibraryExports$Cached.set(BucketsHashStoreGen.java:141)
    from org.truffleruby.core.hash.HashNodes$SetIndexNode.set(HashNodes.java:214)
    from org.truffleruby.core.hash.HashNodesFactory$SetIndexNodeFactory$SetIndexNodeGen.execute(HashNodesFactory.java:911)
    from org.truffleruby.language.RubyCoreMethodRootNode.execute(RubyCoreMethodRootNode.java:48)
    from jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:655)
test.rb:29:in `[]='
    from test.rb:35:in `block in <main>'
    from core/integer.rb:148:in `times'
    from test.rb:22:in `<main>'



Here is my code for this test https://gist.github.com/tiredenzo/cd97f74a054d5a1c4b1b99473318920b

I kept my allocation method, which might not have been what you had in mind; I can re-run with yours if you wish.

aardvark179 commented 2 years ago

This looks like a bug in our hash implementation when we try to allocate very large buckets. We have an array of capacities, and we allocate an array of the chosen capacity multiplied by a constant; for the largest defined capacity this overflows the 32-bit array size.
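To illustrate, here is a minimal Ruby sketch (not TruffleRuby's actual code; `int32_wrap` is a made-up helper) of the wrap-around: the reported size -2147483604 corresponds to a pre-wrap product of -2147483604 + 2**32 = 2147483692, just above Java's maximum array size of 2**31 - 1.

# A sketch of Java's implicit signed 32-bit `int` arithmetic.
def int32_wrap(n)
  low = n & 0xFFFF_FFFF              # keep only the low 32 bits
  low >= 2**31 ? low - 2**32 : low   # reinterpret them as a signed int
end

product = -2_147_483_604 + 2**32     # => 2_147_483_692, the pre-wrap array size
puts product > 2**31 - 1             # => true: exceeds Java's max array size
puts int32_wrap(product)             # => -2147483604, the size in the exception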

eregon commented 2 years ago

I will try to reproduce on native (on a machine with enough RAM), that early OutOfMemoryError sounds like a bug.

On JVM, the NegativeArraySizeException we see is caused by trying to create a very big Hash. That error could be improved to be clearer. An array in Java has a (signed) int size, so the maximum is 2**31 - 1 = 2 147 483 647 elements.

The Hash would have 700_000_000 pairs at the end, and the max load factor is 0.75 (otherwise there is too much chaining). So it'd need at a strict minimum 700_000_000 / 0.75 = 933 333 333 elements. That still fits in a Java array.

The current logic on resize, based on CRuby's IIRC, is to pick the next capacity (1 073 741 909 here) and multiply it by 4 (OVERALLOCATE_FACTOR) so we don't have to resize again too soon. 1 073 741 909 * 4 = 4 294 967 636, which is close to 2**32 and bigger than the max Java array size. But we could probably use a smaller OVERALLOCATE_FACTOR in that case, i.e., use an array size close to the max array size. As long as the load factor would still be < 0.75 it seems fine. If it then needs to resize up again, we'd fail with an error.
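To make that arithmetic concrete, a quick back-of-the-envelope check in Ruby (the variable names are just for illustration):

pairs          = 700_000_000
load_factor    = 0.75
max_java_array = 2**31 - 1                  # 2_147_483_647

min_buckets   = (pairs / load_factor).ceil  # 933_333_334: fits in a Java array
next_capacity = 1_073_741_909               # capacity picked on resize
overallocated = next_capacity * 4           # 4_294_967_636: OVERALLOCATE_FACTOR = 4

puts min_buckets <= max_java_array          # => true
puts overallocated <= max_java_array        # => false: the allocation overflows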

> I kept my allocation method, which might not have been what you had in mind; I can re-run with yours if you wish.

Could you try with my script as well? On JVM we should see it being able to use all the RAM. On Native it's probably the same error, but worth a try still.

eregon commented 2 years ago

I got access to a machine with 251GB of RAM. I used the latest truffleruby. On my script:

$ time graalvm-ce-java11-22.2.0-dev/bin/ruby --vm.Xmx240g --vm.XX:+PrintGC use_memory.rb 
...
heap: 238602 MB used, 238602 MB committed, ? max, RSS: 238172 MB, allocated: 237568 MB
heap: 239626 MB used, 239626 MB committed, ? max, RSS: 239196 MB, allocated: 238592 MB
[Full GC (CollectOnAllocation) 244328451K->244327427K, 0.0276189 secs]
[Full GC (CollectOnAllocation) 245376003K->245376003K, 0.0199192 secs]
heap: 240650 MB used, 240650 MB committed, ? max, RSS: 240220 MB, allocated: 239616 MB
heap: 241674 MB used, 241674 MB committed, ? max, RSS: 241244 MB, allocated: 240640 MB
[Full GC (CollectOnAllocation) 246425603K->246424579K, 0.0263307 secs]
[Full GC (CollectOnAllocation) 247473155K->247473155K, 0.3591485 secs]
heap: 242698 MB used, 242698 MB committed, ? max, RSS: 242196 MB, allocated: 241664 MB
Killed
real    10m37.484s

So it does reach 240GB before being killed, though it takes a while. Maybe the issue that --native didn't use all available memory was resolved already?

Using your test_no_csv.rb script:

$ time graalvm-ce-java11-22.2.0-dev/bin/ruby --vm.Xmx240g --vm.XX:+PrintGC test_no_csv.rb
...
heap: 114185 MB used, 118051 MB committed, ? max, RSS: 123257 MB, allocated: ?
[Full GC (CollectOnAllocation) 121923584K->121920512K, 637.3958737 secs]
395000000
heap: 127044 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
396000000
heap: 127335 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
397000000
heap: 127626 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
398000000
heap: 127917 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
399000000
heap: 128209 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
400000000
heap: 128500 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
401000000
heap: 128791 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?
402000000
heap: 129082 MB used, 140611 MB committed, ? max, RSS: 133253 MB, allocated: ?

-2147483604 (java.lang.NegativeArraySizeException)
    from org.truffleruby.core.hash.library.BucketsHashStore.resize(BucketsHashStore.java:128)
    from org.truffleruby.core.hash.library.BucketsHashStore.set(BucketsHashStore.java:264)
    from org.truffleruby.core.hash.library.BucketsHashStoreGen$HashStoreLibraryExports$Cached.set(BucketsHashStoreGen.java:141)
    from org.truffleruby.core.hash.HashNodes$SetIndexNode.set(HashNodes.java:238)
    from org.truffleruby.core.hash.HashNodesFactory$SetIndexNodeFactory$SetIndexNodeGen.execute(HashNodesFactory.java:981)
    from org.truffleruby.language.RubyCoreMethodRootNode.execute(RubyCoreMethodRootNode.java:48)
    from org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:659)
test_no_csv.rb:29:in `[]='
    from test_no_csv.rb:29:in `block in <main>'
    from <internal:core> core/integer.rb:148:in `times'
    from test_no_csv.rb:22:in `<main>'

real    203m12.921s
user    104m1.443s
sys 97m23.765s

The Native Image CE GC is not good enough for this. It would likely work quite a bit better with Native Image EE's G1 GC.

JVM CE:

$ time graalvm-ce-java11-22.2.0-dev/bin/ruby --jvm --vm.Xmx240g --vm.XX:+PrintGC use_memory.rb 
...
heap: 237651 MB used, 240736 MB committed, 245760 MB max, RSS: 240079 MB, allocated: 230400 MB
[558.105s][info   ][gc] GC(432) Pause Young (Concurrent Start) (G1 Humongous Allocation) 237683M->237651M(240736M) 11.251ms
[558.105s][info   ][gc] GC(433) Concurrent Cycle
[558.596s][info   ][gc] GC(433) Pause Remark 238707M->238707M(241792M) 3.945ms
[560.520s][info   ][gc] GC(433) Pause Cleanup 238739M->238739M(241792M) 1.576ms
[560.705s][info   ][gc] GC(433) Concurrent Cycle 2599.997ms
heap: 238707 MB used, 241792 MB committed, 245760 MB max, RSS: 241139 MB, allocated: 231424 MB
[562.767s][info   ][gc] GC(434) Pause Young (Concurrent Start) (G1 Humongous Allocation) 238739M->238707M(241792M) 11.377ms
[562.767s][info   ][gc] GC(435) Concurrent Cycle
[563.795s][info   ][gc] GC(435) Pause Remark 239763M->239763M(242848M) 15.780ms
[565.747s][info   ][gc] GC(435) Pause Cleanup 239795M->239795M(242848M) 2.219ms
[566.210s][info   ][gc] GC(435) Concurrent Cycle 3442.631ms
heap: 239763 MB used, 242848 MB committed, 245760 MB max, RSS: 242176 MB, allocated: 232448 MB
[568.430s][info   ][gc] GC(436) Pause Young (Concurrent Start) (G1 Humongous Allocation) 239795M->239763M(242848M) 24.967ms
[568.430s][info   ][gc] GC(437) Concurrent Cycle
Killed
real    10m26.169s
user    6m50.722s
sys 22m17.315s
$ time graalvm-ce-java11-22.2.0-dev/bin/ruby --jvm --vm.Xmx240g --vm.XX:+PrintGC test_no_csv.rb 
...
399000000
heap: 123664 MB used, 160800 MB committed, 245760 MB max, RSS: 151366 MB, allocated: ?
[994.426s][info   ][gc] GC(49) Pause Young (Concurrent Start) (G1 Evacuation Pause) 123856M->124448M(160800M) 4334.572ms
[994.426s][info   ][gc] GC(50) Concurrent Cycle
400000000
heap: 124544 MB used, 160800 MB committed, 245760 MB max, RSS: 158411 MB, allocated: ?
401000000
heap: 124832 MB used, 160800 MB committed, 245760 MB max, RSS: 158411 MB, allocated: ?
402000000
heap: 125120 MB used, 160800 MB committed, 245760 MB max, RSS: 158411 MB, allocated: ?

-2147483604 (java.lang.NegativeArraySizeException)
    from org.truffleruby.core.hash.library.BucketsHashStore.resize(BucketsHashStore.java:128)
    from org.truffleruby.core.hash.library.BucketsHashStore.set(BucketsHashStore.java:264)
    from org.truffleruby.core.hash.library.BucketsHashStoreGen$HashStoreLibraryExports$Cached.set(BucketsHashStoreGen.java:141)
    from org.truffleruby.core.hash.HashNodes$SetIndexNode.set(HashNodes.java:238)
    from org.truffleruby.core.hash.HashNodesFactory$SetIndexNodeFactory$SetIndexNodeGen.execute(HashNodesFactory.java:981)
    from org.truffleruby.language.RubyCoreMethodRootNode.execute(RubyCoreMethodRootNode.java:48)
    from jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:659)
test_no_csv.rb:29:in `[]='
    from test_no_csv.rb:29:in `block in <main>'
    from <internal:core> core/integer.rb:148:in `times'
    from test_no_csv.rb:22:in `<main>'

real    17m45.616s
user    91m7.772s
sys 44m4.004s

eregon commented 2 years ago

I've run on JVM too, and test_no_csv.rb on --native. There was no problem on native; it's the same result as on --jvm. Both end up with the NegativeArraySizeException, so I'll look into that next.

eregon commented 2 years ago

I've got a fix for the NegativeArraySizeException, and I could run a slight variant of the script to the end (https://gist.github.com/eregon/70787386138f339b964fb76496a3f1d9, which uses a little less memory):

$ ruby --jvm --vm.Xmx240g --vm.XX:+PrintGC test_no_csv.rb |& tee out.log
...
697
heap: 78752 MB used, 125472 MB committed, 245760 MB max, RSS: 105461 MB, allocated: ?
698
heap: 78848 MB used, 125472 MB committed, 245760 MB max, RSS: 105461 MB, allocated: ?
699
heap: 78912 MB used, 125472 MB committed, 245760 MB max, RSS: 105462 MB, allocated: ?

tiredenzo commented 2 years ago

thanks 👍

eregon commented 2 years ago

So it should work fine now. I didn't compare to CRuby (I don't have a recent CRuby on that machine).

If you want to optimize this workload to make it faster, it's probably best to avoid such large memory usage/heap sizes (the GC becomes a lot slower, and growing the Hash takes a lot of time) and to process the data e.g. line by line if that's possible. Also, an Array would store the entries much more compactly than a Hash (and make much better use of the cache), and might still allow searching entries fairly quickly by their id using bsearch, assuming the ids are in increasing order in the .csv file. A sketch of that idea follows.
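A minimal sketch of that Array-based approach, assuming the ids appear in increasing order in test.csv (as they do in the generated file); `lookup` and the searched id are made up for illustration:

# Two parallel Arrays instead of one Hash.
ids    = []
values = []

File.foreach('test.csv') do |line|
  id, value = line.chomp.split(',')
  ids    << id.to_i   # ids arrive already sorted, so no sort step is needed
  values << value
end

# Binary search over the sorted ids: O(log n) per lookup.
def lookup(ids, values, wanted)
  index = (0...ids.size).bsearch { |i| ids[i] >= wanted }
  index && ids[index] == wanted ? values[index] : nil
end

puts lookup(ids, values, 1_000_042)

Two flat Arrays of Integers and Strings also avoid the Hash bucket structures entirely, which is where the resize above failed.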