gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0

Adding solution for cb0s #575

Closed cb0s closed 9 months ago

cb0s commented 9 months ago

Check List:

Performance

On an AMD Ryzen 9 7950X with 64GB DDR5-CL28-5800MHz RAM and a PCIe Gen4 SSD, my algorithm completed in ~2.0s +/- 0.1s on 32 threads. I compared it to thomas_wue's solution, which runs in a little less than one third of that time, so I hope to come in below 10 seconds on the testing machine, without using anything like Unsafe or applying crazy bitmasks for fewer comparisons.

gunnarmorling commented 9 months ago

Hey @thomaswue, I am repeatedly running into this error when executing this one for the 1B rows file:

Validating calculate_average_cb0s.sh -- measurements_1B.txt
Picking up existing native image 'target/CalculateAverage_cb0s_image', delete the file to select JVM mode.
Fatal error: Failed to leave the current IsolateThread context and to detach the current thread. (code 12)

Any idea what could be causing this? Couldn't find anything relevant so far.

gunnarmorling commented 9 months ago

So as stated above, this fails in native mode, and it's not clear to me what's happening. It passes on the JVM in 00:10.284. I am not sure how to resolve the issue with the native GraalVM binary, so I'm happy to add the JVM result to the leaderboard for the time being. Let me know if you'd like to have it added.

wirthi commented 9 months ago

Note: we are tracking that Fatal Error in GraalVM as ticket GR-51694 and are trying to reproduce it.

thomaswue commented 9 months ago

@cb0s This could be related to the process running out of memory before shutdown. I believe you are running with epsilon GC. Maybe there is too much allocation and a collecting GC would be required?
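
To make the out-of-memory hypothesis concrete, here is a rough back-of-the-envelope sketch. Epsilon GC never reclaims memory, so everything allocated during the run has to fit in the heap. The numbers are assumptions, not measurements of this submission: one small wrapper object plus one `byte[]` allocated per input line, an average station name of 10 bytes, and typical 64-bit HotSpot object layout with compressed references. The class and method names are hypothetical.

```java
public class EpsilonAllocationEstimate {

    // Rounds a size up to the next multiple of 8 bytes (assumed default object alignment).
    static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    // Estimates the total bytes allocated if every row creates one wrapper and one byte[].
    static long estimateBytes(long rows, int avgNameLength) {
        long wrapper = align8(12 + 4);             // assumed: 12-byte header + 4-byte compressed reference
        long array = align8(16 + avgNameLength);   // assumed: 16-byte array header + payload
        return rows * (wrapper + array);
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(1_000_000_000L, 10);
        // Without a collecting GC, all of this must stay resident until the process exits.
        System.out.printf("~%d GB%n", bytes / 1_000_000_000L); // prints "~48 GB"
    }
}
```

Under these assumptions, per-line allocation alone adds up to tens of gigabytes, which a collecting GC would reclaim almost immediately but Epsilon GC cannot.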

cb0s commented 9 months ago

Sorry for the late response. We currently have exams at uni...

I would love it if you, @gunnarmorling, could add my JVM result to the leaderboard for the time being. I have to admit, it's the first time I am playing around with GraalVM.

cb0s commented 9 months ago

I did some more digging... Yes @thomaswue, I am using the epsilon GC. When I switched to another GC, it worked again. I believe I must have been using a different GC when I originally tested it. Funnily enough, the pure Java solution is faster than the GraalVM solution. I added some more arguments, making this solution a tiny bit faster. I will commit the changes now, and then this PR should be ready to merge @gunnarmorling. :)

cb0s commented 9 months ago

Some additional stuff I noticed while testing:

  1. The CPU isn't fully utilized sometimes. I think the IO reading is a bottleneck, and a more direct read process (e.g. through Unsafe or something similar) could improve the performance quite a bit.
  2. I think this is the more interesting part for those trying to reproduce the GraalVM error: for quicker and easier comparison and hashing, every time I parse a line, I create a RawName record, a wrapper for the underlying byte[] array. These, however, are (obviously) never cleaned up afterwards, which is probably why epsilon GC won't work. An idea (which I am not sure I can test before the end of this contest) would be to make the wrapper class obsolete and thereby get a smaller memory footprint.
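
The per-line wrapper described in point 2 can be sketched roughly as follows. This is a hypothetical reconstruction, not the code from the PR: only the name `RawName` and the fact that it wraps a `byte[]` come from the comment above, while the `equals`/`hashCode` bodies, the `countPerStation` helper, and the sample lines are assumptions for illustration. A record compares an array component by reference by default, so content-based hashing needs explicit overrides.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class RawNameSketch {

    // Hypothetical wrapper: gives a byte[] content-based equality and hashing
    // so it can serve as a HashMap key.
    record RawName(byte[] bytes) {
        @Override
        public boolean equals(Object o) {
            return o instanceof RawName other && Arrays.equals(bytes, other.bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }
    }

    // Counts rows per station. Note that every line allocates a fresh byte[]
    // and a fresh RawName, even when the station has been seen before.
    static Map<RawName, Integer> countPerStation(String[] lines) {
        Map<RawName, Integer> counts = new HashMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            byte[] name = line.substring(0, sep).getBytes(StandardCharsets.UTF_8);
            counts.merge(new RawName(name), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = {"Hamburg;12.0", "Oslo;-3.4", "Hamburg;8.1"};
        Map<RawName, Integer> counts = countPerStation(lines);
        RawName hamburg = new RawName("Hamburg".getBytes(StandardCharsets.UTF_8));
        System.out.println(counts.get(hamburg)); // prints 2
    }
}
```

With a collecting GC these short-lived keys are cheap, because most of them die right after the map lookup. With epsilon GC each one stays on the heap for the whole run, which is consistent with the failure going away once a different GC was selected.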

gunnarmorling commented 9 months ago

All looking good now: 00:13.729 with the JVM. Good luck with your exams :)