Reduce allocations and heap size

gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

https://www.morling.dev/blog/one-billion-row-challenge/

Apache License 2.0

6.3k stars, 1.88k forks

Reduce allocations and heap size #525

Closed: roman-r-m closed this 9 months ago

roman-r-m commented 9 months ago

Checklist:

[x] Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
[x] All formatting changes by the build are committed
[x] Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
[x] Output matches that of calculate_average_baseline.sh
[x] For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)
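The checklist asks entrants to point to where a custom hash structure deals with collisions. This thread does not show the PR's own table, so what follows is only a hypothetical sketch of one common approach: open addressing with linear probing. When two names hash to the same home slot, the full key bytes are compared and probing moves on to the next slot. The class name `StationTable`, the fixed capacity of 2^14, and the tenths-of-a-degree encoding are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch (not this PR's code): an open-addressing table keyed by
// station-name bytes, storing min/max/sum/count in parallel primitive arrays.
final class StationTable {
    private static final int CAPACITY = 1 << 14; // power of two, above 10K keys
    private static final int MASK = CAPACITY - 1;
    private final byte[][] keys = new byte[CAPACITY][];
    private final int[] min = new int[CAPACITY];
    private final int[] max = new int[CAPACITY];
    private final long[] sum = new long[CAPACITY];
    private final int[] count = new int[CAPACITY];
    private int size;

    // Finds the slot holding `name`, or the empty slot where it would go.
    // Collision handling: on a clash, compare the full key and probe the next slot.
    private int slotFor(byte[] name) {
        int slot = Arrays.hashCode(name) & MASK;
        while (keys[slot] != null && !Arrays.equals(keys[slot], name)) {
            slot = (slot + 1) & MASK; // linear probing, wrapping around
        }
        return slot;
    }

    // Records one measurement, given in tenths of a degree.
    void add(byte[] name, int tenths) {
        int slot = slotFor(name);
        if (keys[slot] == null) {
            // Keep at least one slot empty so probing always terminates.
            if (size == CAPACITY - 1) throw new IllegalStateException("table full");
            keys[slot] = name.clone();
            min[slot] = tenths;
            max[slot] = tenths;
            size++;
        } else {
            min[slot] = Math.min(min[slot], tenths);
            max[slot] = Math.max(max[slot], tenths);
        }
        sum[slot] += tenths;
        count[slot]++;
    }

    // Returns 0 for a station that was never added (its slot is empty).
    int count(byte[] name) { return count[slotFor(name)]; }

    // Only meaningful for stations that have been added.
    int min(byte[] name) { return min[slotFor(name)]; }
    int max(byte[] name) { return max[slotFor(name)]; }

    public static void main(String[] args) {
        StationTable table = new StationTable();
        byte[] aa = "Aa".getBytes(StandardCharsets.UTF_8);
        byte[] bb = "BB".getBytes(StandardCharsets.UTF_8);
        // "Aa" and "BB" have the same Arrays.hashCode, so they share a home slot.
        table.add(aa, 125);
        table.add(aa, -30);
        table.add(bb, 10);
        System.out.println(table.count(aa) + " " + table.count(bb)); // prints "2 1"
    }
}
```

Keeping the aggregates in parallel primitive arrays instead of one object per station avoids per-key allocations. The trade-off is a fixed capacity that must be sized for the expected number of distinct keys (at most 16,383 in this sketch).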

This shaves off another 100-150 ms on my machine.
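The thread does not show which allocations this PR removed. As an assumption, one widely used trick in 1BRC entries is to parse each temperature straight from the raw bytes into an `int` of tenths, instead of building a `String` and calling `Double.parseDouble` for every row. A minimal sketch follows. `TemperatureParser`, `indexOf`, and `parseTenths` are hypothetical names, and the parser relies on every value having exactly one fractional digit.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: parse a 1BRC-style temperature directly from the input
// bytes into an int of tenths, with no per-row String or Double allocation.
// Assumes the value always has exactly one fractional digit, e.g. "-12.3".
final class TemperatureParser {
    // Returns the index of the first `target` byte in buf[from, to), or -1.
    static int indexOf(byte[] buf, int from, int to, byte target) {
        for (int i = from; i < to; i++) {
            if (buf[i] == target) return i;
        }
        return -1;
    }

    // Parses buf[from, to) holding "[-]d[d].d" and returns the value times 10.
    static int parseTenths(byte[] buf, int from, int to) {
        boolean negative = buf[from] == '-';
        int value = 0;
        for (int i = negative ? from + 1 : from; i < to; i++) {
            if (buf[i] != '.') {
                value = value * 10 + (buf[i] - '0'); // skip the dot, accumulate digits
            }
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        byte[] line = "Hamburg;-12.3".getBytes(StandardCharsets.US_ASCII);
        int sep = indexOf(line, 0, line.length, (byte) ';');
        System.out.println(parseTenths(line, sep + 1, line.length)); // prints -123
    }
}
```

Working in integer tenths also keeps the min/max/sum aggregation in primitive arithmetic; the mean is only converted to a decimal when the final result is printed.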

Execution time:
Execution time of reference implementation:

gunnarmorling commented 9 months ago

Hey, this one produces incorrect output for the 10K-key test (see create_measurements3.sh). While that's not the official challenge, I'd like at least the top entries to pass it, to make sure they don't cut any corners. This one runs out of heap space for that test, so it might be a simple fix. Could you take a look? Thx!

roman-r-m commented 9 months ago

Hmm, it seems to be working for me. Just to confirm, it's measurements3.txt, right? Maybe there's some difference in how I run it. Is there a script for it, or is it all manual?

roman-r-m commented 9 months ago

I don't want to increase the heap size more than needed, because doing so makes the result on the small dataset worse. Not sure why.

gunnarmorling commented 9 months ago

> Just to confirm, it's measurements3.txt right?

No, I was referring to a 1B-row file with 10K keys. You can create one via ./create_measurements3.sh 1000000000. This PR OOMEs (runs out of memory) on that right now.

roman-r-m commented 9 months ago

> No, I was referring to a 1B rows file with 10K keys. You can create one via ./create_measurements3.sh 1000000000. This PR OOME's for that right now.

I may be missing something, but create_measurements3.sh calls CreateMeasurements3, which writes to measurements3.txt:

try (var out = new BufferedWriter(new FileWriter("measurements3.txt"))) { ... }
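The 10K-key file differs from the standard input mainly in having many more distinct station names, which is presumably what puts more pressure on per-key structures and heap usage. The following hypothetical sketch is not `CreateMeasurements3`. It only illustrates writing `rows` lines drawn from `keys` synthetic station names, using the same try-with-resources writer pattern as the quoted line. The class name, output file name, name format, seed, and temperature range are all assumptions.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Random;

// Hypothetical sketch, NOT the real CreateMeasurements3: writes `rows` lines of
// "<station>;<temperature>" drawn from `keys` synthetic station names.
final class MeasurementsSketch {
    static void generate(Writer out, long rows, int keys, long seed) {
        Random random = new Random(seed);
        try {
            for (long i = 0; i < rows; i++) {
                int station = random.nextInt(keys);       // one of `keys` distinct names
                int tenths = random.nextInt(1999) - 999;  // -99.9 .. 99.9, in tenths
                out.write("station-" + station + ";" + formatTenths(tenths) + "\n");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Formats tenths of a degree with exactly one fractional digit, e.g. -123 -> "-12.3".
    static String formatTenths(int tenths) {
        String sign = tenths < 0 ? "-" : "";
        int abs = Math.abs(tenths);
        return sign + (abs / 10) + "." + (abs % 10);
    }

    public static void main(String[] args) throws IOException {
        long rows = args.length > 0 ? Long.parseLong(args[0]) : 1_000;
        // Same try-with-resources pattern as the quoted line; the file name is an assumption.
        try (var out = new BufferedWriter(new FileWriter("measurements-sketch.txt"))) {
            generate(out, rows, 10_000, 42L);
        }
    }
}
```

For example, `java MeasurementsSketch.java 1000000` would write one million rows spread over up to 10,000 names to `measurements-sketch.txt`.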

roman-r-m commented 9 months ago

Anyway, I've increased the heap size. Could you try again, pls? I'll try to replicate it locally later today.

gunnarmorling commented 9 months ago

Ah, sorry, I thought you were referring to src/test/resources/samples/measurements-3.txt (which is just one of the test cases). So yes, the measurements3.txt file created by create_measurements3.sh was failing for me (after soft-linking it to measurements.txt).

Will run again as per your latest update.

gunnarmorling commented 9 months ago

00:04.551 now, also passing the 10K test (in ~20 sec).