gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0
6.3k stars 1.88k forks

Reduce allocations and heap size #525

Closed roman-r-m closed 9 months ago

roman-r-m commented 9 months ago

Minus another 100-150ms on my machine.

gunnarmorling commented 9 months ago

Hey, this one produces incorrect output for the 10K keyset test (see create_measurements3.sh). While that's not the official challenge, I'd like to make sure that at least the top entries pass it, so that they don't cut any corners. This one runs out of heap space for that test, so it might be a simple fix. Could you take a look? Thx!
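
For readers following along: the thread does not show the PR's data structures, so the following is only a minimal sketch of why a larger keyset needs more memory. It assumes the challenge's `station;temperature` line format and uses a plain `TreeMap`; the class name `StationAggregator` and its methods are hypothetical and are not taken from the PR. The map holds one `Stats` object per distinct station, so its footprint grows with the number of keys (10K in this test) rather than with the number of rows.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not the PR's implementation.
final class StationAggregator {

    /** Running min / max / sum / count for a single station. */
    static final class Stats {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum;
        long count;

        void add(double value) {
            min = Math.min(min, value);
            max = Math.max(max, value);
            sum += value;
            count++;
        }

        double mean() {
            return sum / count;
        }
    }

    /** Aggregates lines of the assumed form "station;temperature". */
    static Map<String, Stats> aggregate(Iterable<String> lines) {
        Map<String, Stats> result = new TreeMap<>();
        for (String line : lines) {
            int sep = line.indexOf(';');
            String station = line.substring(0, sep);
            double value = Double.parseDouble(line.substring(sep + 1));
            // One Stats instance is allocated per distinct station name.
            result.computeIfAbsent(station, k -> new Stats()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Stats> stats =
                aggregate(List.of("Hamburg;12.0", "Oslo;-3.5", "Hamburg;8.0"));
        stats.forEach((name, s) ->
                System.out.printf("%s=%.1f/%.1f/%.1f%n", name, s.min, s.mean(), s.max));
    }
}
```

Top entries in the challenge typically replace the general-purpose map with fixed-size custom tables, which is one plausible way a solution tuned for a few hundred keys could exhaust its heap or capacity at 10K keys.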

roman-r-m commented 9 months ago

Hmm, seems to be working for me. Just to confirm, it's measurements3.txt right? Maybe there's some difference in how I run it. Is there a script to do it or is it all manual?

roman-r-m commented 9 months ago

I don't want to just increase the heap size more than needed, because doing so makes the result of the small dataset worse. Not sure why.

gunnarmorling commented 9 months ago

> Just to confirm, it's measurements3.txt right?

No, I was referring to a 1B rows file with 10K keys. You can create one via ./create_measurements3.sh 1000000000. This PR OOME's for that right now.
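
For a quick local check before generating the full 1B-row file, a much smaller file with many distinct keys can be produced along these lines. This is a rough sketch and not the repository's CreateMeasurements3: the class name `ManyKeysSample`, the synthetic `station-N` names, and the temperature range are all assumptions, and the real generator's key lengths and value distribution may differ.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.Random;

// Hypothetical helper, not part of the 1brc repository.
final class ManyKeysSample {

    /**
     * Writes `rows` lines of the assumed form "station;temperature" to `target`,
     * drawing station names from at most `keys` distinct synthetic values.
     */
    static void write(Path target, int keys, long rows, long seed) {
        Random random = new Random(seed);
        try (BufferedWriter out = Files.newBufferedWriter(target)) {
            for (long i = 0; i < rows; i++) {
                String station = "station-" + random.nextInt(keys);
                // Values between -50.0 and 49.9 with one decimal place.
                double temperature = -50.0 + random.nextInt(1000) / 10.0;
                out.write(station);
                out.write(';');
                out.write(String.format(Locale.ROOT, "%.1f", temperature));
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 10,000 keys and one million rows: small enough to generate in seconds.
        write(Path.of("measurements-sample.txt"), 10_000, 1_000_000L, 42L);
    }
}
```

A file like this will not reproduce heap exhaustion that only appears at full scale, but it can surface wrong output caused by key-count assumptions.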

roman-r-m commented 9 months ago

> No, I was referring to a 1B rows file with 10K keys. You can create one via ./create_measurements3.sh 1000000000. This PR OOME's for that right now.

I may be missing something but create_measurements3.sh calls CreateMeasurements3 which writes to measurements3.txt:

try (var out = new BufferedWriter(new FileWriter("measurements3.txt"))) {

roman-r-m commented 9 months ago

Anyway, I've increased the heap size, could you try again pls. I'll try to replicate it locally later today.

gunnarmorling commented 9 months ago

Ah, sorry, I thought you were referring to src/test/resources/samples/measurements-3.txt (which is just one of the test cases). So yes, that measurements3.txt file created by create_measurements3.sh was failing for me (after soft-linking it to measurements.txt).

Will run again as per your latest update.
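
The soft-linking step mentioned above is normally done with `ln -s` in a shell. For anyone scripting it from Java instead, a minimal sketch could look like the following. The class name `MeasurementsLink` and the method `relink` are hypothetical, and the sketch assumes a platform and user account that permit creating symbolic links.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the 1brc repository.
final class MeasurementsLink {

    /**
     * Points `link` at `target`, replacing whatever currently exists at `link`.
     * Caution: if `link` is a regular file rather than a symlink, it is deleted.
     */
    static Path relink(Path link, Path target) {
        try {
            // On a symlink this removes only the link, never the file it points to.
            Files.deleteIfExists(link);
            return Files.createSymbolicLink(link, target.toAbsolutePath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        relink(Path.of("measurements.txt"), Path.of("measurements3.txt"));
    }
}
```

Linking rather than copying avoids duplicating a multi-gigabyte file each time a different dataset is tested.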

gunnarmorling commented 9 months ago

00:04.551 now, also passing the 10K test (in ~20 sec).