gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0

SIMD parsing newlines, integer parsing, custom hashtable with SIMD lookup table for equality #663

Closed ChrisBellew closed 6 months ago

ChrisBellew commented 7 months ago

Thanks for this amazing competition!

Here's my implementation. I must have sunk 100+ hours into background benchmarking of this. I learned a lot! I purposely didn't look at any other submissions so I've no idea how similar they are. Looking forward to learning from the crazy fast submissions from others.

gunnarmorling commented 6 months ago

Please run the formatter and amend the PR with the changes.

ChrisBellew commented 6 months ago

Formatting committed, thanks @gunnarmorling!

I ran ./mvnw clean verify and committed the result.

gunnarmorling commented 6 months ago

This fails on the 10K keyset test (see create_measurements_3.sh) with 1B rows. Seeing these:

Exception in thread "Thread-8" java.lang.IndexOutOfBoundsException: Index 262139 out of bounds for length 262137
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
    at java.base/java.util.Objects.checkIndex(Objects.java:385)
    at jdk.incubator.vector/jdk.incubator.vector.VectorIntrinsics.checkFromIndexSize(VectorIntrinsics.java:61)
    at jdk.incubator.vector/jdk.incubator.vector.ByteVector.fromArray(ByteVector.java:2975)
    at dev.morling.onebrc.CalculateAverage_chrisbellew$ThreadProcessor.processBuffer(CalculateAverage_chrisbellew.java:339)
    at dev.morling.onebrc.CalculateAverage_chrisbellew$ThreadProcessor.processRange(CalculateAverage_chrisbellew.java:303)
    at dev.morling.onebrc.CalculateAverage_chrisbellew$ThreadProcessor.run(CalculateAverage_chrisbellew.java:279)

ChrisBellew commented 6 months ago

Fixed the issue, thanks @gunnarmorling. :)
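
The trace above points at a `ByteVector.fromArray` call whose start offset left too little room in the array for a full vector. The actual fix is not shown in this thread. One common way to avoid this class of error is to run the wide loop only while a whole load still fits and to finish the remainder byte by byte. The sketch below illustrates that pattern for newline search; it is not the code from this PR. It swaps the incubating Vector API for plain 8-byte `long` loads (SWAR) so that it runs without extra JVM flags, and the names `NewlineScan` and `indexOfNewline` are hypothetical.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration, not taken from CalculateAverage_chrisbellew.java.
final class NewlineScan {
    // Views 8 consecutive bytes of a byte[] as one little-endian long.
    private static final VarHandle LONG_VIEW =
            MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

    private static final int WIDTH = Long.BYTES;              // bytes consumed per wide load
    private static final long NEWLINES = 0x0A0A0A0A0A0A0A0AL; // '\n' repeated in every byte

    /** Returns the index of the first '\n' in buf[from, limit), or -1 if there is none. */
    static int indexOfNewline(byte[] buf, int from, int limit) {
        int i = from;
        // Wide loop: entered only while a full 8-byte load fits below limit,
        // so the load can never run past the end of the data.
        while (i <= limit - WIDTH) {
            long word = (long) LONG_VIEW.get(buf, i);
            long x = word ^ NEWLINES; // bytes equal to '\n' become 0x00
            // Zero-byte test: the lowest set bit marks the first zero byte of x.
            long match = (x - 0x0101010101010101L) & ~x & 0x8080808080808080L;
            if (match != 0) {
                return i + (Long.numberOfTrailingZeros(match) >>> 3);
            }
            i += WIDTH;
        }
        // Scalar tail: handles the last (limit - i) < 8 bytes one at a time.
        for (; i < limit; i++) {
            if (buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }
}
```

With the Vector API the same guard would compare the offset against the array length minus the species length, or use a masked load for the tail (for example via `VectorSpecies.indexInRange`).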

gunnarmorling commented 6 months ago

This produces incorrect results for the 10K keyset test (see create_measurements_3.sh). Note that we're past the cut-off time, so you have two more changes you can make to this PR (see the note at the top of the README). If it's still not working or valid after that, I'll have to close it, unfortunately. Please push updates quickly so I can evaluate all the pending entries. Thx!

+ timeout -v 300 ./test.sh ChrisBellew measurements_10K_1B.txt
Validating calculate_average_ChrisBellew.sh -- measurements_10K_1B.txt
WARNING: Using incubator modules: jdk.incubator.vector
48c48
< -so;-15.7;15.1;45.6
---
> -so;-15.7;14.8;45.6
50c50
< -su;-25.0;7.0;46.0
---
> -su;-25.0;5.6;35.0
94c94
< Alot;-13.9;16.6;48.5
---
> Alot;-12.1;17.1;48.5
...

ChrisBellew commented 6 months ago

Thanks for your patience @gunnarmorling. I wasn't making good enough use of the test scripts with the different files.

I located the issue and fixed it. We should be good to go now!

gunnarmorling commented 6 months ago

Looking good now: 00:06.982.