gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0

Solution without unsafe using vector API #602

Closed giovannicuccu closed 9 months ago

giovannicuccu commented 9 months ago

Check List:

Execution times recorded using Windows

giovannicuccu commented 9 months ago

The build failed, but I think it's a pipeline problem. I just reran test.sh on my version and it works.

gunnarmorling commented 9 months ago

Please make the shell scripts executable.

giovannicuccu commented 9 months ago

I'm sorry, I'm using Windows, and when working under WSL my scripts are listed as executable:

gio@PC-Giovanni:./1-billion-row-challenge$ ls -la giovanni
-rwxrwxrwx 1 gio gio 830 Jan 28 08:07 calculate_average_giovannicuccu.sh
-rwxrwxrwx 1 gio gio 717 Jan 22 14:17 prepare_giovannicuccu.sh
gio@PC-Giovanni:./1-billion-row-challenge$

I can successfully execute test.sh giovannicuccu

Do you have any suggestions? Am I missing something?

gunnarmorling commented 9 months ago

Hum, seems somehow the permissions get changed (see the note next to their file name). I can take care of it when merging. But this actually fails when running with the 10K key set (see create_measurements3.sh):

Caused by: java.lang.IllegalStateException: Segment is too large to wrap as ByteBuffer. Size: 2155501571
        at java.base/jdk.internal.foreign.AbstractMemorySegmentImpl.checkArraySize(AbstractMemorySegmentImpl.java:388)
        at java.base/jdk.internal.foreign.AbstractMemorySegmentImpl.asByteBuffer(AbstractMemorySegmentImpl.java:233)
        at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.computeListForPartition(CalculateAverage_giovannicuccu.java:323)
        at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.lambda$elaborate$1(CalculateAverage_giovannicuccu.java:275)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1583)

Can you take a look?
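
For context on the exception above: the reported segment size, 2155501571 bytes, is larger than Integer.MAX_VALUE (2147483647), and a ByteBuffer is indexed with int positions, so a segment of that size cannot be wrapped as one. The following is only a minimal sketch of one way to stay under that limit, by planning chunks with long offsets. The `ChunkPlanner` class, the `Chunk` record and the `plan` method are hypothetical names for illustration and are not taken from the pull request; a real solution would also have to move each chunk boundary to the next line break.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlanner {

    /** Half-open byte range [start, end), expressed with long offsets. */
    record Chunk(long start, long end) {
        long size() {
            return end - start;
        }
    }

    /** Splits totalSize bytes into consecutive chunks of at most maxChunk bytes each. */
    static List<Chunk> plan(long totalSize, long maxChunk) {
        List<Chunk> chunks = new ArrayList<>();
        for (long start = 0; start < totalSize; start += maxChunk) {
            chunks.add(new Chunk(start, Math.min(start + maxChunk, totalSize)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        long segmentSize = 2_155_501_571L; // size reported in the exception above

        // The segment exceeds the largest size a ByteBuffer can address.
        System.out.println("exceeds int range: " + (segmentSize > Integer.MAX_VALUE));

        // Each planned chunk is small enough to be addressed with int positions.
        for (Chunk c : plan(segmentSize, Integer.MAX_VALUE)) {
            System.out.println(c.start() + " .. " + c.end() + " (" + c.size() + " bytes)");
        }
    }
}
```

An alternative that avoids ByteBuffer entirely is to read from the MemorySegment directly, since its accessors take long offsets.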

giovannicuccu commented 9 months ago

Sure, but it works on my PC, so maybe it's an OS issue. I'll dig into this one.

gio@PC-Giovanni:./1-billion-row-challenge$ ./test.sh giovannicuccu
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-1.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-10.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-2.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-20.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-3.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-boundaries.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-complex-utf8.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-dot.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-rounding.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-short.txt
WARNING: Using incubator modules: jdk.incubator.vector
Validating calculate_average_giovannicuccu.sh -- src/test/resources/samples/measurements-shortest.txt
WARNING: Using incubator modules: jdk.incubator.vector
gio@PC-Giovanni:./1-billion-row-challenge$

gunnarmorling commented 9 months ago

This is not part of the tests, you'll need to generate that file first via ./create_measurements3.sh 1000000000.

giovannicuccu commented 9 months ago

Oops, sorry, I misunderstood your request. I pushed the fix; the solution is now even faster than the previous one.

For the record: a HUGE thank you and kudos for this challenge; it has been a very rewarding experience, and I learnt a lot even from this pull request.

gunnarmorling commented 9 months ago

That's awesome, thanks a lot for that nice feedback!

Still getting an error for the 10K key set, unfortunately. This is what I run (note it does work without numactl, i.e. on 32 instead of 8 cores; something seems off with segmentation which shows up in this scenario):

numactl --physcpubind=0-7 hyperfine --show-output ./calculate_average_giovannicuccu.sh 2>&1
Benchmark 1: ./calculate_average_giovannicuccu.sh
WARNING: Using incubator modules: jdk.incubator.vector
java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index -2147483646 out of bounds for length 2155501540
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index -2147483646 out of bounds for length 2155501540
    at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.reduce(CalculateAverage_giovannicuccu.java:302)
    at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.elaborate(CalculateAverage_giovannicuccu.java:280)
    at dev.morling.onebrc.CalculateAverage_giovannicuccu.main(CalculateAverage_giovannicuccu.java:464)
Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index -2147483646 out of bounds for length 2155501540
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.reduce(CalculateAverage_giovannicuccu.java:290)
    ... 2 more
Caused by: java.lang.IndexOutOfBoundsException: Index -2147483646 out of bounds for length 2155501540
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:124)
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:448)
    at java.base/java.util.Objects.checkIndex(Objects.java:461)
    at jdk.incubator.vector/jdk.incubator.vector.VectorIntrinsics.checkFromIndexSize(VectorIntrinsics.java:71)
    at jdk.incubator.vector/jdk.incubator.vector.ByteVector.fromMemorySegment(ByteVector.java:3295)
    at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.computeListForPartition(CalculateAverage_giovannicuccu.java:329)
    at dev.morling.onebrc.CalculateAverage_giovannicuccu$MMapReaderMemorySegment.lambda$elaborate$1(CalculateAverage_giovannicuccu.java:275)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
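
A note on the numbers in this trace: the index -2147483646 is exactly what results when the offset 2147483650 (Integer.MAX_VALUE + 3) is narrowed to a 32-bit int, which would be consistent with an offset being computed in int arithmetic somewhere before the call to ByteVector.fromMemorySegment. The thread does not show the offending line, so this is an assumption rather than a diagnosis. The sketch below reproduces the wrap-around and also illustrates why the core count could matter, under the further assumption that the file is split evenly into one partition per worker. The `OffsetOverflow` class, its methods and the file size are hypothetical; the file size is chosen so that 8 partitions match the length 2155501540 in the trace.

```java
public class OffsetOverflow {

    /** Narrowing a long offset to int keeps only the low 32 bits. */
    static int narrow(long offset) {
        return (int) offset;
    }

    /** Partition size for an even split, rounded up (the even split is an assumption). */
    static long partitionSize(long fileSize, int partitions) {
        return (fileSize + partitions - 1) / partitions;
    }

    /** True if every offset within a partition of this size is representable as an int. */
    static boolean fitsInInt(long size) {
        return size <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long offset = 2_147_483_650L; // Integer.MAX_VALUE + 3, illustrative value
        System.out.println(narrow(offset)); // prints -2147483646

        // Hypothetical file size: 8 partitions of 2155501540 bytes each.
        long fileSize = 8L * 2_155_501_540L;
        for (int workers : new int[] { 8, 32 }) {
            long size = partitionSize(fileSize, workers);
            System.out.println(workers + " workers: " + size
                    + " bytes per partition, fits in int: " + fitsInInt(size));
        }
    }
}
```

Under these assumptions, 8 partitions each exceed the int range while 32 partitions stay well below it, which would explain why the failure only shows up when the run is pinned to 8 cores. Keeping every offset in a long avoids the problem regardless of partition size.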

gunnarmorling commented 9 months ago

Looking good now, 00:04.719. I'm gonna restore the file permissions.