gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0
5.69k stars 1.72k forks source link

My first submission #697

Closed JurenIvan closed 5 months ago

JurenIvan commented 5 months ago

Check List:

JurenIvan commented 5 months ago

@gunnarmorling Hope you're having a great evening and that all is ready to be tested.

Thank you for creating such a magnificent challenge. My colleagues and I (and java community) learned a lot!

gunnarmorling commented 5 months ago

This produces incorrect output for the 10K keyset test (see create_measurements_3.sh):

+ timeout -v 300 ./test.sh JurenIvan measurements_10K_1B.txt
Validating calculate_average_JurenIvan.sh -- measurements_10K_1B.txt
87c87
< Alc;5.0;19.0;31.5
---
> Alc;-11.6;19.8;59.2
133c133
< Az ZubayrCaucaiaVitsyebskShinjukuNicolás Ro;6.1;14.8;29.6
---
> Az ZubayrCaucaiaVitsyebskShinjukuNicolás Ro;-15.4;14.4;43.3
294c294
< Chon;10.9;18.4;28.5
---
> Chon;-12.4;17.0;47.9
...
JurenIvan commented 5 months ago

Hi @gunnarmorling, I've realised where the bug is -> while doing hash collisions I've misused 2 variables (index and i) -.-. I have the fix for it. Can I commit it?

Thank you in advance!

gunnarmorling commented 5 months ago

Yes, you can. As we're after the cut-off time, you'll have two more changes you can make to this PR (see note at the top of the README). If it's not working or valid after that, I'll have to close it unfortunately. Updates should be pushed quickly, so I can evaluate all the pending entries. Thx!

gunnarmorling commented 5 months ago

Getting this exception now for the 10K keyset:

Benchmark 1: ./calculate_average_JurenIvan.sh
Exception in thread "main" java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
    at java.base/sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:1293)
    at dev.morling.onebrc.CalculateAverage_JurenIvan.processSegment(CalculateAverage_JurenIvan.java:54)
    at dev.morling.onebrc.CalculateAverage_JurenIvan.lambda$main$0(CalculateAverage_JurenIvan.java:43)
    at java.base/java.util.stream.IntPipeline$1$1.accept(IntPipeline.java:180)
    at java.base/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
    at java.base/java.util.Spliterator$OfInt.forEachRemaining(Spliterator.java:712)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:759)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:676)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:264)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:702)
    at dev.morling.onebrc.CalculateAverage_JurenIvan.main(CalculateAverage_JurenIvan.java:45)

Note you can push once more to this PR before I'll either merge it (if valid) or have to close it. Thx!

gunnarmorling commented 5 months ago

Ok, this looks good now. 00:08.157 for the standard key set.