gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0
6.08k stars 1.83k forks source link

Thank you for the challange!! #599

Closed JaimePolidura closed 7 months ago

JaimePolidura commented 7 months ago

Check List:

My PC configuration: WSL2 & Windows Intel Core i7-1260P 2.1 GHz 32GB RAM 12 cores

JaimePolidura commented 7 months ago

I forgot to mention I'm using graal vm native image to run it

gunnarmorling commented 7 months ago

Please run mvn verify and append the changes made by the formatter to this PR.

JaimePolidura commented 7 months ago

Done

gunnarmorling commented 7 months ago

Please make the shell scripts executable. All these things are part of the PR template and you checked them as done. Sorry to say, but this creates a bit of a frustating experience :(

JaimePolidura commented 7 months ago

Sorry, my bad :(

I have made the shell scripts executable, I have also solved some rounding issues which I didn't know I have.

I think it should be ok now.

gunnarmorling commented 7 months ago

Hum, one of the tests is failing now on the eval machine (but interestingly not on CI):

+ timeout -v 300 ./test.sh JaimePolidura

Using java version 21.0.2-graal in this shell.
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
4a5
> id10000;1.0;1.0;1.0
9998a10000
> id9999;1.0;1.0;1.0

FAILURE: ./test.sh JaimePolidura failed
Summary
  JaimePolidura: command failed or output did not match
JaimePolidura commented 7 months ago

I can't figure out what is casuing that problem. The test passess correctly on my machine:

jaime@linux:~/otro/1brc$ ./test.sh JaimePolidura Using java version 21.0.2-graal in this shell. Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-10.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-1.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-20.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-2.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-3.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-boundaries.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-complex-utf8.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-dot.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-rounding.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-shortest.txt Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-short.txt jaime@linux:~/otro/1brc$ ./test.sh JaimePolidura src/test/resources/samples/measurements-10000-unique-keys.txt

jaime@linux:~/otro/1brc$ ./test.sh JaimePolidura src/test/resources/samples/measurements-10000-unique-keys.txt Using java version 21.0.2-graal in this shell. Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt

Is it correct?

If the CI and the eval machine are executing the same program with the same test, it should produce the same output. Maybe the eval machine is using an outdated buggy version of my program?

I have changed the shell interpreter of the prepare_JaimePolidura.sh script to #!/bin/bash instead of #!/bin/sh I'm unsure if that is relevant to the problem.

gunnarmorling commented 7 months ago

Tests run on 32 cores on the eval machine, I suppose this messes up your chunking logic somehow. It passes when running only on 8 cores:

numactl --physcpubind=0-7 ./test.sh JaimePolidura

Using java version 21.0.2-graal in this shell.
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-10000-unique-keys.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-10.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-1.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-20.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-2.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-3.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-boundaries.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-complex-utf8.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-dot.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-rounding.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-shortest.txt
Validating calculate_average_JaimePolidura.sh -- src/test/resources/samples/measurements-short.txt
JaimePolidura commented 7 months ago

Yes you are right, I had a bug in the chunking logic.

I have committed the changes

gunnarmorling commented 7 months ago

Still fails on 32 cores. You should be able to verify locally by hard-coding 32 chunks rather than relying on processor count.

JaimePolidura commented 7 months ago

Sorry again :( I only tested it on 10k entries file. Now it should work

gunnarmorling commented 7 months ago

Ok, looking good now. 00:05.069.