last perf papercut - Githubissues

gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java

https://www.morling.dev/blog/one-billion-row-challenge/

Apache License 2.0

6.08k stars 1.83k forks source link

last perf papercut #613

Closed iziamos closed 7 months ago

iziamos commented 7 months ago

Check List:

[x] Tests pass (./test.sh <username> shows no differences between expected and actual outputs)
[x] All formatting changes by the build are committed
[x] Your launch script is named calculate_average_<username>.sh (make sure to match casing of your GH user name) and is executable
[x] Output matches that of calculate_average_baseline.sh
[x] For new entries, or after substantial changes: When implementing custom hash structures, please point to where you deal with hash collisions (line number)
Execution time: just shy of 1 second on all cores.
Execution time of reference implementation: 1:35.20

This was quite fun overall, learned alot. I'm just pushing this last tweak because I suspect its the lowest of the hanging fruit.

Farewell!

gunnarmorling commented 7 months ago

Doesn't make a significant difference on the eval env: 00:04.167. More importantly though, the entry doesn't finish for the 10K key set test (see create_measurements3.sh and results here). Could you take a look at this one?

iziamos commented 7 months ago

Hmm worth a look, I suspect its the hashcode I'd switched to, for context I wanted something where only the position in the string mattered to see if it would be trivially vectorisable. It turned out that vectors were to slow though :(

iziamos commented 7 months ago

@gunnarmorling I've updated the hashcode. Probably passes the 10k within a min (4 sec on my machine). But is probably slower overall :)

iziamos commented 7 months ago

Just did a squash, probably not worth merging if it performs worse on the eval box but just to demonstrate that its a hash collision issue on the 10k. :)

gunnarmorling commented 7 months ago

@iziamos, could you clarify: should this pass for 10K keys now, or is there still a potential hash collision? Correctness is the most important thing, so even if it is slower than what's on main right now but actually is more correct, I'd still merge it. Thx for clarifying.

iziamos commented 7 months ago

@gunnarmorling I believe it was always correct, this version is significantly faster in the 10k keys (4 sec vs sth like 4 min), but slower (on my machine) for the main challenge. I just made the change to demonstrate :)

In general I've found that the best way to evaluate solutions for hash collisions isn't to stress test the key count but to replace the hashcode implementation with one that returns a constant e.g. 1, and see that it eventually returns correct results.

gunnarmorling commented 7 months ago

Gotcha. So it's minimally slower for the standard key set (00:04.230), but indeed much faster for the 10K key set (00:06.712), so I'm gonna merge it, as this entry is amongst the top 20 or so for which I occassionally publish the 10K leaderboard and it DNF-ed there so far. Thx!