gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0
6k stars 1.8k forks source link

final version for abeobk #654

Closed abeobk closed 6 months ago

abeobk commented 7 months ago

Check List:

This is my final version i guess, it does run faster than position 1 on my machine.

Benchmark 1: ./calculate_average_abeobk.sh Time (mean ± σ): 1.273 s ± 0.006 s [User: 0.003 s, System: 0.001 s] Range (min … max): 1.266 s … 1.285 s 10 runs

Benchmark 1: ./calculate_average_thomaswue.sh Time (mean ± σ): 1.313 s ± 0.006 s [User: 0.003 s, System: 0.001 s] Range (min … max): 1.306 s … 1.323 s 10 runs

thomaswue commented 7 months ago

How can you avoid the word0 comparison on line 266? Isn't xoring word0 and tail losing information making the equality check imprecise?

abeobk commented 7 months ago

Thanks for the advice. It is indeed a stupid mistake by me. I was fooled by the collision check stats. I fixed the code and reorganize it a little bit. (Line 470)

I realized that you can still use the min/max trick if you provide the value when adding a new entry. The cost of calculating (at most) (10k x CPU_CNT) values is much lower than 5*10^8 IF clauses :)

FYI: Here is the benchmark on my machine

Benchmark 1: ./calculate_average_abeobk.sh Time (mean ± σ): 1.290 s ± 0.005 s [User: 0.002 s, System: 0.001 s] Range (min … max): 1.280 s … 1.297 s 10 runs

Benchmark 1: ./calculate_average_thomaswue.sh Time (mean ± σ): 1.313 s ± 0.009 s [User: 0.002 s, System: 0.001 s] Range (min … max): 1.301 s … 1.327 s 10 runs

thomaswue commented 7 months ago

OK.

Making the max branch dependent on the min branch not taken shouldn't make too much of a difference, because given the randomness of the data set both of those branches shouldn't be taken in most cases. Did you measure the +- of changing those branches on your machine? I tried and cannot measure a difference. What is your hardware?

abeobk commented 7 months ago

For some reason, there is a + 200ms when not applying the trick on my machine. My machine is i9-10980XE, 128G RAM.

Use default min=999,max=-999 Benchmark 1: ./calculate_average_abeobk.sh Time (mean ± σ): 1.465 s ± 0.007 s [User: 0.003 s, System: 0.002 s] Range (min … max): 1.456 s … 1.480 s 10 runs

Interestingly, when providing initial value and using normal min/max routine, the result does not change Benchmark 1: ./calculate_average_abeobk.sh Time (mean ± σ): 1.283 s ± 0.008 s [User: 0.002 s, System: 0.001 s] Range (min … max): 1.271 s … 1.296 s 10 runs

My guess is that using the default initial min/max values requires 2 load ops, while setting min/max to the same initial value requires only 1, therefore cheaper. Doesnt make any sense. Any clue?

gunnarmorling commented 6 months ago

No significant difference:

Benchmark 1: timeout -v 300 ./calculate_average_abeobk.sh 2>&1
  Time (mean ± σ):      1.992 s ±  0.012 s    [User: 0.001 s, System: 0.004 s]
  Range (min … max):    1.959 s …  2.003 s    10 runs

Summary
  abeobk: trimmed mean 1.994525658115, raw times 1.95861125874,1.99812547674,2.0003377167400003,1.9938351567400001,1.99704247674,1.98802059974,1.9937148837399998,1.99169905374,2.0034787667400002,1.99342990074

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:01.994 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_abeobk.java)| 21.0.2-graal | [Van Phu DO](https://github.com/abeobk) | GraalVM native binary, uses Unsafe |
abeobk commented 6 months ago

Can you eval again with the latest commit? I've got promising result.

The idea here is that the 64bit hash value already contains tail info, therefore you can cut the tail

Benchmark 1: ./calculate_average_abeobk.sh Time (mean ± σ): 1.270 s ± 0.015 s [User: 0.003 s, System: 0.002 s] Range (min … max): 1.250 s … 1.296 s 10 runs

gunnarmorling commented 6 months ago

Yepp:

Benchmark 1: timeout -v 300 ./calculate_average_abeobk.sh 2>&1 Time (mean ± σ): 1.918 s ± 0.012 s [User: 0.001 s, System: 0.004 s] Range (min … max): 1.885 s … 1.927 s 10 runs

Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Summary abeobk: trimmed mean 1.92100479045, raw times 1.8854614717,1.9214880407,1.9223210067,1.9173817947,1.9180732057,1.9212254107,1.9272654077000002,1.9273548517,1.9197098707000002,1.9205735867

Leaderboard

# Result (m:s.ms) Implementation JDK Submitter Notes
00:01.921 link 21.0.2-graal Van Phu DO GraalVM native binary, uses Unsafe
gunnarmorling commented 6 months ago

Hey @abeobk!

Congrats again on being in the Top 20 of the One Billion Row Challenge!

To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it.

All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes!

Thanks a lot for participating in 1BRC,

--Gunnar

abeobk commented 6 months ago

Thank you for organizing the challenge, and the gifts. My seed is 20549 31840 Please send S-size T shirt