gunnarmorling / 1brc

1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
https://www.morling.dev/blog/one-billion-row-challenge/
Apache License 2.0
6.14k stars 1.85k forks source link

Add linl33 v2 #678

Closed linl33 closed 8 months ago

linl33 commented 8 months ago

Check List:

linl33 commented 8 months ago

The 10k dataset takes ~40s on my laptop (6 cores, no HT) now. Still not fast but I don't think there's anything quick I can do to fix it now.

tivrfoa commented 8 months ago

It's faster for the standard key set, but takes unduly lon for 10K keys (see create_measurements_3.sh). Could you take a look, would be sad to see it with DNF on the 10K leaderboard (I am evaluating the top 25 entries against that key seet).

Hi @gunnarmorling !

One suggestion for when you run the 10k:

Because the 10k is what this challenge is really asking. But many solutions took advantage of the small dataset, and most cities are less than 16 bytes ...

The top 10 for the 10k might be really different. =)

gunnarmorling commented 8 months ago

One suggestion for when you run the 10k

It's pretty much doing that (see create_measurements_3.sh which creates that file).

Because the 10k is what this challenge is really asking. But many solutions took advantage of the small dataset, and most cities are less than 16 bytes ...

Yes, agreed. I had not anticipated this, and in hindsight I should have not exposed the specific key set. But by the time folks started to optimize for the 413 key set and I realized it, I felt it was too late for changing the evaluation key set, as it would have created lots of frustration. In the end, people did what they were asked to do, so that's totally fair :)

The top 10 for the 10k might be really different. =)

Yes, which is why there is that separate leaderboard for the 10K keyset which I update occassionally by running the top 25 or so entries from the regular leaderboard against that keyset. Now there could be entries which score lower on the standard leaderboard but would be in the top 25 for 10K, but I felt that's acceptable, given it's just a side eval. Learning a lot myself on how to run these kinds of things :)

The reason I am asking submitters of top entries like this one to also make sure they do reasonably well with 10K is to make sure that they pass this test, so it's another probe on correctness.

gunnarmorling commented 8 months ago

Thanks a lot for the update, @linl33. This looks good now. A bit faster on the standard key set and also comes in at ~19 sec for 10K with the correct results. Thx for participating in 1BRC and creating one of the top entries!

Benchmark 1: timeout -v 300 ./calculate_average_linl33.sh 2>&1
  Time (mean ± σ):      2.810 s ±  0.056 s    [User: 17.796 s, System: 0.807 s]
  Range (min … max):    2.670 s …  2.865 s    10 runs

Summary
  linl33: trimmed mean 2.8200656749800004, raw times 2.8377631459800003,2.85637806698,2.67035041898,2.8544735929800003,2.8047393789800004,2.86473951498,2.81122164398,2.8182784189800003,2.77623454898,2.80143660298

Leaderboard

| # | Result (m:s.ms) | Implementation     | JDK | Submitter     | Notes     |
|---|-----------------|--------------------|-----|---------------|-----------|
|   | 00:02.820 | [link](https://github.com/gunnarmorling/1brc/blob/main/src/main/java/dev/morling/onebrc/CalculateAverage_linl33.java)| 22.ea.32-open | [Li Lin](https://github.com/linl33) | uses Unsafe |
linl33 commented 8 months ago

@gunnarmorling Thank you for organizing this event!

gunnarmorling commented 7 months ago

Hey @linl33!

Congrats again on being in the Top 20 of the One Billion Row Challenge!

To celebrate this amazing achievement, I would like to send you a 1BRC t-shirt and coffee mug. To claim your prize, fill out this form by Feb 18. After submitting the form, please provide a comment with the random value you've specified in the form, so that I know it is you who submitted it.

All data entered will solely be used in relation to processing this shipment. Shipments can be sent to any country listed here or here (I'll use whichever one is cheaper for me to ship to your location). A big thank you to Decodable for sponsoring these prizes!

Thanks a lot for participating in 1BRC,

--Gunnar

linl33 commented 7 months ago

Hi @gunnarmorling,

My 2 random numbers are

12533 32231