jinyus / related_post_gen

Data Processing benchmark featuring Rust, Go, Swift, Zig, Julia etc.
MIT License
323 stars 99 forks source link

Problem:

Given a list of posts, compute the top 5 related posts for each post based on the number of shared tags.

Steps - Read the posts JSON file. - Iterate over the posts and populate a map containing: `tag -> List`, with the int representing the post index of each post with that tag. - Iterate over the posts and for each post: - Create a map: `PostIndex -> int` to track the number of shared tags - For each tag, Iterate over the posts that have that tag - For each post, increment the shared tag count in the map. - Sort the related posts by the number of shared tags. - Write the top 5 related posts for each post to a new JSON file.

Run Benchmark

./run.sh go | rust | python | all

# windows (powershell)
./run.ps1 go | rust | python | all
# OR
pwsh ./run.ps1 go | rust | python | all

# Docker (check the dockerfile for available variables)
docker build -t databench .
# OR
docker pull ghcr.io/jinyus/databench:latest
# THEN
docker run -e TEST_NAME=all -it --rm databench
Rules

No:

- FFI (including assembly inlining) - Unsafe code blocks - Custom benchmarking - Disabling runtime checks (bounds etc) - Specific hardware targeting - SIMD for single threaded solutions - Hardcoding number of posts - Lazy evaluation (Unless results are computed at runtime and timed) - Computation Caching

Must:

- Support up to 100,000 posts - Support UTF8 strings - Parse json at runtime - Support up to 100 tags - Represent tags as strings - Be production ready - Use less than 8GB of memory

Updated Results from github workflow (raw data)

VM Specs ( AWS c7a.xlarge-4vCPU-8GB-RAM-Ubuntu 22.04 )
Language Time (5k posts) 20k posts 60k posts Total
Julia HO[^1] 5.80 ms 20.00 ms 89.33 ms 115.13 ms
D (v2) $\textsf{\color{lightgreen}11.73 ms}$ $\textsf{\color{lightgreen}123.60 ms}$ $\textsf{\color{lightgreen}972.19 ms}$ 1.11 s
C++ 13.80 ms 170.67 ms 1.45 s 1.63 s
Rust 14.18 ms 193.88 ms 1.68 s 1.89 s
Zig 14.00 ms 195.67 ms 1.68 s 1.89 s
D 18.44 ms 245.60 ms 2.09 s 2.35 s
Go 16.87 ms 248.89 ms 2.15 s 2.42 s
Neat 19.21 ms 260.15 ms 2.14 s 2.42 s
Java (JIT) 25.60 ms 260.33 ms 2.20 s 2.48 s
C# (AOT) 18.81 ms 267.93 ms 2.35 s 2.63 s
C# (JIT) 19.10 ms 270.89 ms 2.34 s 2.63 s
Nim 19.39 ms 291.69 ms 2.52 s 2.84 s
F# (AOT) 24.78 ms 366.04 ms 3.23 s 3.62 s
Julia 27.37 ms 407.72 ms 3.62 s 4.05 s
Java (GraalVM) 28.10 ms 416.00 ms 3.63 s 4.07 s
Vlang 28.81 ms 429.55 ms 3.74 s 4.20 s
F# (JIT) 26.25 ms 410.77 ms 3.92 s 4.36 s
Swift 34.58 ms 496.38 ms 4.17 s 4.70 s
Haskell 36.30 ms 508.33 ms 4.42 s 4.96 s
Odin 34.53 ms 510.76 ms 4.47 s 5.02 s
Crystal 39.44 ms 582.76 ms 5.13 s 5.75 s
JS (Bun) 64.60 ms 763.33 ms 6.32 s 7.15 s
LuaJIT 63.53 ms 824.24 ms 6.72 s 7.61 s
Pypy 61.62 ms 748.57 ms 6.83 s 7.64 s
Dart VM 50.30 ms 969.67 ms 7.51 s 8.53 s
Common Lisp (SBCL) 130.00 ms 995.00 ms 8.37 s 9.49 s
Dart AOT 66.20 ms 1.02 s 8.97 s 10.05 s
JS (Deno) 92.30 ms 1.08 s 10.19 s 11.36 s
Ocaml 77.40 ms 1.22 s 10.14 s 11.43 s
JS (Node) 104.30 ms 1.32 s 12.39 s 13.81 s
Racket 114.77 ms 1.77 s 14.12 s 16.01 s
Typed Racket 117.18 ms 1.73 s 14.18 s 16.02 s
Lobster (C++) 123.76 ms 1.91 s 16.66 s 18.69 s
Scala Native 254.40 ms 3.14 s 25.70 s 29.10 s
LuaJIT (JIT OFF) 538.14 ms 8.47 s 66.93 s 75.94 s
Erlang 643.30 ms 10.46 s 94.95 s 106.06 s
Lua 827.92 ms 12.78 s 113.92 s 127.53 s
Python 1.20 s 18.93 s 169.05 s 189.18 s
Lobster (JIT) 1.35 s 20.78 s 195.33 s 217.46 s
Ruby 1.50 s 22.91 s 213.64 s 238.05 s
Inko[^2] 6.29 s 98.05 s 875.39 s 979.73 s
Numpy 216.51 ms OOM 3.27 s Infinity s

Multicore Results

Language Time (5k posts) 20k posts 60k posts Total
D Concurrent (v2) 5.83 ms $\textsf{\color{lightgreen}44.35 ms}$ $\textsf{\color{lightgreen}265.80 ms}$ 315.98 ms
C# Concurrent (JIT) 6.56 ms 47.71 ms 378.32 ms 432.59 ms
C++ Concurrent 5.00 ms 48.33 ms 390.00 ms 443.33 ms
C# Concurrent (AOT) 4.97 ms 56.80 ms 458.21 ms 519.98 ms
D Concurrent 7.41 ms 63.79 ms 465.82 ms 537.03 ms
Nim Concurrent $\textsf{\color{lightgreen}4.64 ms}$ 59.38 ms 497.93 ms 561.95 ms
Rust Concurrent 5.05 ms 67.47 ms 541.40 ms 613.93 ms
Go Concurrent 5.84 ms 77.61 ms 642.69 ms 726.14 ms
Julia Concurrent 5.89 ms 77.39 ms 660.25 ms 743.54 ms
F# Concurrent 7.50 ms 104.33 ms 904.33 ms 1.02 s
Swift Concurrent 10.86 ms 125.49 ms 1.01 s 1.15 s
F# Concurrent (AOT) 8.50 ms 128.33 ms 1.14 s 1.28 s
Java Concurrent (JIT) 66.60 ms 186.67 ms 1.16 s 1.41 s
Zig Concurrent 14.96 ms 190.59 ms 1.59 s 1.79 s
Java (GraalVM) Concurrent 12.40 ms 252.67 ms 1.84 s 2.11 s
Old Results with details (on my machine) | Language | Processing Time | Total (+ I/O) | Details | | ---------- | --------------- | ------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | Rust | - | 4.5s | Initial | | Rust v2 | - | 2.60s | Replace std HashMap with fxHashMap by [phazer99](https://www.reddit.com/r/rust/comments/16plgok/comment/k1rtr4x/?utm_source=share&utm_medium=web2x&context=3) | | Rust v3 | - | 1.28s | Preallocate and reuse map and unstable sort by [vdrmn](https://www.reddit.com/r/rust/comments/16plgok/comment/k1rzo7g/?utm_source=share&utm_medium=web2x&context=3) and [Darksonn](https://www.reddit.com/r/rust/comments/16plgok/comment/k1rzwdx/?utm_source=share&utm_medium=web2x&context=3) | | Rust v4 | - | 0.13s | Use Post index as key instead of Pointer and Binary Heap by [RB5009](https://www.reddit.com/r/rust/comments/16plgok/comment/k1s5ea0/?utm_source=share&utm_medium=web2x&context=3) | | Rust v5 | 38ms | 52ms | Rm hashing from loop and use vec[count] instead of map[index]count by RB5009 | | Rust v6 | 23ms | 36ms | Optimized Binary Heap Ops by [scottlamb](https://github.com/jinyus/related_post_gen/pull/12) | | Rust Rayon | 9ms | 22ms | Parallelize by [masmullin2000](https://github.com/jinyus/related_post_gen/pull/4) | | Rust Rayon | 8ms | 22ms | Remove comparison out of hot loop | | ⠀ | ⠀ | ⠀ | ⠀ | | Go | - | 1.5s | Initial | | Go v2 | - | 80ms | Add rust optimizations | | Go v3 | 56ms | 70ms | Use goccy/go-json | | Go v3 | 34ms | 55ms | Use generic binaryheap by [DrBlury](https://github.com/jinyus/related_post_gen/pull/7) | | Go v4 | 26ms | 50ms | Replace binary heap with custom priority queue | | Go v5 | 20ms | 43ms | Remove comparison out of hot loop | | Go Con | 10ms | 33ms | Go concurrency by [tirprox](https://github.com/jinyus/related_post_gen/pull/17) and [DrBlury](https://github.com/jinyus/related_post_gen/pull/8) | | Go Con v2 | 5ms | 29ms | Use arena, use waitgroup, rm binheap by [DrBlury](https://github.com/jinyus/related_post_gen/pull/20) | | ⠀ | ⠀ | ⠀ | ⠀ | | Python | - | 7.81s | Initial | | Python v2 | 1.35s | 1.53s | Add rust optimizations by [dave-andersen](https://github.com/jinyus/related_post_gen/pull/10) | | Numpy | 0.57s | 0.85s | Numpy implementation by [Copper280z](https://github.com/jinyus/related_post_gen/pull/11) | | ⠀ | ⠀ | ⠀ | ⠀ | | Crystal | 50ms | 96ms | Inital w/ previous optimizations | | Crystal v2 | 33ms | 72ms | Replace binary heap with custom priority queue | | ⠀ | ⠀ | ⠀ | ⠀ | | Odin | 110ms | 397ms | Ported from golang code | | Odin v2 | 104ms | 404ms | Remove comparison out of hot loop | | ⠀ | ⠀ | ⠀ | ⠀ | | Dart VM | 125ms | 530ms | Ported from golang code | | Dart bin | 274ms | 360ms | Compiled executable | | ⠀ | ⠀ | ⠀ | ⠀ | | Vlang | 339ms | 560ms | Ported from golang code | | ⠀ | ⠀ | ⠀ | ⠀ | | Zig | 80ms | 110ms | Provided by [akhildevelops](https://github.com/jinyus/related_post_gen/pull/30) |

[^1]: Uses specialized datastructures meant for demonstration purposes: more [^2]: Inko is currently in beta and optimizations haven't been applied. more