losvedir / transit-lang-cmp

Programming language comparison by reimplementing the same transit data app
MIT License

Elixir: Make code more idiomatic and get x4 performance improvement #24

Open vegris opened 2 years ago

vegris commented 2 years ago

Hi, @losvedir!

First of all, I'd like to thank you for making this benchmark. I really like the task you've chosen: it does not feel synthetic or made up, and I can easily believe it could be a real workload in the real world.

However, I was not happy to see Elixir perform so poorly, so I decided to rewrite the code myself.

Spoiler: I was able to get a ~4x improvement in runtime speed and a ~2x improvement in load time (~3x for the ETS version, though the final result does not use it).

It more or less incorporates the work of these PRs:

Though I only noticed them after I had finished coding myself :)

I've benchmarked each step to give myself and others an intuition for which paths we can take when optimizing Elixir code and what results we can expect.

Obviously, our workstations differ a bit, so the following numbers are only meaningful relative to each other, not as absolute values. I've taken your Elixir and Rust versions as the baselines for comparison (Rust because it is the other toolchain I happen to have installed).

Here are the results:

$ elixir -v
Erlang/OTP 25 [erts-13.1] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [jit:ns]

Elixir 1.14.0 (compiled with Erlang/OTP 25)

$ rustc --version
rustc 1.63.0 (4b91a6ea7 2022-08-08)

| Version | Stop Times (ms) | Trips (ms) | JSON Heavy (requests/sec) | JSON Small (requests/sec) | RAM on JSON Heavy (MB) |
| --- | --- | --- | --- | --- | --- |
| Rust | 675 | 30 | 700 | 3700 | 515 |
| Original Elixir | 9200 | 685 | 76 | 480 | 1100 |
| Plug.Cowboy | same | same | 145 | 870 | same |
| Streams | 4500 | 415 | same | same | same |
| Natural indexes | 2700 | 180 | 180 | 1000 | 925 |
| Persistent term | 5500 | 150 | 340 | 1550 | 470 |

And below are some thoughts on each of the steps taken:

Replace Phoenix with Plug.Cowboy (5812dfc6d2b8e2a425f460e893867f556054927c)

This is an obvious one.

First, Phoenix generates a lot of stuff that is genuinely useful in a real project but is just clutter for a benchmark (the Mailer module, for example). It gives the impression that Elixir is a verbose language (while in truth it's quite the opposite, IMHO) and distracts the reader from the interesting parts of the code.

Second, while it's true that Phoenix is fast and highly optimized, it still does far more than just serve JSON: it collects telemetry, logs requests to the dashboard, overrides HTTP methods, tries to fetch the session, and it does all of that on every request. A program that does more work, no matter how optimized it is, just can't compete with a program that does less :)
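For readers who haven't used Plug without Phoenix, here is a minimal sketch of what a bare JSON endpoint can look like. It is not the code from the commit: the module name `TransitSketch.Router`, the `/schedules/:route` path, the response shape and the dependency versions are all assumptions made for illustration.

```elixir
# Standalone script; Mix.install fetches the two dependencies on first run.
# The version requirements are assumptions, pick whatever your project uses.
Mix.install([{:plug_cowboy, "~> 2.5"}, {:jason, "~> 1.4"}])

defmodule TransitSketch.Router do
  # Hypothetical router: only matching and dispatching, nothing else in the pipeline.
  use Plug.Router

  plug :match
  plug :dispatch

  get "/schedules/:route" do
    # A real handler would look the route up in the loaded data;
    # the empty list here is a placeholder.
    body = Jason.encode!(%{route: route, schedules: []})

    conn
    |> put_resp_content_type("application/json")
    |> send_resp(200, body)
  end

  match _ do
    send_resp(conn, 404, "not found")
  end
end

# To actually serve it (port 4000 is an arbitrary choice):
# {:ok, _pid} = Plug.Cowboy.http(TransitSketch.Router, [], port: 4000)
```

The whole request pipeline is the two `plug` lines, so there is no session fetching, method overriding or dashboard logging between the socket and the handler.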

Use Stream to parse and load files (542900957482ee8c046016b4b7cecac376ce55ee)

In almost every programming language it is a good idea to stream when processing large amounts of data. This way we can limit the program's memory usage and make sure we iterate over the data only once. This is especially important in Elixir, where iterating over a list is a bit pricier because it is a linked list and not a vector.
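A rough sketch of the streaming idea, using only the standard library. The module name `StreamLoadSketch`, the three-column file layout and the naive comma split (no quoted fields) are simplifications assumed for illustration; the real GTFS files have more columns.

```elixir
defmodule StreamLoadSketch do
  # Reads the file line by line and builds a map of trip_id => stop times
  # in a single pass, without first materializing a list of all rows.
  def load(path) do
    path
    |> File.stream!()
    |> Stream.drop(1)                         # skip the header line
    |> Stream.map(&String.trim_trailing/1)    # strip the newline
    |> Stream.map(&String.split(&1, ","))     # naive split, assumes no quoted commas
    |> Enum.reduce(%{}, fn [trip_id, stop_id, arrival], acc ->
      # Prepending is cheap on a linked list; entries end up in reverse file order.
      Map.update(acc, trip_id, [{stop_id, arrival}], &[{stop_id, arrival} | &1])
    end)
  end
end

# Usage with a tiny hypothetical file:
path = Path.join(System.tmp_dir!(), "stop_times_sketch.txt")
File.write!(path, "trip_id,stop_id,arrival_time\nt1,s1,08:00:00\nt1,s2,08:05:00\nt2,s1,09:00:00\n")
IO.inspect(StreamLoadSketch.load(path))
```

Nothing is read from disk until `Enum.reduce/3` starts pulling lines, and each line passes through all the `Stream` stages before the next one is read.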

Use natural indexes and ETS duplicate_bag (f29ab228e02b1e90bb51ec043881bbb217f61114)

I didn't really get your motivation for introducing artificial indexes. You can't replicate another language's practices and conventions in a different language and expect the result to look and perform well.

I agree that the way you store your data should be dictated by the way you plan to access it, but the keys already present in the data seem perfectly fine for that, so I've just used those.

The grouping of values can also be handled by the duplicate_bag ETS table type. Notice how, with it, we are no longer fighting the ETS API, and the code becomes clearer (and also faster, especially the loading part).
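To illustrate the grouping behaviour, here is a small sketch with a `duplicate_bag` table keyed by the trip ID that is already in the data. The module name `EtsBagSketch`, the table name and the tuple layout are assumptions for the example, not the code from the commit.

```elixir
defmodule EtsBagSketch do
  # A duplicate_bag keeps every object inserted under the same key,
  # so "all stop times for a trip" is a single lookup.
  def new_table do
    :ets.new(:stop_times_sketch, [:duplicate_bag, :public, read_concurrency: true])
  end

  # :ets.insert/2 accepts a list of tuples; the first element is the key.
  def load(table, rows), do: :ets.insert(table, rows)

  def stop_times_for(table, trip_id), do: :ets.lookup(table, trip_id)
end

table = EtsBagSketch.new_table()

EtsBagSketch.load(table, [
  {"t1", "s1", "08:00:00"},
  {"t1", "s2", "08:05:00"},
  {"t2", "s1", "09:00:00"}
])

IO.inspect(EtsBagSketch.stop_times_for(table, "t1"))
```

No manual grouping step or index bookkeeping is needed on load: the table type does the grouping, and a key with no rows simply returns an empty list.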

Replace ETS with persistent_term (5cb18b623c4342e1b81cbd18b210ceabd7e7ccf2)

I took the ETS detour just to show the typical performance one can expect when frequently changing data needs to be shared between processes.

However, when the data to be shared is static or updated infrequently, you really should be using persistent_term. It's designed exactly for this case and is the fastest way to access shared data.

It also puts no pressure on the memory system, because the data is truly shared rather than copied to the reading process's heap, as it is with ETS.

By storing the data in persistent_term I was able to bring RAM usage below Rust's (though I think that just means the Rust version can be optimized further) and almost double the throughput.
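A minimal sketch of the same lookup backed by `:persistent_term`. The module name `PersistentTermSketch`, the key and the tuple layout are assumptions for illustration; the actual commit may structure the data differently.

```elixir
defmodule PersistentTermSketch do
  # Hypothetical key; namespacing with the module name avoids clashes.
  @key {__MODULE__, :stop_times_by_trip}

  # Group once at load time and publish the whole map as a single term.
  def store(rows) do
    grouped = Enum.group_by(rows, fn {trip_id, _stop_id, _arrival} -> trip_id end)
    :persistent_term.put(@key, grouped)
  end

  # Reads return a reference to the shared term; nothing is copied
  # into the calling process's heap.
  def stop_times_for(trip_id) do
    @key
    |> :persistent_term.get()
    |> Map.get(trip_id, [])
  end
end

PersistentTermSketch.store([
  {"t1", "s1", "08:00:00"},
  {"t1", "s2", "08:05:00"},
  {"t2", "s1", "09:00:00"}
])

IO.inspect(PersistentTermSketch.stop_times_for("t1"))
```

The trade-off is on the write side: replacing or erasing a persistent term triggers a global garbage collection pass, which is why this fits data that is loaded once and then only read.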

Conclusion

I'm quite happy with the end result. By removing unnecessary stuff and making better use of the tools the language and stdlib give us, I was able to show that Elixir is not that slow (quite the opposite, in fact). And the best thing about it is that "more idiomatic" and "more performant" go hand in hand here :)