InoMurko closed this issue 6 years ago.
Running benchee a couple of times gives me somewhat consistently better results (configuration branch: https://github.com/InoMurko/ex_rlp/commit/82dfda9ca6cc5c38f320e6afc5f6fbdf713eda9f):
```
ex_rlp opt_benchee $ mix run lib/run.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             3.68      271.64 ms    ±27.02%      243.74 ms      462.17 ms
master          2.14      468.35 ms    ±12.89%      449.40 ms      627.16 ms

Comparison:
Ino             3.68
master          2.14 - 1.72x slower
```
```
ex_rlp opt_benchee $ mix run lib/run.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.57      218.78 ms     ±5.00%      223.76 ms      239.24 ms
master          2.02      495.34 ms    ±13.30%      519.62 ms      598.91 ms

Comparison:
Ino             4.57
master          2.02 - 2.26x slower
```
```
ex_rlp opt_benchee $ mix run lib/run.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 28 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.26      234.73 ms     ±9.57%      229.55 ms      291.80 ms
master          2.39      418.85 ms    ±13.28%      393.64 ms      583.66 ms

Comparison:
Ino             4.26
master          2.39 - 1.78x slower

Memory usage statistics:

Name      Memory usage
Ino           84.08 MB
master       125.56 MB - 1.49x memory usage

**All measurements for memory usage were the same**
```
```
ex_rlp opt_benchee $ mix run lib/run.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 54 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.63      216.12 ms     ±5.20%      217.48 ms      258.35 ms
master          2.70      369.85 ms     ±3.13%      365.91 ms      397.19 ms

Comparison:
Ino             4.63
master          2.70 - 1.71x slower

Memory usage statistics:

Name      Memory usage
Ino           84.08 MB
master       125.56 MB - 1.49x memory usage

**All measurements for memory usage were the same**
```
```
ex_rlp opt_benchee $ mix run lib/run.exs
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 54 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.12      242.50 ms    ±20.58%      229.01 ms      460.16 ms
master          2.38      420.67 ms    ±14.46%      399.50 ms      623.68 ms

Comparison:
Ino             4.12
master          2.38 - 1.73x slower

Memory usage statistics:

Name      Memory usage
Ino           84.08 MB
master       125.56 MB - 1.49x memory usage

**All measurements for memory usage were the same**
```
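For context, the runs above come from a small benchee script. Below is a hypothetical sketch of what `lib/run.exs` might look like; the real script lives in the configuration branch linked above, and the module/function names for the two decoder versions are assumptions made for illustration.

```elixir
# Hypothetical sketch of lib/run.exs (not the actual script from the branch).
# Build some representative RLP-encoded payload to decode on every iteration.
payload = ExRLP.encode(Enum.map(1..1_000, fn i -> ["key#{i}", "value#{i}"] end))

Benchee.run(
  %{
    # optimised decoder from this branch (assumed entry point)
    "Ino" => fn -> ExRLP.DecodeItem.decode_item(payload) end,
    # decoder currently on master (assumed entry point)
    "master" => fn -> ExRLP.Decode.decode(payload, :binary) end
  },
  warmup: 2,
  time: 20,
  memory_time: 5
)
```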
I also tried reducing the number of function clauses by using range guards, as in the original implementation:
```elixir
defmodule ExRLP.DecodeItemShort do
  @moduledoc """
  Captures bins and decodes them.
  """

  @spec decode_item(binary(), ExRLP.t()) :: ExRLP.t()
  def decode_item(rlp_binary), do: do_decode_item(rlp_binary, nil)
  def decode_item(rlp_binary, result), do: do_decode_item(rlp_binary, result)

  defp do_decode_item(<<>>, result) when is_list(result) do
    Enum.reverse(result)
  end

  defp do_decode_item(<<>>, result), do: result

  ##
  ## HANDLING 0 - 127
  ##
  defp do_decode_item(<<prefix, tail::binary>>, nil) when prefix < 128 do
    do_decode_item(tail, <<prefix>>)
  end

  defp do_decode_item(<<prefix, tail::binary>>, result) when prefix < 128 do
    do_decode_item(tail, [<<prefix>> | result])
  end

  ##
  ## FINISHED HANDLING 0 - 127
  ##

  ##
  ## HANDLING 128 - 183
  ##
  defp do_decode_item(<<prefix, tail::binary>>, nil) when prefix <= 183 do
    item_length = prefix - 128
    <<item::binary-size(item_length), new_tail::binary>> = tail
    do_decode_item(new_tail, item)
  end

  defp do_decode_item(<<prefix, tail::binary>>, result) when prefix <= 183 do
    item_length = prefix - 128
    <<item::binary-size(item_length), new_tail::binary>> = tail
    do_decode_item(new_tail, [item | result])
  end

  ##
  ## FINISHED HANDLING 128 - 183
  ##

  # decode_long_binary - CAN'T OPTIMISE FOR NOW
  defp do_decode_item(<<be_size_prefix, tail::binary>>, nil) when be_size_prefix < 192 do
    {new_item, new_tail} = decode_long_binary(be_size_prefix - 183, tail)
    do_decode_item(new_tail, new_item)
  end

  defp do_decode_item(<<be_size_prefix, tail::binary>>, result) when be_size_prefix < 192 do
    {new_item, new_tail} = decode_long_binary(be_size_prefix - 183, tail)
    do_decode_item(new_tail, [new_item | result])
  end

  ##
  ## HANDLING 192
  ##
  defp do_decode_item(<<192, tail::binary>>, nil) do
    do_decode_item(tail, [])
  end

  defp do_decode_item(<<192, tail::binary>>, result) do
    do_decode_item(tail, [[] | result])
  end

  ##
  ## FINISHED HANDLING 192
  ##

  ##
  ## HANDLING 193 - 247
  ##
  defp do_decode_item(<<prefix, tail::binary>>, nil) when prefix <= 247 do
    item_length = prefix - 192
    <<item::binary-size(item_length), new_tail::binary>> = tail
    new_item = Enum.reverse(decode_item(item, []))
    do_decode_item(new_tail, new_item)
  end

  defp do_decode_item(<<prefix, tail::binary>>, result) when prefix <= 247 do
    item_length = prefix - 192
    <<item::binary-size(item_length), new_tail::binary>> = tail
    new_item = decode_item(item, [])
    do_decode_item(new_tail, [new_item | result])
  end

  ##
  ## FINISHED HANDLING 193 - 247
  ##

  # decode_long_binary - CAN'T OPTIMISE FOR NOW
  defp do_decode_item(<<be_size_prefix, tail::binary>>, nil) do
    {list, new_tail} = decode_long_binary(be_size_prefix - 247, tail)
    new_result = Enum.reverse(decode_item(list, []))
    do_decode_item(new_tail, new_result)
  end

  defp do_decode_item(<<be_size_prefix, tail::binary>>, result) do
    {list, new_tail} = decode_long_binary(be_size_prefix - 247, tail)
    new_result = decode_item(list, [])
    do_decode_item(new_tail, [new_result | result])
  end

  @spec decode_long_binary(integer(), binary()) :: {binary(), binary()}
  defp decode_long_binary(be_size, tail) do
    <<be::binary-size(be_size), data::binary>> = tail
    item_length = :binary.decode_unsigned(be)
    <<item::binary-size(item_length), new_tail::binary>> = data
    {item, new_tail}
  end
end
```
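For reference, both variants are used the same way; purely as an illustration, decoding the canonical "dog" payload from the RLP spec looks like this:

```elixir
# "dog" encodes to <<0x83, ?d, ?o, ?g>> in RLP (0x80 + length 3, then the bytes).
iex> ExRLP.DecodeItemShort.decode_item(<<0x83, "dog">>)
"dog"
```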
But this variant, too, gave somewhat worse results:
```
Operating System: macOS
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 1.35 min

Benchmarking Ino...
Benchmarking InoShort...
Benchmarking master...

Name               ips        average  deviation         median         99th %
Ino               4.65      215.13 ms     ±5.08%      218.06 ms      243.39 ms
InoShort          3.49      286.14 ms     ±3.93%      283.93 ms      316.84 ms
master            2.62      382.17 ms     ±2.06%      380.47 ms      415.40 ms

Comparison:
Ino               4.65
InoShort          3.49 - 1.33x slower
master            2.62 - 1.78x slower

Memory usage statistics:

Name        Memory usage
Ino             84.08 MB
InoShort        90.68 MB - 1.08x memory usage
master         125.56 MB - 1.49x memory usage
```
@InoMurko great work!
Can you bump the version to 0.5.0 and add your changes to Changelog.md?
This is the most optimised approach I could come up with for now (except for `decode_long_binary`, which creates two sub-binaries).
The compiler tells me it was able to optimise every single clause except the ones with a `decode_long_binary` call: https://gist.github.com/InoMurko/0ab024a8bcc0ebc92617b80b463b7afc#file-compiler-optimisations
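In case anyone wants to reproduce a report like the one in that gist: diagnostics of this kind come from the Erlang compiler's `bin_opt_info` option, which flags every binary match that can or cannot reuse its match context. A minimal sketch of turning it on per module (how the author actually generated the gist is not shown here):

```elixir
# Emit binary-optimisation diagnostics for this module only; the compiler will
# then point at clauses where sub-binaries are built (e.g. around decode_long_binary).
defmodule ExRLP.DecodeItem do
  @compile :bin_opt_info

  # ... decode_item/2 clauses as in the branch ...
end
```

Alternatively, `ERL_COMPILER_OPTIONS=bin_opt_info mix compile --force` enables the same diagnostics for the whole project.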