mana-ethereum / ex_rlp

Elixir implementation of Ethereum's RLP (Recursive Length Prefix) encoding
MIT License
31 stars 9 forks source link

opt #19

Closed InoMurko closed 6 years ago

InoMurko commented 6 years ago

This is the most optimised approach I could do for now (except _decode_longbinary that creates two sub binaries).

Compiler tells me he was able to optimise every single case except where there's a decode_long_binary call.

https://gist.github.com/InoMurko/0ab024a8bcc0ebc92617b80b463b7afc#file-compiler-optimisations

InoMurko commented 6 years ago

Running benchee a couple of times it gives me somewhat consistently better results (configuration branch https://github.com/InoMurko/ex_rlp/commit/82dfda9ca6cc5c38f320e6afc5f6fbdf713eda9f):

ex_rlp  opt_benchee $ mix run lib/run.exs
Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             3.68      271.64 ms    ±27.02%      243.74 ms      462.17 ms
master          2.14      468.35 ms    ±12.89%      449.40 ms      627.16 ms

Comparison:
Ino             3.68
master          2.14 - 1.72x slower
ex_rlp  opt_benchee $ mix run lib/run.exs
Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 μs
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.57      218.78 ms     ±5.00%      223.76 ms      239.24 ms
master          2.02      495.34 ms    ±13.30%      519.62 ms      598.91 ms

Comparison:
Ino             4.57
master          2.02 - 2.26x slower
ex_rlp  opt_benchee $ mix run lib/run.exs
Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
parallel: 1
inputs: none specified
Estimated total run time: 28 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.26      234.73 ms     ±9.57%      229.55 ms      291.80 ms
master          2.39      418.85 ms    ±13.28%      393.64 ms      583.66 ms

Comparison:
Ino             4.26
master          2.39 - 1.78x slower

Memory usage statistics:

Name      Memory usage
Ino           84.08 MB
master       125.56 MB - 1.49x memory usage

**All measurements for memory usage were the same**
ex_rlp  opt_benchee $ mix run lib/run.exs
Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 54 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.63      216.12 ms     ±5.20%      217.48 ms      258.35 ms
master          2.70      369.85 ms     ±3.13%      365.91 ms      397.19 ms

Comparison:
Ino             4.63
master          2.70 - 1.71x slower

Memory usage statistics:

Name      Memory usage
Ino           84.08 MB
master       125.56 MB - 1.49x memory usage

**All measurements for memory usage were the same**
ex_rlp  opt_benchee $ mix run lib/run.exs
Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 54 s

Benchmarking Ino...
Benchmarking master...

Name             ips        average  deviation         median         99th %
Ino             4.12      242.50 ms    ±20.58%      229.01 ms      460.16 ms
master          2.38      420.67 ms    ±14.46%      399.50 ms      623.68 ms

Comparison:
Ino             4.12
master          2.38 - 1.73x slower

Memory usage statistics:

Name      Memory usage
Ino           84.08 MB
master       125.56 MB - 1.49x memory usage

**All measurements for memory usage were the same**
InoMurko commented 6 years ago

I also tried shortening the amount of methods by using a range bit pattern as in original:

defmodule ExRLP.DecodeItemShort do
  @moduledoc """
    Captures bins and decodes them.
  """
  @spec decode_item(binary(), ExRLP.t()) :: ExRLP.t()
  def decode_item(rlp_binary), do: do_decode_item(rlp_binary, nil)
  def decode_item(rlp_binary, result), do: do_decode_item(rlp_binary, result)

  defp do_decode_item(<<>>, result) when is_list(result) do
    Enum.reverse(result)
  end

  defp do_decode_item(<<>>, result), do: result
  ##
  ## HANDLING 0 - 127
  ##
  defp do_decode_item(<<prefix, tail::binary>>, nil) when prefix < 128 do
    do_decode_item(tail, <<prefix>>)
  end

  defp do_decode_item(<<prefix, tail::binary>>, result) when prefix < 128 do
    do_decode_item(tail, [<<prefix>> | result])
  end

  ##
  ## FINISHED HANDLING 0 - 127
  ##

  ##
  ## HANDLING 128 - 183
  ##
  defp do_decode_item(<<prefix, tail::binary>>, nil) when prefix <= 183 do
    item_length = prefix - 128
    <<item::binary-size(item_length), new_tail::binary>> = tail
    do_decode_item(new_tail, item)
  end

  defp do_decode_item(<<prefix, tail::binary>>, result) when prefix <= 183 do
    item_length = prefix - 128
    <<item::binary-size(item_length), new_tail::binary>> = tail
    do_decode_item(new_tail, [item | result])
  end

  ##
  ## FINISHED HANDLING 128-183
  ##

  # decode_long_binary - CAN'T OPTIMISE FOR NOW
  defp do_decode_item(<<be_size_prefix, tail::binary>>, nil) when be_size_prefix < 192 do
    {new_item, new_tail} = decode_long_binary(be_size_prefix - 183, tail)

    do_decode_item(new_tail, new_item)
  end

  defp do_decode_item(<<be_size_prefix, tail::binary>>, result) when be_size_prefix < 192 do
    {new_item, new_tail} = decode_long_binary(be_size_prefix - 183, tail)

    do_decode_item(new_tail, [new_item | result])
  end

  ##
  ## HANDLING 192
  ##
  defp do_decode_item(<<192, tail::binary>>, nil) do
    do_decode_item(tail, [])
  end

  defp do_decode_item(<<192, tail::binary>>, result) do
    do_decode_item(tail, [[] | result])
  end

  ##
  ## FINISHED HANDLING 192
  ##

  ##
  ## HANDLING 193-247
  ##
  defp do_decode_item(<<prefix, tail::binary>>, nil) when prefix <= 247 do
    item_length = prefix - 192
    <<item::binary-size(item_length), new_tail::binary>> = tail
    new_item = Enum.reverse(decode_item(item, []))
    do_decode_item(new_tail, new_item)
  end

  defp do_decode_item(<<prefix, tail::binary>>, result) when prefix <= 247 do
    item_length = prefix - 192
    <<item::binary-size(item_length), new_tail::binary>> = tail
    new_item = decode_item(item, [])
    do_decode_item(new_tail, [new_item | result])
  end

  ##
  ## FINISHED HANDLING 193-247
  ##

  # decode_long_binary - CAN'T OPTIMISE FOR NOW
  defp do_decode_item(<<be_size_prefix, tail::binary>>, nil) do
    {list, new_tail} = decode_long_binary(be_size_prefix - 247, tail)

    new_result = Enum.reverse(decode_item(list, []))

    do_decode_item(new_tail, new_result)
  end

  defp do_decode_item(<<be_size_prefix, tail::binary>>, result) do
    {list, new_tail} = decode_long_binary(be_size_prefix - 247, tail)

    new_result = decode_item(list, [])

    do_decode_item(new_tail, [new_result | result])
  end

  @spec decode_long_binary(integer(), binary()) :: {binary(), binary()}
  defp decode_long_binary(be_size, tail) do
    <<be::binary-size(be_size), data::binary>> = tail

    item_length = :binary.decode_unsigned(be)

    <<item::binary-size(item_length), new_tail::binary>> = data

    {item, new_tail}
  end
end

But this also gave a bit worse results.

Operating System: macOS"
CPU Information: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz
Number of Available Cores: 4
Available memory: 16 GB
Elixir 1.7.4
Erlang 21.1

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 20 s
memory time: 5 s
parallel: 1
inputs: none specified
Estimated total run time: 1.35 min

Benchmarking Ino...
Benchmarking InoShort...
Benchmarking master...

Name               ips        average  deviation         median         99th %
Ino               4.65      215.13 ms     ±5.08%      218.06 ms      243.39 ms
InoShort          3.49      286.14 ms     ±3.93%      283.93 ms      316.84 ms
master            2.62      382.17 ms     ±2.06%      380.47 ms      415.40 ms

Comparison:
Ino               4.65
InoShort          3.49 - 1.33x slower
master            2.62 - 1.78x slower

Memory usage statistics:

Name        Memory usage
Ino             84.08 MB
InoShort        90.68 MB - 1.08x memory usage
master         125.56 MB - 1.49x memory usage
ayrat555 commented 6 years ago

@InoMurko great work! can you bump version to 0.5.0 and your changes to Changelog.md