hauleth / defconst

Helper macros for defining constant values in modules
https://hex.pm/packages/defconstant

try/catch is extremely slow #1

Closed — hissssst closed this issue 5 months ago

hissssst commented 5 months ago

Try/catch is an extremely slow flow. A faster solution would be to either:

  1. Store two values in persistent term: one boolean which indicates if value is initialized and one which stores the value
  2. Use :persistent_term.get(key, default_value) where default_value is some special value like :__defconst_special_atom__.

The latter is the fastest, but may fail in some extreme scenarios (e.g. if someone stores the sentinel value itself under the key).
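A minimal sketch of option 1, keeping an explicit "initialized?" flag next to the value so the hot path never raises. The module and key names here are hypothetical, chosen only for illustration:

```elixir
defmodule ConstFlag do
  # Hypothetical keys; a real implementation would derive these per constant.
  @flag {__MODULE__, :initialized?}
  @value {__MODULE__, :value}

  def get do
    if :persistent_term.get(@flag, false) do
      # Happy path: flag is set, so the value must be present.
      :persistent_term.get(@value)
    else
      value = compute()
      # Put the value first, then the flag, so readers that see the
      # flag always see the value too.
      :persistent_term.put(@value, value)
      :persistent_term.put(@flag, true)
      value
    end
  end

  # Stand-in for the real constant computation.
  defp compute, do: 123
end
```

Note the trade-off: this variant does two `:persistent_term.get` calls on the happy path, which is why the sentinel-default variant below would be expected to be faster.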

hauleth commented 5 months ago

I have benchmarked the happy path (i.e. calling the function with cached data): on Erlang 25 w/o JIT there is no difference, and on OTP 26 w/ JIT enabled the try version is almost 2x faster. So unless you have benchmarks that say otherwise, I would keep the current code as is.

hauleth commented 5 months ago

And to be exact: the happy path is way more important here. The sad path will ideally happen only once during the application's lifetime, so it is not an issue.

hissssst commented 5 months ago

@hauleth FYI: try/catch slows down even the happy path. The difference in performance is about 30%. Here's a benchmark:

defmodule X do
  def trycatch(x) do
    :persistent_term.get(x)
  catch
    :error, :badarg ->
      :persistent_term.put(x, 123)
      123
  end

  def pg(x) do
    with [] <- :persistent_term.get(x, []) do
      :persistent_term.put(x, 123)
      123
    end
  end
end

X.trycatch(:one)
X.pg(:two)

Benchee.run(%{
  "try" => fn ->
    X.trycatch(:one)
  end,
  "persistent_term" => fn ->
    X.pg(:two)
  end
})

"""
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 15.34 GB
Elixir 1.15.7
Erlang 25.3.2.11

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking persistent_term...
Benchmarking try...

Name                      ips        average  deviation         median         99th %
persistent_term       18.57 M       53.84 ns ±52072.04%          15 ns         174 ns
try                   14.63 M       68.34 ns ±38195.07%          14 ns         255 ns

Comparison:
persistent_term       18.57 M
try                   14.63 M - 1.27x slower +14.50 ns
"""
hauleth commented 5 months ago

There is a benchmark in the repo that I used, and on OTP 26 the difference is nonexistent, while try can cause problems:

Operating System: macOS
CPU Information: Apple M3 Pro
Number of Available Cores: 12
Available memory: 36 GB
Elixir 1.16.2
Erlang 26.2.5
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 14 s

Benchmarking default ...
Benchmarking try ...
Calculating statistics...
Formatting results...

Name              ips        average  deviation         median         99th %
default       24.83 M       40.27 ns ±27339.14%          42 ns          42 ns
try           23.31 M       42.89 ns ±30340.69%          42 ns          42 ns

Comparison:
default       24.83 M
try           23.31 M - 1.07x slower +2.62 ns

Also, depending on the run, sometimes try was faster and sometimes case was faster. And a fully correct implementation with case would require creating a reference each time we want to check for existence, which would impose an additional performance penalty. I think that in current and future versions of OTP this should not be an important problem wrt. performance.

hissssst commented 5 months ago

And OTP26 benchmark:

Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 15.34 GB
Elixir 1.16.2
Erlang 26.2.3

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s

Benchmarking persistent_term...
Benchmarking try...

Name                      ips        average  deviation         median         99th %
persistent_term       15.60 M       64.10 ns ±96380.83%          23 ns         154 ns
try                   12.69 M       78.79 ns ±94027.45%          35 ns         155 ns

Comparison:
persistent_term       15.60 M
try                   12.69 M - 1.23x slower +14.70 ns

hissssst commented 5 months ago

"depending on the run sometimes try was faster, sometimes case was faster"

This is common if you benchmark ns-precise calls without limiting CPU frequency. I do pin the CPU frequency, and the benchmark results are consistent for me within a +/- 5% difference. But even with this variance, try is always slower.

There is also a non-benchmark-related argument. try/catch is slower because it initiates a routine that instantiates the stacktrace by jumping through all of the labels in the code. The stacktrace is instantiated on every handled error, even when the stacktrace object is not used; this is done because exceptions can occur while handling an exception, and the runtime wants to print both stacktraces in that case. But this is true only for rescue and catch clauses.

Here's a benchmark which demonstrates it:

defmodule X do

  for i <- 1..20 do
    name = :"f#{i}"
    next = :"f#{i + 1}"

    def unquote(name)(x) do
      unquote(next)(x) + 1
    end
  end

  def f21(x) do
    trycatch(x)
  end

  for i <- 1..20 do
    name = :"g#{i}"
    next = :"g#{i + 1}"

    def unquote(name)(x) do
      unquote(next)(x) + 1
    end
  end

  def g21(x) do
    pt(x)
  end

  def trycatch(x) do
    :persistent_term.get(x)
  catch
    :error, :badarg ->
      :persistent_term.put(x, 123)
      123
  end

  def pt(x) do
    with [] <- :persistent_term.get(x, []) do
      :persistent_term.put(x, 123)
      123
    end
  end
end

143 = X.f1(:one)
143 = X.g1(:two)

Benchee.run(%{
  "try" => fn ->
    X.f1(:one)
  end,
  "persistent_term" => fn ->
    X.g1(:two)
  end
}, time: 10)

"""
Operating System: Linux
CPU Information: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Number of Available Cores: 8
Available memory: 15.34 GB
Elixir 1.15.7
Erlang 25.3.2.11

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 24 s

Benchmarking persistent_term...
Benchmarking try...

Name                      ips        average  deviation         median         99th %
persistent_term       23.28 M       42.96 ns ±74583.43%          18 ns         138 ns
try                   16.34 M       61.20 ns ±88482.67%          21 ns         133 ns

Comparison:
persistent_term       23.28 M
try                   16.34 M - 1.42x slower +18.24 ns
"""
hauleth commented 5 months ago

What would happen if we wanted a fully correct implementation? We would need to do:

ref = make_ref()

with ^ref <- :persistent_term.get(x, ref) do
  :persistent_term.put(x, 123)
  123
end

That is what we would need to do to be 100% correct, and I believe it may impose a non-negligible performance hit as well.

hissssst commented 5 months ago

First of all, this wouldn't be 100% correct either. Second, just use some atom like :__defconst_defonce_stub__. I think that handling an extremely rare case (which, let's be honest, will never occur in any real-world application) is less important than a 30% performance difference on constant access.
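A minimal sketch of the stub-atom approach being suggested. The module name and sentinel atom are hypothetical; the only failure mode is storing the sentinel itself as a constant value, which in practice would never happen with a private, mangled atom name:

```elixir
defmodule ConstStub do
  # Hypothetical sentinel; an atom unlikely to collide with any real value.
  @stub :__defconst_defonce_stub__

  def get(key) do
    # One lookup on the happy path, no exception machinery involved.
    case :persistent_term.get(key, @stub) do
      @stub ->
        # Sad path: compute once, cache, and return. 123 stands in for
        # the real constant computation.
        value = 123
        :persistent_term.put(key, value)
        value

      value ->
        value
    end
  end
end
```

The first call takes the sad path and caches the value; every subsequent call is a single `:persistent_term.get/2` with a default, which is the pattern the benchmarks above label "persistent_term".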