spandex-project / spandex

A platform agnostic tracing library
MIT License
335 stars 53 forks source link

Performance problems with Optimal dependency #100

Open amandasposito opened 5 years ago

amandasposito commented 5 years ago

I ran a profiler on my app using the eprof and fprof to identify bottlenecks to trace the execution of all functions in the code and report the time consumed with each.

Looking at the results 25% of the time was spent on Optimal, which is a lib responsible to validating code.

So I set up a benchmark script using Benchee to test the original code with the Optimal dependency and another one that overrides it with a simpler implementation.

The result was about 10% faster over the whole request without the Optimal dependency.

https://hexdocs.pm/mix/1.8.1/Mix.Tasks.Profile.Eprof.html https://hexdocs.pm/mix/1.8.1/Mix.Tasks.Profile.Fprof.html

Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
Number of Available Cores: 3
Available memory: 5.82 GB
Elixir 1.8.1
Erlang 21.3.7
Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 15 s
Benchmarking 0_warmup...
Benchmarking 1_normal...
Benchmarking 2_optimal_overridden...
warning: redefining module Optaimal (current version loaded from _build/test/lib/optimal/ebin/Elixir.Optimal.beam)
  priv/myapp_web/script_optimal.exs:23
Name                           ips        average  deviation         median         99th %
2_optimal_overridden        270.69        3.69 ms    ±33.85%        3.26 ms        7.91 ms
1_normal                    248.06        4.03 ms    ±35.16%        3.58 ms        9.72 ms
0_warmup                    129.54        7.72 ms  ±1258.76%        3.45 ms       11.21 ms
Comparison:
2_optimal_overridden        270.69
1_normal                    248.06 - 1.09x slower +0.34 ms
0_warmup                    129.54 - 2.09x slower +4.03 ms
zachdaniel commented 5 years ago

That is really useful info! I’d be happy to accept a PR that removed the optimal dependency.

GregMefford commented 5 years ago

Yeah I was surprised that the performance impact was so significant! My only concern would be that removing it would be a breaking change unless we somehow keep the behavior the same. In practice, it's probably fine to remove it or at least make it disable-able, because for most use-cases, people will only be sending valid spans once they figure out the right things to send.

If we removed Optimal, things might silently fail or fail in a different place, but at the end of the day, it would've crashed anyway.

zachdaniel commented 5 years ago

I think an alternative would be to fork optimal (I actually wrote it originally, but its in my old employer's name) and just fix the performance problem.

zachdaniel commented 5 years ago

They'd probably be amenable to a PR, also.

vherr2 commented 5 years ago

If someone makes a PR I'll gladly take a look; Hoping to take a glance at it if y'all don't get to it first!

zachdaniel commented 4 years ago

We should just switch from optimal to nimble options. Anyone who wants to take a crack at this is welcome!

connorlay commented 4 years ago

@zachdaniel I'm interested in picking this up! I've been using Spandex for over a year now and want to contribute back :grin:

zachdaniel commented 4 years ago

Awesome! No one else has claimed it, so go for it 💪

kamilkowalski commented 4 years ago

Hey @connorlay, need any help? We're seeing considerable performance overhead when using Spandex at Fresha - I'd like to factor out optimal but since it's a considerable amount of work I didn't want to duplicate it in case you've already started.

connorlay commented 4 years ago

Hi @kamilkowalski! Feel free to take this over. I never made much progress on this after realizing the size of the refactor.

zachdaniel commented 4 years ago

The best strategy here is likely just to use nimble options. We'll just need to rewrite the schemas to their syntax.

GregMefford commented 4 years ago

I'm also available / willing / interested in helping to make this migration happen. I think long-term we will likely just migrate away from Spandex in favor of OpenTelemetry, but given Datadog's timeline on supporting OpenTelemetry and the amount of effort it'll take to switch away from Spandex, this is a worthwhile investment in the meantime.

zachdaniel commented 4 years ago

Well the API for NimbleOptions and Optimal are almost exactly the same, so I honestly don't think it would be too difficult to do. I'm too busy at the moment though unfortunately.

GregMefford commented 4 years ago

I looked at it briefly and I don't think it'll be that bad, it's just a lot of tedious verification that things are still doing what we want, and also a technically-breaking change that I doubt anyone cares about in practice: our API currently returns Optimal validation errors, so probably the better API design would be to either return some well-known set of Spandex validation errors, or just change the typespecs to be less-specific about what kind of error struct you might get, and just switch it to NimbleOptions instead of Optimal, with or without a major version bump to signal that there's a breaking change there, depending on what we find in testing/verification that things are working.

GregMefford commented 4 years ago

Another possible path could be to simply make Optimal be optional: if you have it in your app's deps, then options are validated and if not, then you don't and you just crash or something if you send an incorrect type. This would be the "high performance" production mode of operation after you've already validated that you're sending the right thing via dev/test.

zachdaniel commented 4 years ago

The only issue with that would be default values/other logic around values. It doesn't just validate, it can also transform.

hkrutzer commented 4 years ago

Won't it be better for performance to not do validation? For functions where low latency is required people would probably prefer performace over ease of development.

zachdaniel commented 4 years ago

Yeah, I think it would be better for sure, at least to have it only validate in development mode or something like that. I'm just saying we'll have to manually write the code that isn't validation (that sets defaults for example).