spandex-project / spandex

A platform agnostic tracing library
MIT License

Add trace sending genserver #11

Closed: zachdaniel closed this issue 7 years ago

zachdaniel commented 7 years ago

Currently, traces are published synchronously by the process doing the tracing, which is definitely not scalable.

driv3r commented 7 years ago

Ideas for a solution

I started working on this one and am weighing three possible solutions:

  1. Upgrade `Spandex.Datadog.Api.create_trace()` to a GenServer and execute it asynchronously via `cast`
  2. A new "Datadog Server" which aggregates spans in its state and periodically flushes them via `Api.create_trace()`
  3. GenStage, where a producer aggregates spans and consumers execute `Api.create_trace()`
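Option 2 could be sketched roughly as below. This is a minimal, hypothetical sketch: the module name `ApiServer`, the `send_spans/1` entry point, and the flush interval/buffer limits are all made up for illustration, and it assumes `Spandex.Datadog.Api.create_trace/1` accepts a list of spans.

```elixir
defmodule Spandex.Datadog.ApiServer do
  @moduledoc """
  Hypothetical sketch: buffers finished spans in the GenServer state and
  flushes them either periodically or once the buffer grows large enough,
  so the traced process never blocks on the HTTP call.
  """
  use GenServer

  @flush_interval 1_000
  @max_buffer 100

  def start_link(opts \\ []) do
    GenServer.start_link(__MODULE__, opts, name: __MODULE__)
  end

  # Callers enqueue asynchronously via cast, so tracing no longer blocks.
  def send_spans(spans) when is_list(spans) do
    GenServer.cast(__MODULE__, {:spans, spans})
  end

  @impl true
  def init(_opts) do
    Process.send_after(self(), :flush, @flush_interval)
    {:ok, []}
  end

  @impl true
  def handle_cast({:spans, spans}, buffer) do
    buffer = spans ++ buffer

    if length(buffer) >= @max_buffer do
      flush(buffer)
      {:noreply, []}
    else
      {:noreply, buffer}
    end
  end

  @impl true
  def handle_info(:flush, buffer) do
    flush(buffer)
    Process.send_after(self(), :flush, @flush_interval)
    {:noreply, []}
  end

  defp flush([]), do: :ok
  defp flush(spans), do: Spandex.Datadog.Api.create_trace(spans)
end
```

One trade-off to note: an unbounded `cast` mailbox can grow faster than the server flushes under heavy load, so a real implementation would likely want back-pressure, which is where the GenStage option comes in.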

What do you think?

Simple perf benchmark

I've done a simple benchmark in which the raw method takes around ~0.9 s (using `:timer.sleep/1` to simulate work). The same execution with tracing enabled is almost 10x slower at the moment (which we can also see when load testing our product API).

defmodule TestModule.WithTrace do
  use Spandex.TraceDecorator

  @decorate trace()
  def call do
    process()
    process()
    process()
  end

  @decorate span()
  def process() do
    :timer.sleep(50)
    fetch()
    fetch()
    :timer.sleep(50)
    fetch()
    fetch()
  end

  @decorate span()
  defp fetch() do
    :timer.sleep(50)
  end
end

defmodule TestModule.WithoutTrace do
  def call do
    process()
    process()
    process()
  end

  def process() do
    :timer.sleep(50)
    fetch()
    fetch()
    :timer.sleep(50)
    fetch()
    fetch()
  end

  defp fetch() do
    :timer.sleep(50)
  end
end

Benchee.run(%{
  "tracing: OFF" => fn -> TestModule.WithoutTrace.call() end,
  "tracing: ON " => fn -> TestModule.WithTrace.call() end,
})
Name                   ips        average  deviation         median
tracing: OFF          1.09         0.92 s     ±0.00%         0.92 s
tracing: ON          0.112         8.96 s     ±0.00%         8.96 s

Comparison: 
tracing: OFF          1.09
tracing: ON          0.112 - 9.76x slower