Status: Closed. vovayartsev closed this issue 6 years ago.

Hey! The documented way of tracing Ecto was lost somewhere between v1.3 and v1.6... However, googling for Spandex Ecto still takes you to that outdated page. Would it be possible to describe or link to the proper way of tracing Ecto right in the main README? Thanks for doing your great job!
I actually ended up removing the ecto tracer from spandex, as it felt out of the purview of this library. The ability to construct traces still exists, but is slightly different than the old source. I'll put together a gist (and put a link to it in the readme). I won't be able to do anything about the old dead documentation that appears on google though :(
I'm currently traveling, so I'm not in a position to test this out, but I did my best to adapt it to the current version. Mind giving it a shot?
defmodule MyApp.Ecto.Trace do
  @moduledoc """
  A trace builder that can be given to ecto as a logger. It will try to get
  the trace_id and span_id from the caller pid in the case that the particular
  query is being run asynchronously (as in the case of parallel preloads).

  Traces will default to the service name `:ecto` but can be configured:

      config :spandex, :ecto,
        service: :my_ecto

  To configure, set it up as an ecto logger like so:

      config :my_app, MyApp.Repo,
        loggers: [{Ecto.LogEntry, :log, [:info]}, {MyApp.Ecto.Trace, :trace, []}]
  """

  @default_service_name :ecto

  defmodule Error do
    defexception [:message]
  end

  def trace(log_entry) do
    # Put in your own configuration here
    otp_app = :otp_app
    tracer = MyApp.Tracer
    service = :ecto
    database = "some_database"

    unless Application.get_env(otp_app, tracer) do
      now = Spandex.Adapters.Utils.now()
      _ = setup(log_entry, tracer)
      query = string_query(log_entry)
      num_rows = num_rows(log_entry)
      queue_time = get_time(log_entry, :queue_time)
      query_time = get_time(log_entry, :query_time)
      decoding_time = get_time(log_entry, :decode_time)
      start = now - (queue_time + query_time + decoding_time)

      tracer.update_span(
        start: start,
        completion_time: now,
        service: service,
        resource: query,
        type: :db,
        sql_query: [
          query: query,
          rows: inspect(num_rows),
          db: database
        ]
      )

      _ = report_error(log_entry)

      if queue_time != 0 do
        _ = tracer.start_span("queue")
        _ = tracer.update_span(start: start, completion_time: start + queue_time)
        _ = tracer.finish_span()
      end

      if query_time != 0 do
        _ = tracer.start_span("run_query")

        _ =
          tracer.update_span(
            start: start + queue_time,
            completion_time: start + queue_time + query_time
          )

        _ = tracer.finish_span()
      end

      if decoding_time != 0 do
        _ = tracer.start_span("decode")
        _ = tracer.update_span(start: start + queue_time + query_time, completion_time: now)
        _ = tracer.finish_span()
      end

      finish_ecto_trace(log_entry, tracer)
    end

    log_entry
  end

  defp finish_ecto_trace(%{caller_pid: caller_pid}, tracer) do
    if caller_pid != self() do
      tracer.finish_trace()
    else
      tracer.finish_span()
    end
  end

  defp finish_ecto_trace(_), do: :ok

  defp setup(%{caller_pid: caller_pid}, tracer) when is_pid(caller_pid) do
    if caller_pid == self() do
      Logger.metadata(trace_id: tracer.current_trace_id(), span_id: tracer.current_span_id())
      tracer.start_span("query")
    else
      trace = Process.info(caller_pid)[:dictionary][:spandex_trace]

      if trace do
        trace_id = trace.id

        span_id =
          trace
          |> Map.get(:stack)
          |> Enum.at(0, %{})
          |> Map.get(:id)

        Logger.metadata(trace_id: trace_id, span_id: span_id)
        tracer.continue_trace("query", trace_id, span_id)
      else
        tracer.start_trace("query")
      end
    end
  end

  defp setup(_) do
    :ok
  end

  defp report_error(%{result: {:ok, _}}), do: :ok

  defp report_error(%{result: {:error, error}}) do
    tracer.span_error(%Error{message: inspect(error)}, nil)
  end

  defp string_query(%{query: query}) when is_function(query),
    do: Macro.unescape_string(query.() || "")

  defp string_query(%{query: query}) when is_bitstring(query), do: Macro.unescape_string(query)

  defp string_query(_), do: ""

  defp num_rows(%{result: {:ok, %{num_rows: num_rows}}}), do: num_rows
  defp num_rows(_), do: 0

  def get_time(log_entry, key) do
    value = Map.get(log_entry, key)

    if is_integer(value) do
      to_nanoseconds(value)
    else
      0
    end
  end

  defp to_nanoseconds(time), do: System.convert_time_unit(time, :native, :nanoseconds)

  defp config do
    :spandex
    |> Confex.get_env(:ecto, [])
    |> Keyword.put_new(:service, @default_service_name)
  end
end
There is a bug there. It should read `unless Application.get_env(otp_app, tracer)[:disabled?] do`.
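That guard reads the tracer's own application config, keyed by the `otp_app` and tracer module set at the top of the snippet, so tracing can be switched off per environment. A minimal sketch, using the same placeholder names (`:otp_app` / `MyApp.Tracer`) as the snippet above:

# e.g. in config/test.exs; with this set, the corrected guard skips tracing entirely
config :otp_app, MyApp.Tracer,
  disabled?: true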
@zachdaniel Sorry for the long response loop, I've been on vacation.
Generally, the snippet worked after fixing two errors:

1. `Spandex.Adapters.Utils.now/0` is undefined
2. the `tracer` variable is undefined within the `report_error()` function

There's one point of confusion though: somehow I see my OTP app's spans within an Ecto trace. Maybe I need to create a separate Tracer module for Ecto stuff?
Thank you 👍
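For anyone following along, the two fixes map onto the snippet like this (a sketch; the corrected version appears in full further down the thread):

# 1. Spandex.Adapters.Utils.now/0 is undefined in current Spandex; take the timestamp directly:
now = :os.system_time(:nano_seconds)

# 2. report_error/1 referenced an undefined `tracer` variable; pass the tracer in explicitly:
defp report_error(_tracer, %{result: {:ok, _}}), do: :ok

defp report_error(tracer, %{result: {:error, error}}) do
  tracer.span_error(%Error{message: inspect(error)}, nil)
end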
@vovayartsev Could you share your version? In my case the call to `tracer.start_span("query")` returns `{:error, :no_trace_context}`.
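That error atom suggests no trace is open in the process doing the logging, so the span has nothing to attach to. A minimal sketch of guarding against it, reusing the tracer functions already shown above (the `MyApp.Tracer` name is the placeholder from the earlier snippet):

# start_span/1 only attaches to an already-open trace, so fall back to starting one
if MyApp.Tracer.current_trace_id() do
  MyApp.Tracer.start_span("query")
else
  MyApp.Tracer.start_trace("query")
end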
@vovayartsev I think you want to override the top level span's service. Hopefully your version helps @christopheonce also! Sorry I've been so busy lately I haven't had a chance to work on this library.
@christopheonce @zachdaniel Below is my version, which works fine for me now, and my prior question about why my OTP app's spans show up in the Ecto trace is fully resolved:
defmodule Core.Tracer do
  @moduledoc """
  DataDog tracer
  """
  use Spandex.Tracer, otp_app: :my_app
end

defmodule Core.Ecto.Trace do
  @moduledoc """
  A trace builder that can be given to ecto as a logger. It will try to get
  the trace_id and span_id from the caller pid in the case that the particular
  query is being run asynchronously (as in the case of parallel preloads).

  Traces will default to the service name `:ecto` but can be configured:

      config :spandex, :ecto,
        service: :my_ecto

  To configure, set it up as an ecto logger like so:

      config :my_app, MyApp.Repo,
        loggers: [{Ecto.LogEntry, :log, [:info]}, {MyApp.Ecto.Trace, :trace, []}]
  """

  @default_service_name :ecto

  defmodule Error do
    defexception [:message]
  end

  def trace(log_entry, database) do
    # Put in your own configuration here
    otp_app = :my_app
    tracer = Core.Tracer
    service = @default_service_name

    unless Application.get_env(otp_app, tracer)[:disabled?] do
      now = :os.system_time(:nano_seconds)
      _ = setup(log_entry, tracer)
      query = string_query(log_entry)
      num_rows = num_rows(log_entry)
      queue_time = get_time(log_entry, :queue_time)
      query_time = get_time(log_entry, :query_time)
      decoding_time = get_time(log_entry, :decode_time)
      start = now - (queue_time + query_time + decoding_time)

      tracer.update_span(
        start: start,
        completion_time: now,
        service: service,
        resource: query,
        type: :db,
        sql_query: [
          query: query,
          rows: inspect(num_rows),
          db: database
        ]
      )

      _ = report_error(tracer, log_entry)

      if queue_time != 0 do
        _ = tracer.start_span("queue")

        _ =
          tracer.update_span(service: service, start: start, completion_time: start + queue_time)

        _ = tracer.finish_span()
      end

      if query_time != 0 do
        _ = tracer.start_span("run_query")

        _ =
          tracer.update_span(
            service: service,
            start: start + queue_time,
            completion_time: start + queue_time + query_time
          )

        _ = tracer.finish_span()
      end

      if decoding_time != 0 do
        _ = tracer.start_span("decode")

        _ =
          tracer.update_span(
            service: service,
            start: start + queue_time + query_time,
            completion_time: now
          )

        _ = tracer.finish_span()
      end

      finish_ecto_trace(log_entry, tracer)
    end

    log_entry
  end

  defp finish_ecto_trace(%{caller_pid: caller_pid}, tracer) do
    if caller_pid != self() do
      tracer.finish_trace()
    else
      tracer.finish_span()
    end
  end

  defp finish_ecto_trace(_, _), do: :ok

  defp setup(%{caller_pid: caller_pid}, tracer) when is_pid(caller_pid) do
    if caller_pid == self() do
      Logger.metadata(trace_id: tracer.current_trace_id(), span_id: tracer.current_span_id())
      tracer.start_span("query")
    else
      trace = Process.info(caller_pid)[:dictionary][:spandex_trace]

      if trace do
        trace_id = trace.id

        span_id =
          trace
          |> Map.get(:stack)
          |> Enum.at(0, %{})
          |> Map.get(:id)

        Logger.metadata(trace_id: trace_id, span_id: span_id)
        tracer.continue_trace("query", trace_id, span_id)
      else
        tracer.start_trace("query")
      end
    end
  end

  defp setup(_, _) do
    :ok
  end

  defp report_error(_tracer, %{result: {:ok, _}}), do: :ok

  defp report_error(tracer, %{result: {:error, error}}) do
    tracer.span_error(%Error{message: inspect(error)}, nil)
  end

  defp string_query(%{query: query}) when is_function(query),
    do: Macro.unescape_string(query.() || "")

  defp string_query(%{query: query}) when is_bitstring(query), do: Macro.unescape_string(query)

  defp string_query(_), do: ""

  defp num_rows(%{result: {:ok, %{num_rows: num_rows}}}), do: num_rows
  defp num_rows(_), do: 0

  def get_time(log_entry, key) do
    value = Map.get(log_entry, key)

    if is_integer(value) do
      to_nanoseconds(value)
    else
      0
    end
  end

  defp to_nanoseconds(time), do: System.convert_time_unit(time, :native, :nanoseconds)
end
and in prod.exs:
config :my_app, MyApp.MyRepo,
  ....
  loggers: [
    {Ecto.LogEntry, :log, [:debug]},
    {Core.Ecto.Trace, :trace, ["MyRepo"]}
  ]
@christopheonce please be aware of this issue when changing `loggers`: https://stackoverflow.com/questions/48676316/ecto-repo-loggers-config-ignored
@vovayartsev thank you for sharing the code that works for you. @zachdaniel I understand you have limited time to work on this. I finally understand why it does not work for me: all my interactions with the database go through a GenServer. While debugging I figured out that the trace is kept in the process spawned by Ranch, but the process logging the time spent in the database is the GenServer. That's why it tells me no trace was started. I will work around this so it behaves as if it were a remote server.
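One way to make the GenServer behave like a remote service is to hand it the caller's trace context explicitly and continue the trace there, the same way the logger above handles parallel preloads. A rough sketch, assuming a hypothetical DbWorker GenServer and an illustrative MyApp.Repo call:

defmodule DbWorker do
  # Hypothetical GenServer that owns all DB access; the tracer calls mirror the
  # ones already used in this thread (continue_trace/3, finish_trace/0).
  use GenServer

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  def init(state), do: {:ok, state}

  # The caller (e.g. the Ranch/Phoenix process that owns the trace) captures its
  # trace context and hands it to the GenServer along with the request.
  def query(queryable) do
    trace_id = Core.Tracer.current_trace_id()
    span_id = Core.Tracer.current_span_id()
    GenServer.call(__MODULE__, {:query, queryable, trace_id, span_id})
  end

  def handle_call({:query, queryable, trace_id, span_id}, _from, state) do
    # Continue the caller's trace inside this process, then close it out when done.
    Core.Tracer.continue_trace("db_worker", trace_id, span_id)
    result = MyApp.Repo.all(queryable)
    Core.Tracer.finish_trace()
    {:reply, result, state}
  end
end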
Awesome! I think what we should have is a `spandex_ecto` library that leverages this code and keeps it out of core.
I've added a note about needing this feature in a separate lib to the README, and added a link to this issue so others can use that code.
@zachdaniel the code above has a problem: the completion time gets overridden and all traces end at the same time even though the Ecto log says otherwise. I traced the issue to the call to `finish_span()`, which overrides the completion time: https://github.com/zachdaniel/spandex/blob/master/lib/spandex.ex#L160
What do you advise to avoid this?
Interesting. `finish_span` should definitely not override the completion time like that, so I think the solution is most likely just to fix that bug. I think I'd really like to see this moved into a `spandex_ecto` repository, so we can track issues separately there. I'll look into the bug shortly.
The bug that was brought up here recently has been resolved, and the fix is available in version 2.0.0.
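For anyone picking this up later, getting the fix should just be a matter of bumping the dependency (constraint matching the version mentioned above):

# mix.exs
defp deps do
  [
    {:spandex, "~> 2.0"}
  ]
end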
@christopheonce and @vovayartsev - I haven't had a chance to test this, but if you'd like to give it a shot: https://github.com/spandex-project/spandex_ecto
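For completeness, the new library wires in much like the hand-rolled logger above. The module and option names below follow the spandex_ecto README at the time of writing; treat this as a sketch and defer to that repo if anything has changed:

config :spandex_ecto, SpandexEcto.EctoLogger,
  service: :ecto,
  tracer: MyApp.Tracer

config :my_app, MyApp.Repo,
  loggers: [
    {Ecto.LogEntry, :log, [:info]},
    {SpandexEcto.EctoLogger, :trace, ["database_name"]}
  ]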