scoutapp / scout_apm_elixir

ScoutAPM Elixir Agent. Supports Phoenix and other frameworks.
https://scoutapm.com
Other
36 stars 20 forks source link

ScoutApm Core Agent TCP socket closed #100

Closed mcasper closed 4 years ago

mcasper commented 5 years ago

Hey folks,

I'm running the 1.0 version of this library that now uses the scout core agent for sending metrics, and am getting the following message over and over, and am not getting any metrics reported:

15:31:19.445 [warn] ScoutApm Core Agent TCP socket closed. Attempting to reconnect.

I've checked to make sure that the agent is running and accepting messages:

$ /tmp/scout_apm_core/core-agent probe --socket /tmp/scout_apm_core/scout-agent.sock
[2019-09-23T15:32:50][core_agent][INFO] Initializing logger with log level: Info
Agent found: CoreAgentVersion { raw: "1.1.10" }

I'm running this in a docker container starting from bitwalker/alpine-elixir:1.9.1, which is based off alpine:3.10

My scout config looks like the following:

config :scout_apm,
  name: <application name>,
  key: System.get_env("SCOUT_APM_KEY"),
  core_agent_triple: "i686-unknown-linux-musl"
cschneid commented 5 years ago

Thank you for the report - I'm taking a look

cschneid commented 5 years ago

Hi @mcasper I think I messed up bumping the version in the binary with this release, which caused it to report wrong to that probe command and continually attempt to stop and restart it. I've released 1.2.1 to fix that. Can you set core_agent_version to 1.2.1 explicitly and see if that works?

The setting is:

core_agent_version: "v1.2.1"

mcasper commented 5 years ago

I gave that a try, still no dice 😞

I confirmed that the core agent is running the new version:

$ /tmp/scout_apm_core/core-agent probe --socket /tmp/scout_apm_core/scout-agent.sock
[2019-09-23T22:25:27][core_agent][INFO] Initializing logger with log level: Info
Agent found: CoreAgentVersion { raw: "1.2.1" }

But am still getting the same error message

cschneid commented 5 years ago

Thank you for trying the new version of the CoreAgent. Sorry it didn't work.

We are using the tcp/ip connector with the Elixir agent. It looks like the initial connection completes, but then fails soon after. I don't know what would cause that though.

Our Elixir agent maintainer is out for a few more days, and he'll take a look right when he gets back.

mitchellhenke commented 5 years ago

@mcasper I was unable to reproduce this with a test phoenix application in the docker image, the core agent was able to connect and send metrics. The TCP connection happens locally on 127.0.0.1:9000 by default, so any different setup around that could cause an issue.

Can you set log_level: :debug in your Scout config as well and report what it says?

mcasper commented 4 years ago

Oh nice, thanks for the tip on log_level. Here's the output:

16:21:16.969 [warn] ScoutApm Core Agent TCP socket closed. Attempting to reconnect.
16:21:17.006 [error] GenServer ScoutApm.Core.AgentManager terminating
** (FunctionClauseError) no function clause matching in :gen_tcp.send/2
    (kernel) gen_tcp.erl:283: :gen_tcp.send({:ok, #Port<0.745>}, <<0, 0, 5, 192>>)
    (scout_apm) lib/scout_apm/core/agent_manager.ex:203: ScoutApm.Core.AgentManager.send_message/2
    (scout_apm) lib/scout_apm/core/agent_manager.ex:138: ScoutApm.Core.AgentManager.handle_cast/2
    (stdlib) gen_server.erl:637: :gen_server.try_dispatch/4
    (stdlib) gen_server.erl:711: :gen_server.handle_msg/6
    (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
Last message: {:"$gen_cast", {:send, %{ApplicationEvent: %{event_type: "scout.metadata", event_value: %{app_server: "", application_name: <app_name>, application_root: "", database_adapter: "", database_engine: "", environment: "", framework: "", framework_version: "", git_sha: nil, hostname: "6c5f4f61c864", language: "elixir", libraries: [["phoenix", "1.4.10"], ["elixir", "1.9.1"], ["metrics", "1.0.1"], ["plug_crypto", "1.0.0"], ["runtime_tools", "1.14"], ["jason", "1.1.2"], ["eex", "1.9.1"], ["mimerl", "1.2.0"], ["compiler", "7.4.5"], ["phoenix_ecto", "4.0.0"], ["mix", "1.9.1"], ["phoenix_html", "2.13.3"], ["scout_apm", "1.0.1"], ["plug_cowboy", "2.1.0"], ["cowlib", "2.7.3"], ["hackney", "1.15.2"], ["inets", "7.1"], ["ssl", "9.4"], ["corsica", "1.1.2"], ["decimal", "1.8.0"], ["certifi", "2.5.1"], ["binder", "0.1.0"], ["hex", "0.20.1"], ["mime", "1.3.1"], ["ranch", "1.7.1"], ["ecto", "3.2.1"], ["postgrex", "0.15.1"], ["crypto", "4.6"], ["public_key", "1.7"], ["approximate_histogram", ...], [...], ...], paas: "", server_time: "2019-10-02T16:21:11.969210Z", version: "1.9.1"}, source: "#PID<0.1250.0>", timestamp: "2019-10-02T16:21:11.969192Z"}}}}

We're running Erlang/OTP 22 which it looks like isn't in the travis matrix for this repo, maybe there was a change in the :gen_tcp API in 22?

mitchellhenke commented 4 years ago

I will take a peek at that.

Could you try the 1.0.1 version of the package?

mcasper commented 4 years ago

That's actually what I was running there, and I was able to remove the core_agent_triple setting and it detected the libc version perfectly 👍

mitchellhenke commented 4 years ago

@mcasper I found an issue with the reconnect logic

I believe I've fixed it here: #103

if you don't mind, could you try the master branch with that patch?

mcasper commented 4 years ago

@mitchellhenke Just gave it a go, and it did fix the FunctionClauseError!

Separately, I reread your earlier message and realized what the connectivity problem was, my application was binding to port 9000 first and blocking the core agent. I went ahead and set core_agent_tcp_port: 9001, and everything started working, hurray!

So while this definitely comes down a case of user error, should the default port for the core agent be something different than the default phoenix port?

mitchellhenke commented 4 years ago

That's all great to hear 🙂

Per https://hexdocs.pm/phoenix/up_and_running.html, Phoenix binds to 4000 by default. Is there a distribution or setup of Phoenix that defaults to 9000?

If it's a significant issue, we could for sure consider changing the default.

mcasper commented 4 years ago

Uhhhh yep, you're totally right, I don't know why I thought phoenix bound to 9000 by default. Ignore me!

mitchellhenke commented 4 years ago

No worries, thanks for the patience in working through this

mitchellhenke commented 4 years ago

This has been resolved I think 🙂