newrelic / elixir_agent

New Relic's Open Source Elixir Agent
https://hex.pm/packages/new_relic_agent
Apache License 2.0

Sample processing can fail when sampling data #367

Open zoevkay opened 2 years ago

zoevkay commented 2 years ago

Hello! 👋

Describe the bug

We received two New Relic errors with the message (ArithmeticError) bad argument in arithmetic expression. The stack trace pointed to https://github.com/newrelic/elixir_agent/blob/master/lib/new_relic/sampler/process.ex#L79

It looks like the agent does not handle the nil return value when fetching info about a process that is no longer alive. According to the Erlang docs, "Returns undefined [nil in Process.info/2] if the process is not alive."
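A minimal sketch of how this failure could arise, assuming the sampler divides a value read from the Process.info/2 result. The item list and the division by 1024 are illustrative assumptions, not the agent's actual code:

```elixir
# Spawn a short-lived process and wait until it has definitely exited.
pid = spawn(fn -> :ok end)
ref = Process.monitor(pid)

receive do
  {:DOWN, ^ref, :process, ^pid, _reason} -> :ok
end

# Process.info/2 returns nil for a process that is no longer alive.
info = Process.info(pid, [:memory, :message_queue_len])

# Access on nil yields nil, and using nil in arithmetic raises ArithmeticError.
# (The 1024 divisor is a hypothetical bytes-to-kilobytes conversion.)
result =
  try do
    info[:memory] / 1024
  rescue
    e in ArithmeticError -> e
  end

IO.inspect(info, label: "info")
IO.inspect(result, label: "result")
```

Here info is nil and result is an ArithmeticError struct, which matches the error class reported above.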

I haven't found details on why the process wasn't alive. Both error occurrences were a single error on different days.

Environment

binaryseed commented 2 years ago

The GenServer tries to handle this by putting a monitor on the process and handling the case when the process goes down: https://github.com/newrelic/elixir_agent/blob/master/lib/new_relic/sampler/process.ex#L34

That said, there's probably some kind of race condition here; maybe the process dies before the first sample is even taken. It should be possible to handle the nil case with a little refactor. PR welcome :)
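One possible shape for such a refactor, as a hedged sketch only. The module name SamplerSketch, the function take_sample/1, the sampled items, and the returned map keys are all hypothetical and are not taken from the agent's source:

```elixir
defmodule SamplerSketch do
  # Hypothetical bytes-to-kilobytes divisor.
  @kb 1024

  # Returns {:ok, sample} for a live process, or :dead if it has already exited.
  def take_sample(pid) do
    case Process.info(pid, [:memory, :message_queue_len, :reductions]) do
      nil ->
        # The process died before (or between) samples; nothing to compute.
        :dead

      info ->
        {:ok,
         %{
           memory_kb: info[:memory] / @kb,
           message_queue_length: info[:message_queue_len],
           reductions: info[:reductions]
         }}
    end
  end
end
```

A caller could then skip reporting and drop the pid from its state on :dead, rather than relying only on the monitor's :DOWN message arriving before the next sample.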