newrelic / newrelic-ruby-agent

New Relic RPM Ruby Agent
https://docs.newrelic.com/docs/apm/agents/ruby-agent/getting-started/introduction-new-relic-ruby/
Apache License 2.0
1.2k stars 598 forks source link

Hypothesis: `Agent#transmit_data_types` could be replaced with `Agent#transmit_data` #2505

Open fallwith opened 6 months ago

fallwith commented 6 months ago

transmit.rb defines a transmit_data_types helper method which is designed to forcibly flush all data out immediately instead of as part of the normal harvest cycle.

This method is called when the agent shuts down, and it is also called by the flush_pipe_data method used by the Resque instrumentation.

Looking at just the error data type for example, transmit_data_types invokes transmit_data and then invokes transmit_error_event_data. The transmit_error_event_data invocation appears to be completely redundant based on what transmit_data already accomplished, as it ultimately ends up calling harvest_and_send_from_container a second time.

transmit_data -> 
  harvest_and_send_data_types ->
    harvest_and_send_errors ->
      harvest_and_send_from_container

transmit_error_event_data ->
  transmit_single_data_type ->
    harvest_and_send_from_container

Thankfully, the initial call to harvest_and_send_from_container should empty out each harvester's bucket and the second call should see a sample size of zero and short circuit. So fortunately no redundant calls should result in problematic data being reported to a New Relic collector.

My hypothesis is that we can simply remove the transmit_data_types method entirely and replace calls to it with calls to transmit_data defined in the same Agent class.

This issue has been filed to call on that hypothesis to be tested with a suitable code audit and series of tests. If the hypothesis is true, then we should proceed with the refactor. If the hypothesis is proved false because of some race condition or edge case that relies on the redundancy existing, we should either refactor transmit_data to be more tolerant accordingly, or at least add some code comments that explain the need for the redundancy if we decide to leave things as-is.

workato-integration[bot] commented 6 months ago

https://new-relic.atlassian.net/browse/NR-248418