newrelic / newrelic-php-agent

The New Relic PHP Agent
https://opensource.newrelic.com/projects/newrelic/newrelic-php-agent
Apache License 2.0

Performance Degradation Introduced in New Relic PHP Agent v10.13.0.2 #806

Open theophileds opened 10 months ago

theophileds commented 10 months ago

Description

A significant increase in CPU usage and latency, along with fluctuating php-fpm process counts, occurred after upgrading the New Relic PHP agent from version 10.0.0.312 to version 10.13.0.2. Downgrading the agent was attempted, but compatibility issues with PHP 8.2 prevented it, so the agent was disabled instead, after which performance improved.

Hypothesis: Hypervisor Clock Settings

When we contacted New Relic support, a potential connection to hypervisor clock settings was suggested. However, even after switching the clock source to TSC (Time Stamp Counter), our benchmark showed only a marginal improvement in average duration.
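As a side note, the clock source a machine is actually using can be confirmed from the kernel's sysfs entry. The small C sketch below is illustrative only; it is not part of the original report and assumes the standard Linux sysfs path.

/* Print the kernel's active clock source, e.g. "tsc" or "kvm-clock".
 * Illustrative sketch only; assumes the standard Linux sysfs path. */
#include <stdio.h>

int main(void) {
    const char *path =
        "/sys/devices/system/clocksource/clocksource0/current_clocksource";
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    if (fgets(buf, sizeof buf, f) != NULL) {
        printf("current clocksource: %s", buf);
    }
    fclose(f);
    return 0;
}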

This benchmark was executed with 100,000,000 iterations, repeated a hundred times on two different containers running on machines set with TSC and kvm-clock configurations.

for (int i = 0; i < iterations; i++) {
    gettimeofday(&end, NULL);
}
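For reference, a self-contained version of this micro-benchmark could look like the sketch below. The timing around the loop and the hard-coded iteration count are assumptions based on the description above, not the exact harness used to produce the numbers that follow.

/* Sketch of the gettimeofday() micro-benchmark: time 100,000,000
 * back-to-back gettimeofday() calls. The surrounding timing code is an
 * assumption, not the exact harness from this report. */
#include <stdio.h>
#include <sys/time.h>

int main(void) {
    const long iterations = 100000000L;  /* 100,000,000 calls per run */
    struct timeval start, end;

    gettimeofday(&start, NULL);
    for (long i = 0; i < iterations; i++) {
        gettimeofday(&end, NULL);  /* the call being measured */
    }
    /* end holds the timestamp of the last call, so it also serves as the stop time */
    double elapsed = (end.tv_sec - start.tv_sec)
                   + (end.tv_usec - start.tv_usec) / 1e6;
    printf("duration: %f seconds\n", elapsed);
    return 0;
}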

Benchmark Results:

TSC-based configuration: average duration 2.321919 seconds
kvm-clock-based configuration: average duration 2.817715 seconds

The observed result indicated a 17.56% decrease in average time when using TSC.

However, we acknowledge that this micro-benchmark may not accurately mirror the load pattern generated by the New Relic agent. Moreover, even with TSC in use, we did not observe any noteworthy improvement in application performance.

Feature Disabling and Version Testing

To pinpoint the source of the issue, extensive testing was conducted, including the disabling of features such as distributed tracing, code-level metrics, and application logging. The performance impact persisted across multiple tests and versions.

newrelic.distributed_tracing_enabled = false
newrelic.code_level_metrics.enabled = false
newrelic.application_logging.enabled = false
newrelic.custom_events.max_samples_stored = 10000

newrelic.daemon.dont_launch = 3
newrelic.daemon.utilization.detect_aws = false
newrelic.daemon.utilization.detect_azure = false
newrelic.daemon.utilization.detect_gcp = false
newrelic.daemon.utilization.detect_pcf = false
newrelic.daemon.utilization.detect_docker = false
newrelic.daemon.app_timeout = "2m"
newrelic.browser_monitoring.auto_instrument = false
newrelic.framework = "symfony4"

newrelic.error_collector.enabled = false
newrelic.transaction_tracer.enabled = false
newrelic.transaction_tracer.detail = 0
newrelic.transaction_tracer.slow_sql = false
newrelic.transaction_events.enabled = false
newrelic.attributes.enabled = false
newrelic.custom_insights_events.enabled = false
newrelic.synthetics.enabled = false
newrelic.datastore_tracer.instance_reporting.enabled = false
newrelic.datastore_tracer.database_name_reporting.enabled = false
newrelic.application_logging.forwarding.enabled = false

Regrettably, these efforts did not result in any substantial improvement. After repeating the experiment multiple times, it became evident that enabling New Relic consistently led to a significant negative impact on performance. This observation persisted across the various versions of the New Relic agent that we tested.

PHP-FPM Processes and CPU Usage

As illustrated in the Grafana metrics screen captures, the tests were conducted in the following sequence with the specified configurations:

  1. New Relic fully disabled
  2. New Relic enabled (All features disabled) with TSC clock
  3. New Relic enabled (All features disabled) with kvm-clock configuration
[Screenshot: Grafana metrics for the three test runs, 2023-12-20]

Conclusion

The bump to version 10.13.0.2 introduced significant performance degradation that cannot be explained solely by new features or by clock-source differences. The issue persists despite clock configuration adjustments and feature disabling.

Your Environment

PHP backend applications built on Symfony, Docker image php:8.2.13-fpm
Deployed on EKS 1.24, EC2 instance type m5.xlarge (Nitro hypervisor)
Clock configuration tested with TSC and kvm-clock

theophileds commented 10 months ago

Additional Experiment with Version 10.15.0.4

Further experiments were conducted with New Relic agent version 10.15.0.4 (under the same newrelic.ini configuration), with the agent both enabled and disabled. Unfortunately, no significant improvement in performance was observed.

[Screenshots: Grafana metrics, 2023-12-26]

In terms of memory consumption, we observed an increase of approximately 70 MB per pod when the New Relic agent is enabled, resulting in an average of approximately 375 MB per pod. In comparison, when the agent is disabled, the memory usage averages around 305 MB per pod.

dorain47 commented 10 months ago

@theophileds agree with your observation :100:
The CPU spike has reduced (slightly) for me since 10.15.0.4, but on the memory side I have been seeing higher memory usage over the last few newrelic agent releases.

theophileds commented 9 months ago

Hello,

I have some exciting updates to share with you.

Firstly, we conducted performance tests using the latest version of the New Relic agent, v10.16.0.5, and observed a modest ~5% reduction in CPU overhead.

Additionally, after thorough performance testing, we saw a significant efficiency improvement by moving to Amazon EC2 C7a instances. These instances use AMD processors, which outperformed their same-generation Intel counterparts in our tests.

Our comparison involved several machines, including c7a.xlarge (AMD), c7i.xlarge (Intel), c5.xlarge (Intel), and our current m5.xlarge. Attached are screenshots depicting the results.

[Screenshot: instance comparison results, 2024-01-24]

The c7a.xlarge emerged as the top performer, showing significant performance improvements. Part of the gain can be attributed to the higher clock speeds of the AMD processors (3.7 GHz per core on c7a.xlarge versus 3.5 GHz per core on c5.xlarge). However, since c7a.xlarge instances still use kvm-clock, the size of the difference suggests that AMD's architecture and cache structure also influence these outcomes.

Winfle commented 1 month ago

@theophileds I think the main difference between the current Intel instances and c7a is SMT: each vCPU is pinned to a physical CPU core rather than a hardware thread, so you get more real cores, which helps especially on heavy, CPU-bound tasks.