Container App Configuration and Metrics Deep Dive

himanshusinha2022 commented 1 day ago

Following our last Infra dashboard discussion, these are the action items:

Why is container CPU usage lower compared to the replica count?
- This discrepancy occurs because the CPU usage graph is based on the average, while the replica count graph is set at a 2-hour interval.
- In cases where CPU usage spikes last only 2-3 minutes, the average metric might not fully capture these fluctuations.
- If we adjust the CPU usage graph to show the max instead of the avg and set the replica count graph to a 5-minute interval, the graphs correlate better, highlighting how spikes in CPU usage correspond with increases in replica count.

With these settings, the spikes in the container app CPU usage graph and the replica count graph line up.
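The effect of the aggregation choice can be illustrated with a small sketch. The per-minute samples below are hypothetical, not taken from our dashboard; they only show how a 3-minute spike nearly disappears in a 2-hour average but stays visible with a 5-minute max.

```python
from statistics import mean

# Hypothetical per-minute CPU samples (in cores) over a 2-hour window:
# a steady 0.2 cores with one 3-minute spike to 1.2 cores.
samples = [0.2] * 120
samples[60:63] = [1.2, 1.2, 1.2]

def aggregate(values, bucket_size, fn):
    """Apply fn to consecutive buckets of bucket_size samples."""
    return [fn(values[i:i + bucket_size]) for i in range(0, len(values), bucket_size)]

avg_2h = aggregate(samples, 120, mean)  # one point for the whole window
max_5m = aggregate(samples, 5, max)     # 24 points, one per 5 minutes

print(f"2-hour average: {avg_2h[0]:.3f} cores")
print(f"Highest 5-minute max: {max(max_5m):.1f} cores")
```

In this made-up series the 2-hour average stays close to the baseline, while the 5-minute max reports the full spike height.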

himanshusinha2022 commented 1 day ago

Why is there a gradual increase in container memory (RAM)?

Over a 7-day period, memory usage is typically lowest at the start of the week and gradually increases, peaking towards the end. Drops in usage usually coincide with deployments, after which memory climbs slowly again as usage resumes. We consistently average around 2 GB of memory usage (50%).

[Graph: memory usage, 24 hours]

[Graph: memory usage, 30 days]

himanshusinha2022 commented 1 day ago

Threshold for Container Apps

HTTP Scaling Trigger:
- Scaling is based on the number of concurrent requests set during container app creation, not CPU or memory usage.
- Since we didn't configure any custom scaling rules, the default HTTP scaling rule applies. Azure support confirmed this, as we could not find it in Azure's official documentation.
- Billing example: With 2 vCPUs and 4 GB RAM per replica and 5 replicas, billing is for 5 x (2 vCPUs + 4 GB RAM), that is, 10 vCPUs and 20 GB RAM in total.
Concurrent Requests in Container Apps:
- Default Settings:
  - According to Azure support, the default concurrent request limit is 10, with HTTP scaling as the default rule.
- Scaling Mechanism:
  - Azure Container Apps uses KEDA (Kubernetes Event-driven Autoscaling) for event-driven horizontal scaling.
  - You can control scaling via the resources.properties.template.scale section in the JSON configuration file.
  - For HTTP traffic, replicas scale based on the number of concurrent requests. Once the number of concurrent requests per replica exceeds the concurrentRequests value, an additional replica is added; as we understand it, this happens right away rather than after a time window.
  - Example: If the concurrentRequests property is set to 100 and there are 1000 concurrent requests, 10 replicas will be deployed.
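The arithmetic behind this example can be sketched as follows. This is a simplified model, not KEDA's actual algorithm: it assumes the replica count is the number of concurrent requests divided by the threshold, rounded up, and clamped to the configured limits. The function name and the `min_replicas`/`max_replicas` defaults are illustrative assumptions, not values from our configuration.

```python
import math

def desired_replicas(concurrent_requests, threshold, min_replicas=0, max_replicas=10):
    """Estimate replicas as ceil(concurrent / threshold), clamped to the limits.

    min_replicas and max_replicas are hypothetical defaults for illustration.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    wanted = math.ceil(concurrent_requests / threshold)
    return max(min_replicas, min(max_replicas, wanted))

print(desired_replicas(1000, 100))  # 10, matching the example above
print(desired_replicas(35, 10))     # 4
```

Note that the maximum replica setting caps the result: under this model, 5000 concurrent requests with a threshold of 100 would still yield 10 replicas when the maximum is 10.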

himanshusinha2022 commented 1 day ago

1. Deeper Analysis of Extra Memory Consumption

Memory Spikes: The memory usage graph shows periodic spikes where memory consumption sharply increases. These spikes could be due to specific workloads (such as batch processing, data caching, or API calls requiring large data payloads).
Concurrent User Load: Higher numbers of concurrent requests can also contribute to memory pressure, as more simultaneous connections and request handling threads consume more memory.

How will this improve resource usage while ensuring no customer impact?

Avoid service disruption: Memory and CPU overloads can cause slowdowns or crashes. By scaling replicas before reaching critical memory or CPU thresholds, we can maintain performance and ensure that customer-facing services remain responsive.
Cost-efficiency: This scaling strategy ensures that we only scale out when needed, which prevents unnecessary replicas from driving up costs. We currently use the HTTP scaling mechanism with the concurrentRequests property set to 10 concurrent requests per replica.

himanshusinha2022 commented 1 day ago

How the number of requests and replica counts relate to the scaling mechanism

Scaling Behavior and Replica Count Correlation with Total Requests

From the data presented in the Azure monitoring dashboard, we can observe the following key points regarding the scaling behavior and replica count in relation to the total number of API requests:

Replica Count vs. Concurrent Requests:
- The replica count (as seen in the "Replica" graph) is relatively stable at 4 replicas, despite processing a high volume of requests. This is primarily because Azure Container Apps auto-scaling is based on concurrent request thresholds set per replica, not the total number of API requests over a time period.
- In this case, the threshold is set to 10 concurrent requests per replica. This means scaling is triggered only when a replica is handling more than 10 requests at the same time.
Total API Requests Distribution:
- The total number of API requests graph shows fluctuations and peaks at various points. However, the key detail is that these requests are spread out over time, meaning they do not necessarily occur simultaneously.
- For example, the dashboard indicates that around 2,000 requests were processed. Still, because these requests are distributed over time, the number of concurrent requests (requests happening at the same exact time) does not exceed the threshold for scaling to more replicas.
Understanding Concurrent Requests:
- The graphs labeled "Total Request Units" and "API Gateway Traffic" show spikes in activity. However, since the scaling mechanism only counts active, in-progress requests at any given time, these spikes may not result in additional replicas being created.
- Even if 100 users are accessing the API, unless their requests occur at the same moment, the system will handle these with the current 4 replicas. This is why the replica count remains at 4, as the total number of concurrent requests doesn’t exceed the threshold for adding more replicas.
Conclusion:
- Even though the total number of requests is high, the replica count remains at 4 because the concurrent requests at any given moment do not exceed the set threshold of 10 per replica.
- Azure's auto-scaling mechanism focuses on instantaneous load (concurrent requests) rather than the total number of requests. As a result, the system is optimized to scale only when needed, avoiding unnecessary resource usage.
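The difference between total requests and concurrent requests can be made concrete with a small sketch. The traffic patterns below are hypothetical (request durations and spacing are assumptions, not dashboard data); the function simply counts how many requests overlap in time.

```python
def peak_concurrency(requests):
    """Return the maximum number of requests in flight at once.

    requests is a list of (start_time, duration) pairs in seconds.
    """
    events = []
    for start, duration in requests:
        events.append((start, 1))              # request begins
        events.append((start + duration, -1))  # request ends
    # Sort by time; at equal times, ends (-1) are processed before starts (+1).
    events.sort()
    in_flight = peak = 0
    for _, delta in events:
        in_flight += delta
        peak = max(peak, in_flight)
    return peak

# Hypothetical traffic: 2,000 requests of 0.2 s each, one every 1.8 s.
spread_out = [(i * 1.8, 0.2) for i in range(2000)]
# Hypothetical burst: 50 requests of 0.2 s each, all starting together.
burst = [(0.0, 0.2)] * 50

print(peak_concurrency(spread_out))  # 1
print(peak_concurrency(burst))       # 50
```

In this sketch, 2,000 evenly spaced requests never overlap, so they stay far below a threshold of 10 per replica, whereas a burst of only 50 simultaneous requests would exceed what 4 replicas cover at that threshold.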

himanshusinha2022 commented 1 day ago

Container App - Performance and OOM Issue

After checking the last 30 days:

CPU Usage:
- Max Used: 1.24 cores
- Max Available: 2 cores
  The CPU usage is within limits, leaving some room for more load.

Memory Usage:
- Max Used: 2.7 GB
  No major spike was recorded when the PDF action failed, but a brief memory surge likely occurred, especially during tasks such as uploading 72 images. This could have caused an OOMKill (SIGKILL, exit code 137), meaning the container was killed after running out of memory. Note that this is a likely explanation, not a confirmed root cause.
- Solution: To prevent this, it's best to move the PDF generation task to a separate service. This will ensure memory is better managed and the container doesn't get overloaded.
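For reference, exit code 137 follows the common convention of 128 plus the signal number, and signal 9 is SIGKILL. The helper below is a hypothetical illustration of that decoding on a Linux host; the exit code alone does not prove an out-of-memory kill, since SIGKILL can also come from other sources.

```python
import signal

def describe_exit_code(code):
    """Decode an exit code; values above 128 usually mean 'killed by signal code - 128'."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            return f"exited with status {code} (no matching signal)"
        return f"terminated by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"

print(describe_exit_code(137))  # terminated by SIGKILL (signal 9)
```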

Can We Handle 3X Load?

Container App: The CPU usage is fine, but memory is a concern. To handle 3X traffic, memory should be increased to at least 8 GB, or PDF processing moved to a different service to avoid OOM issues during high load.
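As a rough check of the current headroom, the peak figures above can be turned into utilization percentages. This sketch assumes a 4 GB memory limit per replica, inferred from the earlier note that 2 GB corresponds to 50%; the variable names are illustrative.

```python
def utilization(used, available):
    """Return used capacity as a fraction of what is available."""
    return used / available

cpu_max_used, cpu_limit = 1.24, 2.0  # cores, from the 30-day figures
mem_max_used, mem_limit = 2.7, 4.0   # GB; the 4 GB limit is an assumption

print(f"CPU peak utilization: {utilization(cpu_max_used, cpu_limit):.0%}")
print(f"Memory peak utilization: {utilization(mem_max_used, mem_limit):.1%}")
```

Under these assumptions, peak CPU sits at about 62% and peak memory at about 67.5% of the limit, which matches the reading that memory is the tighter of the two resources.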

himanshusinha2022 / FastApi-Demo