himanshusinha2022 / FastApi-Demo


Container App configuration & its Metrics Deep dive #1

Open himanshusinha2022 opened 1 day ago

himanshusinha2022 commented 1 day ago

According to our last Infra dashboard discussion, we have some action items:

  1. Why is container CPU usage lower compared to the replica count?
    • The discrepancy comes from how the two graphs are aggregated: the CPU usage graph plots the average, while the replica count graph uses a 2-hour interval.
    • CPU spikes that last only 2-3 minutes are diluted by the average, so they barely show up in the CPU graph even though they are enough to trigger scaling.
    • If we switch the CPU usage graph from avg to max and set the replica count graph to a 5-minute interval, the two graphs line up much better and show how spikes in CPU usage correspond to increases in replica count.

With these settings, the spikes in the container app CPU usage graph and the replica count graph now correlate:
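To illustrate why the aggregation choice matters, here is a minimal sketch in Python. The CPU samples are made up for illustration (a flat 10% baseline with one 3-minute spike to 90%) and are not taken from our dashboard; `aggregate` is a hypothetical helper, not an Azure Monitor API.

```python
from statistics import mean

def aggregate(samples, window, fn):
    """Apply fn to consecutive, non-overlapping windows of `window` samples."""
    return [fn(samples[i:i + window]) for i in range(0, len(samples), window)]

# Hypothetical per-minute CPU usage (%) for one hour:
# a quiet baseline of 10% with a single 3-minute spike to 90%.
cpu_per_minute = [10.0] * 60
cpu_per_minute[30:33] = [90.0, 90.0, 90.0]

avg_60 = aggregate(cpu_per_minute, 60, mean)  # one coarse bucket, averaged
max_5 = aggregate(cpu_per_minute, 5, max)     # 5-minute buckets, max

print(avg_60)  # the spike is diluted: [14.0]
print(max_5)   # the bucket covering minutes 30-34 still shows 90.0
```

With the coarse average the hour looks almost idle, while the 5-minute max keeps the spike visible, which is the effect we saw after changing the graph settings.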

image

himanshusinha2022 commented 1 day ago

Why is there a gradual increase in container memory (RAM)?

Over a 7-day period, memory usage is typically lowest at the start of the week and climbs gradually, peaking towards the end. The drops in usage often coincide with deployments, after which memory slowly climbs again as traffic resumes. On average we sit at around 2 GB of memory usage (50%).

24hrs image

30 days image
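One way to put a number on the gradual increase is to fit a straight line to the daily averages and estimate how long it would take to reach the limit if no deployment reset the usage. This is only a rough sketch: the daily values below are invented for illustration, and the 4 GB limit is inferred from 2 GB being about 50%.

```python
def linear_slope(ys):
    """Least-squares slope of ys against their index (0, 1, 2, ...)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def days_until_limit(ys, limit):
    """Days until the trend reaches `limit`, or None if usage is not growing."""
    slope = linear_slope(ys)
    if slope <= 0:
        return None
    return (limit - ys[-1]) / slope

# Hypothetical daily average memory usage in GB over one week (not real data).
daily_avg_gb = [1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2]
memory_limit_gb = 4.0  # assumption: 2 GB is reported as roughly 50%

print(round(linear_slope(daily_avg_gb), 3))                       # growth in GB per day
print(round(days_until_limit(daily_avg_gb, memory_limit_gb), 1))  # days of headroom left
```

Running the same calculation on the exported 30-day metric would tell us whether the growth between deployments is steady or accelerating.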

himanshusinha2022 commented 1 day ago

Threshold for Container Apps

himanshusinha2022 commented 1 day ago

1. Deeper Analysis of Extra Memory Consumption

How will it improve optimal usage while ensuring no customer impact?

himanshusinha2022 commented 1 day ago

How the number of requests and replica counts relate to the scaling mechanism

Scaling Behavior and Replica Count Correlation with Total Requests

From the data presented in the Azure monitoring dashboard, we can observe the following key points regarding the scaling behavior and replica count in relation to the total number of API requests:

  1. Replica Count vs. Concurrent Requests:

    • The replica count (as seen in the "Replica" graph) is relatively stable at 4 replicas, despite processing a high volume of requests. This is primarily because Azure Container Apps auto-scaling is based on concurrent request thresholds set per replica, not the total number of API requests over a time period.
    • In this case, the threshold is set to 10 concurrent requests per replica, so a scale-out is triggered only when the load exceeds 10 in-flight requests per replica.
  2. Total API Requests Distribution:

    • The total number of API requests graph shows fluctuations and peaks at various points. However, the key detail is that these requests are spread out over time, meaning they do not necessarily occur simultaneously.
    • For example, the dashboard indicates that around 2,000 requests were processed. Still, because these requests are distributed over time, the number of concurrent requests (requests happening at the same exact time) does not exceed the threshold for scaling to more replicas.
  3. Understanding Concurrent Requests:

    • The graphs labeled "Total Request Units" and "API Gateway Traffic" show spikes in activity. However, since the scaling mechanism only counts active, in-progress requests at any given time, these spikes do not necessarily result in additional replicas being created.
    • Even if 100 users are accessing the API, unless their requests occur at the same moment, the system will handle these with the current 4 replicas. This is why the replica count remains at 4, as the total number of concurrent requests doesn’t exceed the threshold for adding more replicas.
  4. Conclusion:

    • Even though the total number of requests is high, the replica count remains at 4 because the concurrent requests at any given moment do not exceed the set threshold of 10 per replica.
    • Azure's auto-scaling mechanism focuses on instantaneous load (concurrent requests) rather than the total number of requests. As a result, the system is optimized to scale only when needed, avoiding unnecessary resource usage.
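The reasoning above can be sketched with a simplified model. This is not Azure's actual scaling implementation. The one-hour window, the 0.5 s average latency, the minimum of 4 replicas, and the maximum of 20 are assumptions chosen for illustration; only the 2,000 requests and the threshold of 10 per replica come from the discussion. Average concurrency is estimated with Little's law (arrival rate times average latency).

```python
import math

def average_concurrency(total_requests, window_seconds, avg_latency_seconds):
    """Little's law: average in-flight requests = arrival rate * average latency."""
    arrival_rate = total_requests / window_seconds
    return arrival_rate * avg_latency_seconds

def desired_replicas(concurrent_requests, threshold_per_replica,
                     min_replicas, max_replicas):
    """Replicas needed to keep each one at or below the threshold, clamped to bounds."""
    needed = math.ceil(concurrent_requests / threshold_per_replica)
    return max(min_replicas, min(max_replicas, needed))

# 2,000 requests spread over an assumed 1-hour window at an assumed 0.5 s latency.
spread_out = average_concurrency(2000, 3600, 0.5)
print(round(spread_out, 2))                      # well below 1 request in flight
print(desired_replicas(spread_out, 10, 4, 20))   # stays at the assumed minimum of 4

# 100 requests that really do arrive at the same moment.
print(desired_replicas(100, 10, 4, 20))          # 10 replicas would be needed
```

Under these assumptions, 2,000 requests spread over an hour never come close to 10 in flight per replica, while 100 truly simultaneous requests would call for more than the current 4 replicas.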
himanshusinha2022 commented 1 day ago

Container App - Performance and OOM Issue

After checking the last 30 days:

image

image

Can We Handle 3X Load?