microsoft / azure-container-apps

Roadmap and issues for Azure Container Apps

Measuring idle / active #843

Open rodyvansambeek opened 1 year ago

rodyvansambeek commented 1 year ago

Issue description

As far as I know, there is no way to see whether a container app is in the idle state or the active state, which makes it pretty difficult to predict costs. We have several container apps that use the Azure Storage Queue input binding to run when a message is added to the queue. Unfortunately, these apps are all being billed as active, even though they only perform some work a few hours per month.

The ACA pricing page states: "A replica is active when vCPU usage is above 0.01 cores or when data received is above 1,000 bytes per second." So it looks like this is actually being measured per second.

But in the Metrics tab we can only show metrics at a granularity of 1 minute at minimum, so we cannot see the exact per-second values or know whether the current state is idle or active. Besides that, it is unclear which metrics we have to look at.
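To make the quoted rule concrete, here is a minimal sketch of it as a predicate. The thresholds are the ones quoted from the pricing page above; the function name and the assumption that both inputs are per-second samples are mine, not anything ACA exposes.

```python
# Thresholds as quoted from the ACA pricing page in this issue.
CPU_ACTIVE_THRESHOLD_CORES = 0.01
RX_ACTIVE_THRESHOLD_BYTES_PER_SEC = 1_000


def is_active(vcpu_cores: float, rx_bytes_per_sec: float) -> bool:
    """Hypothetical helper: True if either quoted threshold is exceeded."""
    return (
        vcpu_cores > CPU_ACTIVE_THRESHOLD_CORES
        or rx_bytes_per_sec > RX_ACTIVE_THRESHOLD_BYTES_PER_SEC
    )


print(is_active(0.002, 500))    # low CPU, low traffic
print(is_active(0.002, 1_500))  # low CPU, traffic above the threshold
print(is_active(0.05, 0))       # CPU above the threshold, no traffic
```

Because the rule is an "or", a replica with almost no CPU usage can still count as active on received bytes alone, which seems to be what happens here.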

Example

Looking at the metrics in our example, I would assume the 2 metrics to use are:

  1. CPU Usage (Average)
  2. Network IN bytes (Sum)

For a granularity of 1 minute, it shows the following:

image

Can I conclude that the replica is active because Network IN bytes per minute is 85.9 KB, so in theory the rate can be higher than 1 KB per second?

Maybe add a metric that shows whether the replica is in the active or idle state?
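As a rough check on that number, the per-minute sum can be turned into an average per-second rate. This is only a sketch: it assumes 1 KB = 1,000 bytes and that the traffic is spread evenly over the minute, which the 1-minute metric cannot confirm. The helper name is hypothetical.

```python
RX_ACTIVE_THRESHOLD_BYTES_PER_SEC = 1_000  # threshold quoted from the pricing page


def avg_bytes_per_second(sum_bytes: float, window_seconds: int = 60) -> float:
    """Average receive rate over a metric window (hypothetical helper)."""
    return sum_bytes / window_seconds


rx_per_minute = 85.9 * 1_000  # 85.9 KB, assuming 1 KB = 1,000 bytes
rate = avg_bytes_per_second(rx_per_minute)

print(f"average rate: {rate:.0f} bytes/s")
print("above threshold on average:", rate > RX_ACTIVE_THRESHOLD_BYTES_PER_SEC)
```

Under these assumptions the average comes out at roughly 1.4 KB per second, so it is already above the threshold on average, and individual seconds could be higher or lower.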

Cost management

Besides that, Cost Management is able to show some information about idle and active usage, but it does not show the revisions, so it's pretty difficult to analyse using that interface.

Why is there network traffic

Final question: since the revisions are not doing anything for days, how come the network traffic is over the limit? I assume this has to do with the Azure Storage Queue input binding, which is running and polls every 10 seconds. I tried to update the pollingInterval of the Dapr component to "60s", but this had no effect whatsoever.

pollaktamas commented 9 months ago

@rodyvansambeek Did you find an explanation for the constant network traffic? Because of that traffic, even if the observability part of this issue were solved, we would still be charged the active usage price.

rodyvansambeek commented 9 months ago

@pollaktamas No, unfortunately not. What I actually did in this case was use a scale rule based on the Azure Storage Queue size, scaling back to 0 when the queue is empty. This works in my case, because the container app only spins up a few times a day when there are new messages in the queue.

I noticed that in other scenarios, where I just have APIs inside the apps without any Dapr input bindings, the network usage stays well below the idle threshold. I suspect it is the polling that the Azure Storage binding does to check for new messages.
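The idea behind that workaround can be sketched roughly as follows. This is a simplified model of a queue-length scale rule, not the actual ACA or KEDA implementation, and all names and default values here are hypothetical.

```python
import math


def desired_replicas(
    queue_length: int,
    messages_per_replica: int = 5,  # hypothetical target per replica
    min_replicas: int = 0,          # 0 allows scale to zero
    max_replicas: int = 10,         # hypothetical upper bound
) -> int:
    """Simplified model: one replica per `messages_per_replica` queued messages."""
    if queue_length <= 0:
        # Empty queue: fall back to the minimum, which is 0 here.
        return min_replicas
    wanted = math.ceil(queue_length / messages_per_replica)
    return max(min_replicas, min(wanted, max_replicas))


print(desired_replicas(0))     # empty queue
print(desired_replicas(12))    # a handful of messages
print(desired_replicas(1000))  # capped at max_replicas
```

With a minimum of 0, no replica runs while the queue is empty, so the polling traffic from the input binding cannot push a replica over the active threshold.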

syky27 commented 9 months ago

I have exactly the same problem. I have multiple apps that constantly run at 14KB. What amazes me is that I have a WASM Blazor app, which basically does nothing until it receives a request, and it still has a constant bytes-in of just about 14KB :-/ I guess this will force me to use tcpdump and Wireshark...