Feature Request: Metrics in ACA Jobs

electroma commented 11 months ago

Is your feature request related to a problem? Please describe.
Regular ACAs have a good set of built-in metrics: replica count, CPU and memory utilization. I can't find anything like that for ACA Jobs.

Describe the solution you'd like.
Make the following ACA Job metrics available:

Number of Parallel Job Executions running (similar to Replica Count in regular ACA)
CPU allocated and utilized (similar to regular ACA)
Memory allocated and utilized
Error ratio (container crashes and such)

Describe alternatives you've considered.
I do not see any alternative, and I believe metrics should be published so that teams can plug in proper monitoring.

vinisoto commented 11 months ago

Yes. We are aware of this gap. We expect these metrics to start flowing by Jan 2024. We will update this issue when we have a better ETA.

ractando commented 9 months ago

Hey @vinisoto. Is there any ETA on this?

kumarmo-2 commented 9 months ago

@vinisoto, any update on when we can expect this feature to be in GA?

vd84 commented 9 months ago

@vinisoto Any update? :D

tmakowka-tc commented 9 months ago

@vinisoto We are also looking forward to this feature very much since we would like to gain more insight into our ACA jobs.

dsczltch commented 8 months ago

Hey @vinisoto. We are also looking forward to this feature very much since we would like to gain more insight into our ACA jobs.

dinoo commented 8 months ago

@vinisoto is it possible to give us an ETA?

vinisoto commented 8 months ago

Hi, sorry for the lack of updates. We hit some delays on our release. We are targeting the week of 4/22 or sooner for these metrics to be available.

Shalin-AngloAmerican commented 8 months ago

Waiting for this feature please

itallackpure commented 6 months ago

Is there an update on progress with this?

AdrianProkop commented 6 months ago

Any update on this?

mkpnitorenergy commented 5 months ago

ETA?

pietersap commented 5 months ago

also waiting

snavarropino commented 5 months ago

Any news on this?

jparta commented 5 months ago

This is a vital enabler for the automation of failure monitoring. The offering seems incomplete without metrics.

k3-yasuda commented 5 months ago

I'm waiting for updates on this feature.

nbwdk commented 4 months ago

Insight into metrics is vital for us in order to right-size and consolidate Dedicated Plans.

maverickmetro commented 4 months ago

I see the OTel integration is in preview. Will the OTel integration get the system metrics for ACA Jobs?

vinisoto commented 4 months ago

Hi, the metrics UX has been enabled for Jobs:

We currently support:

Number of Job Executions
CPU Usage
Memory Usage

cc: @anthonychu

electroma commented 4 months ago

Thank you for the update, @vinisoto. In the original request I asked for one more important metric: Error Ratio. Would it be possible to compute this metric by splitting "Job Executions" on State?
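
As a rough sketch of what that could look like, assuming the split on State yields one state label per execution: the labels "Succeeded", "Failed" and "Running" and the helper `error_ratio` are illustrative assumptions, not names confirmed anywhere in this thread.

```python
from collections import Counter

def error_ratio(states):
    """Return failed / finished executions, or None if nothing has finished yet."""
    counts = Counter(states)
    # Only finished executions count; the state labels are assumed for illustration.
    finished = counts["Succeeded"] + counts["Failed"]
    if finished == 0:
        return None
    return counts["Failed"] / finished

# Hypothetical sample: 17 succeeded, 3 failed, 2 still running (ignored).
sample = ["Succeeded"] * 17 + ["Failed"] * 3 + ["Running"] * 2
print(error_ratio(sample))
```

Executions that are still running are left out of the denominator so that a long-running job does not dilute the ratio.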

rbange commented 4 months ago

@vinisoto when can we expect this to be rolled out for all jobs? At least for my subscription it's not present yet.

anthonychu commented 4 months ago

@rbange The metrics blade should be available everywhere now and should appear as in the screenshot that @vinisoto shared. Could you please check again?

dinoo commented 4 months ago

@anthonychu when I check our Job metrics, all 86 jobs report a stable 0.05nc CPU usage and 536.9 MB on average, min and max aggregation.

Doesn't seem to be right, I wouldn't expect a flatline and the same values for all jobs.

rbange commented 4 months ago

@anthonychu Yes, the metrics appeared approximately 3 days later. Interestingly, however, they are not selectable when trying to access them via scope in the Metrics section. Only the regular resources appear there, so you basically have to go to each job manually to check them...

I can also report the same issue as @dinoo. RAM is always maxed out even though the jobs require far less locally, and CPU usage is stuck at around 0.0x nc in the min, max and average aggregations. I have a job that runs for approximately 20 minutes at the top of every hour, so a flat line is absolutely unrealistic.

itallackpure commented 4 months ago

> @anthonychu when I check our Job metrics, all 86 jobs report a stable 0.05nc CPU usage and 536.9 MB on average, min and max aggregation.
>
> Doesn't seem to be right, I wouldn't expect a flatline and the same values for all jobs.

I am seeing the exact same numbers as @dinoo - 0.05nc and 536.9MB average. Something seems wrong...

anthonychu commented 3 months ago

Thanks all for reporting. We'll investigate.

dani2221 commented 3 months ago

Can you clarify what the metric "Number of Job Executions" means? I thought it just counted every time the job is started, but I see more than 7 million executions in the past 30 days on an event-driven job that runs approximately 30-50 times a month.

vinisoto commented 3 months ago

Hi, thanks for reporting. Some Container Apps jobs are not displaying the correct runtime values for the CPU and Memory metrics. We are preparing a configuration change and will be rolling it out in the next few days.

Regarding Job Executions: This metric displays the current number of job executions in an environment (Consistent with the execution list displayed in the Execution History blade). When a job execution ends, it remains in the environment (currently the last 100 successful and 100 failed executions are kept around). One way to visualize executions in time is to apply a split by Execution Name:

We are working on two new metrics, Executions Started and Executions Ended, to display how many job executions start or end at a point in time. We will communicate here once we have a timeline to share.
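
To illustrate how a "current number of executions" metric can add up to millions, here is a minimal sketch. It rests on two assumptions that are not confirmed in this thread: that the metric is sampled once per minute, and that the chart sums every sample in the selected window. The helper names and the sample counts are hypothetical.

```python
SAMPLES_PER_DAY = 24 * 60  # assumption: one metric sample per minute

def retained(successful, failed, cap=100):
    """Executions still listed in the environment, given the per-state retention cap."""
    return min(successful, cap) + min(failed, cap)

def summed_gauge(retained_executions, days):
    """Total obtained by summing the retained-execution count over every sample."""
    return retained_executions * SAMPLES_PER_DAY * days

# Hypothetical job history: 400 successful and 62 failed executions so far.
kept = retained(400, 62)
print(kept)                    # executions currently retained
print(summed_gauge(kept, 30))  # what a sum over a 30-day window would show
```

Under these assumptions a job with a modest number of real runs still reports a very large total, because every retained execution is counted again at each sample. A counter-style metric such as Executions Started would avoid this.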

rodyvansambeek commented 1 month ago

> Regarding Job Executions: This metric displays the current number of job executions in an environment (Consistent with the execution list displayed in the Execution History blade). When a job execution ends, it remains in the environment (currently the last 100 successful and 100 failed executions are kept around). One way to visualize executions in time is to apply a split by Execution Name:

This is not working on my Container App Job. It runs every hour and I definitely see the runs in the Execution History blade, but nothing in the Metrics section:

xdawxd commented 1 month ago

Are there any plans to make metrics accessible outside the "Container App Job | Metrics" tab? I can access them through the ACA Job, but not through Metrics or Dashboard Hub.
While selecting the scope I can't see any Container App Jobs even though they are in that Resource Group. [edit] Container App Jobs Metrics also can't be selected within "Monitor | Alerts" which would be a great thing to have.

vinisoto commented 1 month ago

@rodyvansambeek - there was a regression that caused Jobs metrics to stop showing for some customers. We are rolling out a fix, which will be fully deployed to all regions by the end of this week.

vinisoto commented 1 month ago

@xdawxd - we are in the process of fixing both issues: Jobs being available outside of the Job Metrics blade (for example: making Jobs available as a Metrics scope) and being able to create alerts based on Jobs metrics.

We will update here when we have ETAs for both fixes.

microsoft / azure-container-apps

Feature Request: Metrics in ACA Jobs #1027