Closed by paulbatum 6 months ago
@xt0rted I moved this from the webjobs sdk repo because this issue is really about the webjobs functionality that exists in kudu and not the webjobs SDK itself.
Have you experimented with approaches where your webjob would emit some type of heartbeat log to app insights, and you would write an alert against your app insights instance that alerts when the heartbeat is not present?
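A minimal sketch of the heartbeat idea, assuming the Microsoft.ApplicationInsights package is referenced by the webjob. The class name `WebJobHeartbeat`, the event name, and the method parameters are hypothetical and only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;

public static class WebJobHeartbeat
{
    // Hypothetical event name; an alert query would look for this name.
    public const string EventName = "WebJobHeartbeat";

    // Emits one custom event per interval until cancellation is requested.
    public static async Task RunAsync(
        TelemetryClient telemetry,
        string jobName,
        TimeSpan interval,
        CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            telemetry.TrackEvent(
                EventName,
                new Dictionary<string, string> { ["job"] = jobName });

            // Flush so the event is not held in the in-memory buffer
            // if the process dies shortly afterwards.
            telemetry.Flush();

            try
            {
                await Task.Delay(interval, token);
            }
            catch (OperationCanceledException)
            {
                break; // Shutdown requested while waiting.
            }
        }
    }
}
```

If the webjob never starts, no heartbeat events arrive, so an alert could be written against the `customEvents` table to fire when no event with this name has been seen within the expected window.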
> At the very least the portal should give me an error alert that the webjob is failing to start
I'm surprised about that part. If it fails to start, the Portal should not be telling you that the WebJob is running.
The status of the jobs in the portal was something like `starting up` or `restarting`. At first glance it seems like everything is fine because of the way it's worded. If it was `failed to start`, `error starting`, or `error starting - retrying` then that'd be much more helpful.
What I was referring to with the portal error is something like the alerts that show in the top right when you log in that say `you have xx credits remaining`, or when you save your app settings. I've received those a number of times saying API calls were being throttled (I think for deployments).
If app insights was used inside the webjob host (the process that discovers & runs them, not the `JobHost` class) then that could log job failures, which would then show in the failures blade/failed requests list. I'm sure this could also be used to set up alerts, but I've yet to figure those out.
Yep, `starting up` or `restarting` is what I would expect. Just wanted to confirm it didn't say `running`.
But other than that, yes, I agree that the lack of alerting is not ideal. I don't think there is a great solution right now.
Since you are using AI you can do something like this in the meantime:
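```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
using Microsoft.ApplicationInsights;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.Azure.WebJobs;

public class Functions
{
    public static async Task MyWebJobAsync([TimerTrigger("0 0 2 * * *", RunOnStartup = true)] TimerInfo timer = null)
    {
        // Wraps the run in a dependency operation that ends when disposed.
        using (var webJob = new TrackWebJob())
        {
            try
            {
                // Do something
            }
            catch (Exception e)
            {
                webJob.Failed(e);
            }
        }
    }
}

public class TrackWebJob : IDisposable
{
    private readonly TelemetryClient telemetry = new TelemetryClient(); // Should be reused!!!
    private readonly IOperationHolder<DependencyTelemetry> operation;

    // CallerMemberName fills in the name of the calling method, e.g. "MyWebJobAsync".
    public TrackWebJob([CallerMemberName] string name = null)
    {
        var dependencyTelemetry = new DependencyTelemetry
        {
            Name = name,
            Type = "WebJob",
        };
        operation = telemetry.StartOperation(dependencyTelemetry);
    }

    // Marks the tracked operation as a failed dependency call.
    public void Failed(Exception e)
    {
        operation.Telemetry.Success = false;
        // Log exception here
    }

    // Stops the operation and sends the dependency telemetry.
    public void Dispose()
    {
        operation.Dispose();
    }
}
```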
Now you can set alerts for failed calls for dependency type `WebJob`.
Disclaimer: I tried to simplify what I actually use in production, so it's not tested and might need some extra work, but it's enough to show what I'm aiming for. Also note that the dashboard will show all runs as `success` unless you throw the exception again.
Is there an Azure REST API call that will let us query the status of the webjob? There are some APIs here https://docs.microsoft.com/en-us/rest/api/appservice/webapps/listwebjobs but which one would give the status?
Maybe this https://docs.microsoft.com/en-us/rest/api/appservice/webapps/listwebjobs#code-try-0
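One option that may help here is the Kudu WebJobs API on the site's scm endpoint, which returns a `status` field for continuous jobs. A rough sketch, assuming basic auth with the site's deployment credentials; `WebJobStatusProbe`, its method names, and its parameters are hypothetical and not from this thread:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class WebJobStatusProbe
{
    // Extracts the "status" field from a continuous WebJob JSON payload,
    // or returns null when the field is absent.
    public static string ParseStatus(string json)
    {
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.TryGetProperty("status", out var status)
            ? status.GetString()
            : null;
    }

    // Queries the Kudu endpoint for one continuous WebJob.
    // siteName, jobName, user, and password are supplied by the caller.
    public static async Task<string> GetStatusAsync(
        HttpClient http, string siteName, string jobName, string user, string password)
    {
        var url = $"https://{siteName}.scm.azurewebsites.net/api/continuouswebjobs/{jobName}";
        using var request = new HttpRequestMessage(HttpMethod.Get, url);

        // Assumption: basic auth with deployment credentials is enabled for the site.
        var credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes($"{user}:{password}"));
        request.Headers.Authorization = new AuthenticationHeaderValue("Basic", credentials);

        using var response = await http.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return ParseStatus(await response.Content.ReadAsStringAsync());
    }
}
```

A scheduled check could call `GetStatusAsync` and raise an alert whenever the returned value is anything other than the running state, which would also catch a job stuck in a start/restart loop.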
Hi
If the problem persists and is related to running it on Azure App Service, please open a support incident in Azure: https://learn.microsoft.com/en-us/azure/azure-portal/supportability/how-to-create-azure-support-request
This way we can better track and assist you on this case.
Thanks,
Joaquin Vano, Azure App Service
From @xt0rted on June 8, 2018 23:16
If you deploy a webjob and it fails to start for any reason, including one out of your control https://github.com/aspnet/websdk/issues/347, there should be a way to get alerted of this. My site has the AI extension installed, the site & webjobs use AI & Raygun for error monitoring, but none of this picks up issues when the webjob host fails to run the job. I've run into this once before (#1619) and it's incredibly frustrating to find out hours or days later instead of immediately.
I'd love to see these issues get surfaced into AI, as well as some way to tell a 3rd party system such as Slack, Raygun, or Bugsnag that there was a problem.
Repro steps
`run.cmd` that contains `dotnet WebJob.exe`
Expected behavior
At the very least the portal should give me an error alert that the webjob is failing to start
Actual behavior
Nothing happens, everything continues on as normal
Known workarounds
None
Related information
Copied from original issue: Azure/azure-webjobs-sdk#1742