imranmomin / Hangfire.AzureCosmosDb

Azure Cosmos DB storage provider for Hangfire
https://www.hangfire.io/
MIT License

GetJobsOnState NullReferenceException #7

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Hi, I'm getting a NullReferenceException in the Hangfire.Azure.CosmosDbMonitoringApi.SucceededJobs method. It happens when, for a given job, there is no state document matching the job's job.StateId. (screenshot)

If that's a valid issue, I can help with a PR. Thanks.

imranmomin commented 4 years ago

Can you find any state documents for the job?

SELECT * FROM c WHERE c.type = 8 AND c.job_Id = ''

ghost commented 4 years ago

Yes, there are 3, with the names "Processing", "Enqueued", and "Scheduled". (screenshot)

imranmomin commented 4 years ago

Can you check the job document and see which state it is in?

SELECT * FROM c WHERE c.type = 2 AND c.id = ''

If the state is missing, then it looks like the referenced state document was either never saved or was deleted.
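The check described by the two queries above can be sketched over documents that have already been fetched. This is an illustrative Python sketch, not code from the library: the type values 2 (job) and 8 (state) come from the queries, while the `state_id` field name and the sample documents are assumptions made for the example.

```python
JOB_TYPE = 2    # document type used for jobs in the query above
STATE_TYPE = 8  # document type used for states in the query above


def find_dangling_jobs(documents):
    """Return ids of job documents whose state_id matches no state document.

    `state_id` is an assumed field name standing in for the job's state reference.
    """
    state_ids = {d["id"] for d in documents if d.get("type") == STATE_TYPE}
    return [
        d["id"]
        for d in documents
        if d.get("type") == JOB_TYPE
        and d.get("state_id") is not None
        and d["state_id"] not in state_ids
    ]


# Hypothetical sample data: job-2 points at a state document that does not exist.
sample = [
    {"id": "state-1", "type": STATE_TYPE, "name": "Enqueued"},
    {"id": "job-1", "type": JOB_TYPE, "state_id": "state-1"},
    {"id": "job-2", "type": JOB_TYPE, "state_id": "state-9"},
]
dangling = find_dangling_jobs(sample)
```

Any id returned here is a job in the situation this issue describes: the job references a state, but no document exists for it.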

ghost commented 4 years ago

That's the result: (screenshot)

imranmomin commented 4 years ago

Yeah, the job is in the "Succeeded" state, but there is no corresponding document for that state. Workaround: create the state document using the "State_Id".

Before creating the state document, look at another Succeeded state document and use it as an example.

ghost commented 4 years ago

OK, I see the workaround. But do you think that throwing a NullReferenceException here is a good decision? How about just ignoring missing "states"?
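To make the suggestion concrete, here is a minimal Python sketch of what "ignoring missing states" could look like. It is not the library's code (the library is C#), and the `state_id` and `name` field names, along with the function name, are assumptions for illustration only.

```python
def jobs_with_states(jobs, states_by_id):
    """Pair each job with its state name, skipping jobs whose state is missing."""
    result = []
    for job in jobs:
        state = states_by_id.get(job.get("state_id"))
        if state is None:
            # Dangling reference: skip this job instead of failing the whole list.
            continue
        result.append((job["id"], state["name"]))
    return result


# Hypothetical data: job-2 references a state document that was never saved.
states = {"state-1": {"name": "Succeeded"}}
jobs = [
    {"id": "job-1", "state_id": "state-1"},
    {"id": "job-2", "state_id": "state-9"},
]
listed = jobs_with_states(jobs, states)
```

The trade-off is that a skipped job silently disappears from the list, so the page may show fewer entries than the counter for that state.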

imranmomin commented 4 years ago

I believe ignoring the states will create issues in the UI wherever they are displayed (you can give it a try by handling it). Also, at the time of development, this library was an exact port of Hangfire.SqlServer.

ghost commented 4 years ago

I think this scenario might happen when Cosmos DB is throttling because of too many requests (429). Because the exception is unhandled, our dashboard (UseHangfireDashboard) crashes.

imranmomin commented 4 years ago

Error 429 is being handled, and the retry is made based on the milliseconds returned with the error. I don't know why the state document was not created. For now, your best option is to create the state document and monitor whether a similar issue arises again.
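The retry behaviour described above, waiting for the number of milliseconds returned with the 429 before trying again, can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the library's implementation: `ThrottledError`, `call_with_retry`, and the `max_attempts` limit are hypothetical names and choices made for the example.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a 429 response that carries a retry delay."""

    def __init__(self, retry_after_ms):
        super().__init__(f"throttled, retry after {retry_after_ms} ms")
        self.retry_after_ms = retry_after_ms


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation, waiting the server-suggested delay after each throttle."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up; the caller sees the throttling error
            sleep(err.retry_after_ms / 1000.0)
```

Note that in a loop like this, a write that is still throttled after the last attempt raises to the caller. If that error were swallowed further up, the document would simply be missing, which is one way a job could end up pointing at a state that was never stored.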