Open f1nzer opened 2 years ago
When a job is dequeued, it updates fetched_at with the current UTC time. The document is only removed if the job completes and the RemoveFromQueue method is invoked.
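As a rough illustration of that lifecycle, here is a minimal in-memory sketch. It is not the storage provider's actual code: the InMemoryQueue class, its method names, and the timeout-based stale() check are all hypothetical and only model the behavior described above (fetched_at set on dequeue, document removed only on RemoveFromQueue).

```python
from datetime import datetime, timedelta, timezone


class InMemoryQueue:
    """Toy stand-in for the queue documents; hypothetical, not the real storage class."""

    def __init__(self):
        # job_id -> {"queue": str, "fetched_at": datetime or None}
        self.docs = {}

    def enqueue(self, job_id, queue="default"):
        self.docs[job_id] = {"queue": queue, "fetched_at": None}

    def dequeue(self, now):
        # Hand out the first document that has not been fetched yet
        # and stamp it with the supplied (UTC) time.
        for job_id, doc in self.docs.items():
            if doc["fetched_at"] is None:
                doc["fetched_at"] = now
                return job_id
        return None

    def remove_from_queue(self, job_id):
        # Only called when the job completes; a crash skips this step.
        self.docs.pop(job_id, None)

    def stale(self, now, timeout):
        # Assumption: a document fetched longer ago than `timeout` and still
        # present probably belongs to a worker that died before removing it.
        return [
            job_id
            for job_id, doc in self.docs.items()
            if doc["fetched_at"] is not None and now - doc["fetched_at"] > timeout
        ]


queue = InMemoryQueue()
queue.enqueue("job-1")
fetched_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
queue.dequeue(fetched_time)
# If the process crashes here, remove_from_queue is never called and the
# document stays behind with fetched_at set, so no worker picks it up again.
```

In this model, a document left behind after a crash is invisible to dequeue() because fetched_at is no longer empty, which matches the symptom of jobs that are counted but never processed.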
My guess is that after the job completed, it most likely failed while other housekeeping tasks were being called, i.e. updating state, counters, and so on.
If you can provide logs, we can surely look further into it.
Unfortunately, there are no Hangfire-related warnings/errors.
I have enabled additional logging to catch such problems in the future.
My guess is that after the job completed, it most likely failed while other housekeeping tasks were being called, i.e. updating state, counters, and so on.
Most likely the job was stored (along with its state), but a Queue entity was not created, probably because it was scheduled in a CosmosDbWriteOnlyTransaction that was then never executed (committed) due to the app crash.
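To make that guess concrete, here is a small model of a write-only transaction that buffers operations and applies them on commit. Everything in it is hypothetical: the WriteOnlyTransaction class, its methods, the crash_after parameter, and in particular the assumption that commit applies operations one at a time rather than atomically. It is a sketch of the failure mode, not a description of how CosmosDbWriteOnlyTransaction actually works.

```python
def _write_state(store, job_id, state):
    store["jobs"].setdefault(job_id, {})["state"] = state


def _write_queue_doc(store, queue, job_id):
    store["queue_docs"].append({"queue": queue, "job_id": job_id})


class WriteOnlyTransaction:
    """Hypothetical model: operations are only buffered until commit()."""

    def __init__(self, store):
        self._store = store
        self._pending = []

    def set_state(self, job_id, state):
        self._pending.append((_write_state, (job_id, state)))

    def add_to_queue(self, queue, job_id):
        self._pending.append((_write_queue_doc, (queue, job_id)))

    def commit(self, crash_after=None):
        # Assumption: operations are applied sequentially, not atomically.
        # `crash_after` simulates the process dying after that many writes.
        for index, (operation, args) in enumerate(self._pending):
            if crash_after is not None and index >= crash_after:
                raise RuntimeError("simulated crash during commit")
            operation(self._store, *args)
        self._pending.clear()


store = {"jobs": {}, "queue_docs": []}
transaction = WriteOnlyTransaction(store)
transaction.set_state("job-1", "Enqueued")
transaction.add_to_queue("default", "job-1")
try:
    transaction.commit(crash_after=1)  # dies after the state write
except RuntimeError:
    pass
# store now holds an Enqueued job with no matching queue document.
```

Under these assumptions the job ends up in the Enqueued state with no queue document, so it is counted as enqueued but can never be fetched.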
I think the only thing I can do there (at least in my bad environment) is to check for those "hung" jobs on app startup and then manually create Queue entities for them, but the jobs do not store a queue name, so there is nothing to create the entities from.
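The startup check described here could look roughly like the sketch below. It works on plain dictionaries rather than real Cosmos DB documents, and the function names, field names, and the fallback_queue parameter are assumptions made for illustration. Because the jobs carry no queue name, the sketch has to be told which queue to use, which may be wrong for jobs that were originally enqueued elsewhere.

```python
def find_orphaned_jobs(jobs, queue_docs):
    """Return ids of jobs in the Enqueued state that have no queue document."""
    queued_ids = {doc["job_id"] for doc in queue_docs}
    return sorted(
        job_id
        for job_id, job in jobs.items()
        if job["state"] == "Enqueued" and job_id not in queued_ids
    )


def requeue_orphans(jobs, queue_docs, fallback_queue):
    """Create a queue document for every orphan.

    `fallback_queue` is a guess supplied by the caller, since the job
    itself does not record which queue it belonged to.
    """
    orphans = find_orphaned_jobs(jobs, queue_docs)
    for job_id in orphans:
        queue_docs.append({"queue": fallback_queue, "job_id": job_id})
    return orphans


jobs = {
    "job-1": {"state": "Enqueued"},
    "job-2": {"state": "Enqueued"},
    "job-3": {"state": "Succeeded"},
}
queue_docs = [{"queue": "default", "job_id": "job-1"}]
recovered = requeue_orphans(jobs, queue_docs, fallback_queue="default")
```

Running the check only on startup keeps it cheap, but it would not catch jobs orphaned by a crash in another instance that is still down.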
In an unstable environment where an application may crash or restart due to some external issue, some jobs can hang and never be moved to the Processing state.
In my case there are 6 jobs that are in the Enqueued state, but I can't see them via the dashboard (only the count is displayed).
It looks like an item of type DocumentTypes.Queue was fetched using a JobQueue class, and then the application crashed or something like that. Here is the data from CosmosDb related to the document: