imranmomin / Hangfire.AzureCosmosDb

Azure Cosmos DB storage provider for Hangfire
https://www.hangfire.io/
MIT License

Recurring Jobs Page crashing #43

Closed: rdnusr closed this issue 2 years ago

rdnusr commented 2 years ago

There seems to be a very weird issue that happens randomly with recurring jobs. I have a number of recurring jobs (20+) that run on a daily or weekly cron schedule, depending on the job. There are 2 load-balanced servers, so both run the same code against the same Cosmos DB database. Things work well initially, but after a few days, when I open the Hangfire dashboard and specifically the Recurring Jobs page ("/hangfire/recurring"), I get an error page instead of the list of recurring jobs.

The Recurring Jobs page displays the following error: "An unhandled exception occurred while processing the request."

AggregateException: One or more errors occurred. (Response status code does not indicate success: NotFound (404); Substatus:0; ActivityId:806f1ad3-c232-4241-a.....; Reason: (code :NotFound) message: Entity with the specified id does not exist in the system. More info: https://aka.ms/cosmosdb-tsg-not-found RequestStartTime:xxx, RequestEndTime:xxx, Number of regions attempted: 1 CosmosException: Response status code does not indicate success: NotFound (404) ...

My Cosmos DB consistency level is set to Session, and data seems to be retained properly. Once the error appears, the page stays inaccessible until all recurring jobs are deleted and recreated (I am doing this in code at this point). Any idea what might be going on?

imranmomin commented 2 years ago

Unable to reproduce the issue.