marcoCasamento / Hangfire.Redis.StackExchange

HangFire Redis storage based on original (and now unsupported) Hangfire.Redis but using lovely StackExchange.Redis client
Other
452 stars 108 forks source link

No job state expiration time - Jobs can be deleted before state is retrieved #76

Closed jonstelly closed 6 years ago

jonstelly commented 6 years ago

I am seeing jobs deleted from Redis before I am able to check on their state using the JobManagerApi. I do not see the same behavior using SQL storage and I think I've tracked down the issue.

I have some code that attempts to enqueue a handful of jobs to Hangfire, and later I use the JobMonitorApi to get the state of these jobs to ensure they succeeded or failed.

What I've found is that if the ExpiredJobWatcher runs between the time the job finishes, and the time I check the status, the job will get deleted.

What I see in the SQL store is in the SqlServerConnection class, expiredIn is added to createdAt and stored on the Job table, and ExpirationManager will only delete jobs that have passed that expiry time.

The value passed in by Hangfire seems to be 30 days.

I don't see any option or quick fix I could use here. I can increase the ExpiryCheckInterval to reduce the likelihood of this happening, but I think that's going to cause memory issues for me.

marcoCasamento commented 6 years ago

The purpose of ExpiredJobWatcher is to keep in sync the lists of jobs in final states (Succeeded, Deleted) with their effective existance in Job namespace. Jobs are normally deleted by redis using TTL, and in order to keep them in sync I've got a couple of options:

the first one could lead to jobs occasionally kept in the list though they expired, and this is expected since the pub-sub of redis is not meant to support reliable notification So I've to use the second one. Now you're saying that if you issue a "JobDetails(jobid)" to get a Job before it expire, the Job itself get deleted ?

I'm sure I have misunderstood something, this is the same method that get called everytime you display a job in the dashboard. How do you check the status of the Job ?

jonstelly commented 6 years ago

The JobApi call we're using is: var details = JobStorage.Current.GetMonitoringApi().JobDetails(jobId); and it doesn't cause the job to be deleted. The issue is that it's possible that if a job completes at 1:59:59, the ExpiredJobWatcher runs at 2:00:00, then I try to get job details at 2:00:01 but the job will have already been deleted by the ExpiredJobWatcher.

The logic in SQL Storage won't delete any job less than 30 days old (the expireIn value passed to the storage engine by hangfire) so that's a pretty big window to go check on the status of the job. But with Redis Storage there's no window like that and it's possible that a final job state is deleted the second it is saved.

I don't know that the storage API really calls out the expected behavior for storage engines here, but since there is a Job Monitoring API that allows you to get the job details, I would expect there to be some sort of reasonable grace period between the time a job finishes, and the time when it is deleted.

For now, I'm going to write a custom filter that stores job results outside of the Hangfire namespace and I'll just check those directly from Redis. But if there was some addition maybe to RedisStorageOptions to allow us to specify how long we want/need the final job state to be kept before it's deleted, that would solve my issue.

marcoCasamento commented 6 years ago

Oh ok, got it, but it's not how it works. The jobs get deleted directly by Redis accordingly to its JobExpirationTimeout (default 1 day, but you can extend it to says, 30day or even a year). The "ExpiredJobWatcher" only clean up the Succeeded/Deleted list of jobid, it doesn't touch the job related keys.

You could find this thread useful: https://discuss.hangfire.io/t/how-to-configure-the-retention-time-of-job/34

So, if you need to keep your jobs for 30 days, simpply add these lines to your code:

public class ProlongExpirationTimeAttribute : JobFilterAttribute, IApplyStateFilter
{
    public void OnStateApplied(ApplyStateContext context, IWriteOnlyTransaction transaction)
    {
        filterContext.JobExpirationTimeout = TimeSpan.FromDays(30);
    }
}

and in the startup: GlobalJobFilters.Filters.Add(new ProlongExpirationTimeAttribute());

Hope this helps

jonstelly commented 6 years ago

Interesting. Thanks for taking the time to explain that. I saw a few cases where getting the job detail failed and when looking at Redis the job key (and all of the history) was no longer there. Re-reading the code for ExpiredJobWatcher I see what you're saying, that it's just cleaning up the succeeded/deleted lists, so I guess I was incorrect about the cause.

I'm going to close this issue and go debug what's going on a bit more, I'll reopen if needed. Thanks.