perrich / Hangfire.MemoryStorage

A memory storage for Hangfire.
Apache License 2.0
131 stars 43 forks source link

Multiple executions for a single job. #28

Closed dgioulakis closed 5 years ago

dgioulakis commented 5 years ago

Please see: https://github.com/HangfireIO/Hangfire/issues/1025#issuecomment-473707971

Doesn't seem memory storage is actually dequeuing the job or marking it as dequeued (I don't know what the correct behavior should be).

config.UseMemoryStorage(
    new MemoryStorageOptions
    {
        FetchNextJobTimeout = TimeSpan.FromSeconds(10)
    });
public class MyJobPerformer
{
    private readonly string _performerId;
    public MyJobPerformer()
    {
        _performerId = Guid.NewGuid().ToString("N");
    }

    public async Task Perform(RequestBase request)
    {
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
        await Task.Delay(TimeSpan.FromSeconds(5));
        Console.WriteLine($"{_performerId}: {DateTime.UtcNow}");
    }
}

Results

69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:20 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:25 PM
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:30 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:32 PM <<< Duplicate job execution! 10 seconds lapsed.
69b3501a7a284d0c88b030abde997810: 3/17/19 11:38:35 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:37 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:42 PM
6d0b3551c5354633b7d572da2cd5abc8: 3/17/19 11:38:47 PM

One job enqueue results in multiple worker executions. It's even worse if FetchNextJobTimeout is reduced further. I'm not sure whether this is a problem with Hangfire, MemoryStorage, or both.

dgioulakis commented 5 years ago

I believe the problematic or curious line is: https://github.com/perrich/Hangfire.MemoryStorage/blob/eed3fb6a0ebe9d3596a51f35a723a1cb7c0cd4e0/src/Hangfire.MemoryStorage/MemoryStorageConnection.cs#L111

|| q.FetchedAt.Value < timeout Can anyone explain why this provider is allowing already "fetched" jobs to be re-dequeued? As far as I can tell, FetchNextJobTimeout is essentially a window of time in which a job must complete before it gets selected again for execution.

Why should the storage connection fetch a job that's already been fetched? Or why are jobs not really dequeued here?

Update

https://github.com/HangfireIO/Hangfire/blob/3186f81549e068500192709764293d47b28317d6/src/Hangfire.Core/Server/Worker.cs#L140

Still digging through; looks like Hangfire only dequeues after the job has completed processing. So you may have jobs that continue processing longer than FetchNextJobTimeout causing them to be re-executed by an additional concurrent worker process. Something doesn't seem right here.

dgioulakis commented 5 years ago

Same problem referenced here: https://github.com/HangfireIO/Hangfire/issues/1197

perrich commented 5 years ago

Hi, I think it's by design to retrieve not completed job. You can define another timeout for instance 24 hours, by set FetchNextJobTimeout in the MemoryStorageOptions

praveenlobo7 commented 5 years ago

Hi, Do we have any solution for this? We are using Oracle provider and we see the same issue.

dgioulakis commented 5 years ago

Why would we want to retrieve a job that's currently being processed by a different Worker?

Update

Looks like this is partially the answer: https://github.com/HangfireIO/Hangfire/issues/936#issuecomment-315087104 Like InvisibilityTimeout and SlidingInvisibilityTimeout in the SQL Server storage implementation, FetchNextJob is really a window of time in which a job MUST complete or else may be started again by another Worker either concurrently, or upon next dequeue from the storage connection. I just don't understand why this can't be automated by Hangfire given the we can find out a job's state. Perhaps that's not guaranteed?

haga2112 commented 1 year ago

We no longer have the FetchNextJobTimeout property in version 1.7 (memory storage 1.4), I don't know what to do