marcoCasamento / Hangfire.Redis.StackExchange

HangFire Redis storage based on original (and now unsupported) Hangfire.Redis but using lovely StackExchange.Redis client
Other
448 stars 108 forks source link

Latest release breaks Hangfire #153

Open soenneker opened 2 weeks ago

soenneker commented 2 weeks ago

Not sure if you guys have tried the new version, but we had to revert to 1.9.3 because it would not complete jobs.

Related to this work: https://github.com/marcoCasamento/Hangfire.Redis.StackExchange/issues/150

marcoCasamento commented 2 weeks ago

No, not tried the latest, but the merge contains nothing new. What does it means that it doesn't complete the jobs? Are jobs being executed but the status is not updated? Or are jobs not fetched?

soenneker commented 2 weeks ago

No, not tried the latest, but the merge contains nothing new.

I'm not sure I follow, it looks like a lot of changes from 1.9.3. I strongly suggest delisting the latest NuGet package!

What does it means that it doesn't complete the jobs?

I mean once the job starts processing, it doesn't finish it once all of the code has run. Additionally it looks like parallelization may be broken? Please, give it a try and you'll see what I'm talking about.

marcoCasamento commented 2 weeks ago

I'm not sure I follow, it looks like a lot of changes from 1.9.3. I strongly suggest delisting the latest NuGet package!

Sure, it contains changes from 1.9.5 and the two PR about job requeiung and changes for uniforming async use. But ok, unlisting it's the safest option here.

Please, give it a try and you'll see what I'm talking about.

I'll do for sure.

marcoCasamento commented 2 weeks ago

I can confirm that a job is being requeued after being succesfully executed, in an apparently infinite loop. Didn't spot anything about parallelization. I think that involves the code at commit e437998, that touch both RedisFetchedJobs and FetchedJobsWatcher. I'm going to try the code at commit immediately before and immediately after

marcoCasamento commented 2 weeks ago

It turns out that the problems arise with: JobStorageFeatures.Transaction.RemoveFromQueue(typeof(RedisFetchedJob)), true if RedisStorage returns true for that feature, the above problem arise. Unfortunately I don't know what that options means in Hangfire's logics.

@Lexy2 Can you shed a light on that flag ?

marcoCasamento commented 2 weeks ago

It seems that at least IWriteOnlyTransaction.RemoveFromQueue([NotNull] IFetchedJob fetchedJob) should be implemented See https://github.com/HangfireIO/Hangfire/blob/main/src/Hangfire.Core/Storage/JobStorageTransaction.cs

public virtual void RemoveFromQueue([NotNull] IFetchedJob fetchedJob)
{
    throw JobStorageFeatures.GetNotSupportedException(JobStorageFeatures.Transaction.RemoveFromQueue(fetchedJob.GetType()));
}
Lexy2 commented 2 weeks ago

I'll take a look. For now I think the best way is to delist the package, revert #152 and publish a version without this particular change.

Lexy2 commented 2 weeks ago

Yeah, it's completely busted. I wonder how this happened, I'll take a look tomorrow.

marcoCasamento commented 2 weeks ago

I believe Hangfire modifies its behavior if the storage indicates support for RemoveFromQueue. It seems to depend on IWriteOnlyTransaction.RemoveFromQueue to dequeue a job once it is completed.

I've just implemented this method with:

public override void RemoveFromQueue([NotNull] IFetchedJob fetchedJob)
{
    fetchedJob.RemoveFromQueue();
}

It appears to process the jobs correctly, but further tests and documentation are required for this function.

I plan to release a pre-release version, 1.10.1-Beta1, to facilitate testing.

marcoCasamento commented 2 weeks ago

I wonder how this happened, I'll take a look tomorrow.

I wonder the same, I was sure the code was already running before the lasts merges and this problem predates it.

As you're at it, would you please check my last two commits on RedisFetchedJob ? There was probably a copy-and-paste error on RemoveFromQueue that was actually Requeuing the job (if no UseTransactions), I believe it could be implied in this issue

Lexy2 commented 2 weeks ago

It appears to process the jobs correctly

The jobs stay fetched, so it's not yet complete :-(

marcoCasamento commented 2 weeks ago

True. It appears it depends because of line 63 of RedisFetchedJob, the condition returns false.

if (fetchedAt == FetchedAt) { RemoveFromFetchedListAsync(transaction); }

Do you remember why this check was added ?

Lexy2 commented 2 weeks ago

Yes, this check is actually a fix for #135. Hangfire orders the storage to remove the timed out job, and storage should only do this if it's the same job (and not its requeued copy).

Lexy2 commented 2 weeks ago

Wouldn't it be easier just to remove the capability from the storage so that Hangfire would use the other methods?

marcoCasamento commented 2 weeks ago

Wouldn't it be easier just to remove the capability from the storage so that Hangfire would use the other methods?

I don't know what the RemoveFromQueue is supposed to do, I was hoping you know more. Problem is, however, that even removing the feature, the job stay fetched as the check still returns false.

Lexy2 commented 2 weeks ago

Yuck. I'll try to find time to look into it. Still don't understand why it worked then and does not now.

Lexy2 commented 2 weeks ago

Okay, so I added this { JobStorageFeatures.Transaction.RemoveFromQueue(typeof(RedisFetchedJob)), true } to RedisStorage.cs, but did not properly test it. Like, this was a last minute thing.

As for jobs not being removed from dequeued state, this seems to be a recent change from Hangfire. I tested everything on 1.8.5, and it works (just if you remove this RemoveFromQueue transactional feature), but as soon as you upgrade to 1.8.12, it stops working.

Change in behaviour. I'm reviewing changes to Hangfire core to see if I can spot the issue.