hibiken / asynq

Simple, reliable, and efficient distributed task queue in Go
MIT License
10.03k stars, 716 forks

[BUG] handler did not run and this task hang out from then. #952

Closed cobain closed 2 weeks ago

cobain commented 3 weeks ago

Describe the bug: A task was enqueued successfully, but its handler was never executed. Once this happened, no other tasks were executed either. When I restarted the process, all the pending tasks ran one by one.

cobain commented 3 weeks ago

I use the default config, and I don't know whether it is related to this. With 10 kinds of tasks it worked well, but after I increased that to 15 it suddenly stopped working. After a restart, it works again.

    // Specify how many concurrent workers to use
    Concurrency: 10,
    // Optionally specify multiple queues with different priority.
    Queues: map[string]int{
        "critical": 6,
        "default":  3,
        "low":      1,
    },
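For context, the asynq README describes the Queues values as relative priorities, so weights of 6, 3, and 1 mean tasks in critical are processed roughly 60% of the time, default 30%, and low 10%. Concurrency caps how many tasks run at once; as far as I know it does not limit how many task types can be registered. The sketch below only does the arithmetic. `queueShares` is a hypothetical helper written for illustration and is not part of asynq.

```go
package main

import (
	"fmt"
	"sort"
)

// queueShares converts relative queue weights into approximate percentages.
// Hypothetical helper for illustration; asynq does not export this function.
func queueShares(weights map[string]int) map[string]float64 {
	total := 0
	for _, w := range weights {
		total += w
	}
	shares := make(map[string]float64, len(weights))
	if total == 0 {
		return shares // nothing to divide by
	}
	for name, w := range weights {
		shares[name] = 100 * float64(w) / float64(total)
	}
	return shares
}

func main() {
	// Same weights as in the config quoted in this comment.
	shares := queueShares(map[string]int{"critical": 6, "default": 3, "low": 1})

	// Sort the names so the printed order is stable.
	names := make([]string, 0, len(shares))
	for name := range shares {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%-8s %.0f%%\n", name, shares[name])
	}
}
```

These are long-run proportions, not guarantees for any single task.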

kamikazechaser commented 3 weeks ago

After restart, it works.

With only this information, it is very difficult to tell where this issue originates, but it is very unlikely to come from the library. Use the inspector or CLI to debug and provide more info.

cobain commented 3 weeks ago

Yes, I know. It seems the worker died. When I kill the process, something I don't understand gets triggered and the handler is executed.

cobain commented 3 weeks ago

When I kill the process, it sends a signal, which then triggers the handler according to the docs. But I am not clear about what happens to the worker.

Note: If you send TERM or INT signal without sending TSTP signal, the Server will start a timer for 8 seconds to allow for all workers to finish (To customize this timeout duration, use ShutdownTimeout config). If there are workers that didn't finish within that time frame, the task will be transitioned back to pending state and will be processed once the program restarts.
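The "timer to allow all workers to finish" in that note is a common graceful-shutdown pattern. The sketch below shows the general idea using only the standard library. It is not asynq's implementation, and `waitWithTimeout` and `drain` are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitWithTimeout reports whether every worker tracked by wg finished
// before the timeout elapsed. Hypothetical helper, not asynq code.
func waitWithTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// drain starts one simulated task lasting taskMs milliseconds and reports
// whether it finished within timeoutMs milliseconds.
func drain(taskMs, timeoutMs int) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(time.Duration(taskMs) * time.Millisecond)
	}()
	return waitWithTimeout(&wg, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	// The documented timer is 8 seconds; short values keep the demo quick.
	fmt.Println("fast task finished in time:", drain(20, 500))
	fmt.Println("slow task finished in time:", drain(500, 20))
}
```

In the note's terms, a task that misses the deadline is the one that goes back to the pending state and waits for the next start of the program.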

cobain commented 3 weeks ago

I found that it works before 00:00; after 00:00, all tasks hang. Before this, it had worked well for half a year.

cobain commented 3 weeks ago

Another clue is that it happened after I added several task types. Before that I had 10 task types; now I have 14.

cobain commented 3 weeks ago

Is it related to the config Concurrency: 10? I don't know whether it would work after increasing the concurrency to something bigger, like 20.

kamikazechaser commented 2 weeks ago

Try v0.25.0 and report back

cobain commented 2 weeks ago

Do you have any guess about the reason? I don't know why it happens after 00:00. Is there any code that runs cleanup or other tasks after 00:00?

cobain commented 2 weeks ago

OK. I will try it in the test environment first.

cobain commented 2 weeks ago

Anyway, I have upgraded to 0.25.0 and will see whether it works tomorrow.

cobain commented 2 weeks ago

The bug still happens on 0.25.0. @kamikazechaser

cobain commented 2 weeks ago

@hibiken @appleboy @pior

cobain commented 2 weeks ago

Is it related to the previous Redis cache? I need a way to fix this issue now; I cannot restart the service every night. 😂

pior commented 2 weeks ago

If you post a simplified app, it will be easier to fix this issue.

cobain commented 2 weeks ago

My app is the same as the demo code and it has run for 2 years. Recently I added several tasks, and then it started failing after 00:00. I have upgraded to 0.25.0 and it still doesn't work. Today I set the log level to debug; let me check the logs tonight.

cobain commented 2 weeks ago

Update: there seem to be no useful debug logs so far. I have switched to a new Redis DB and will try again.

By the way, I am investigating the source code and hope I can fix it ASAP. @pior @hibiken @kamikazechaser @appleboy I hope you can provide some help or more details.

kamikazechaser commented 2 weeks ago

There are many possible reasons why you are encountering this issue. It's very difficult to reproduce without some sample code and/or Redis info. I'd suggest you look at existing issues around the scheduler and archiving of tasks.

cobain commented 2 weeks ago

My code and config are the same as the demo provided. I just added 10+ task types and handlers and registered them. Last night, when I reproduced this issue after 00:00, I used the web tool and saw that the status was pending.

cobain commented 2 weeks ago

I finally found the root cause. It was caused by a third-party script that triggered a restart signal, but the restart did not succeed. In that case, asynq received the signal and shut down the server, so asynq stopped processing tasks.