cyberbudy opened this issue 2 years ago
Hello! Can you please describe how many instances are running and which command you use to start the workers? It would also be great if you could share your Darq args & kwargs (app = Darq(...)).
> Maybe there is some kind of limitation when too many records in queue

We use Darq in a big project with a lot of workers and thousands of completed tasks per day, so I don't think that's the problem.
You say that the queue gets stuck. By default a worker processes 10 tasks concurrently (the max_jobs parameter). But if, for example, one of your tasks blocks the event loop, it will block the entire worker. In that case even the timeout will not work.
But there are no "ongoing tasks" in your log, so the problem probably lies elsewhere.
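To illustrate why a single blocking task stalls the whole worker: an asyncio event loop can only run other coroutines while the current one is awaiting. A minimal, self-contained sketch (not Darq code; the task names are made up):

```python
import asyncio
import time

async def blocking_task() -> None:
    # time.sleep() is synchronous: while it runs, the event loop
    # cannot schedule any other coroutine on this worker.
    time.sleep(0.2)

async def well_behaved_task() -> None:
    # asyncio.sleep() yields control back to the event loop.
    await asyncio.sleep(0.2)

async def main() -> tuple[float, float]:
    start = time.monotonic()
    # Five cooperative tasks overlap: total time is ~0.2s, not ~1s.
    await asyncio.gather(*(well_behaved_task() for _ in range(5)))
    concurrent = time.monotonic() - start

    start = time.monotonic()
    # Five blocking tasks serialize on the loop: total time is ~1s.
    await asyncio.gather(*(blocking_task() for _ in range(5)))
    blocked = time.monotonic() - start
    return concurrent, blocked

concurrent, blocked = asyncio.run(main())
print(f"non-blocking: {concurrent:.2f}s, blocking: {blocked:.2f}s")
```

This is also why the worker's timeout cannot fire while a task blocks the loop: the timeout itself is a coroutine waiting for its turn.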
Are you sure the queued tasks are not "deferred" (waiting for a specified time to start)? You add tasks to the queue with .delay(), not .apply_async(defer_by=...), right?
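Conceptually, .delay() makes a job eligible to run immediately, while .apply_async(defer_by=...) stores it with a future start time, so the queue can look "stuck" until that time arrives. A toy asyncio simulation of that difference (enqueue_now and enqueue_deferred are illustrative names, not Darq's API):

```python
import asyncio

results: list[str] = []

async def run_job(name: str) -> None:
    results.append(name)

async def enqueue_now(name: str) -> None:
    # Analogue of .delay(): the job runs as soon as a worker picks it up.
    await run_job(name)

async def enqueue_deferred(name: str, defer_by: float) -> None:
    # Analogue of .apply_async(defer_by=...): the job only becomes
    # visible to workers after the delay has elapsed.
    await asyncio.sleep(defer_by)
    await run_job(name)

async def main() -> None:
    await asyncio.gather(
        enqueue_deferred("deferred", 0.1),
        enqueue_now("immediate"),
    )

asyncio.run(main())
print(results)  # → ['immediate', 'deferred']
```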
Also, there is a small chance of a bug in aioredis with Redis 6.x, because aioredis 1.3.x was not tested against Redis 6.x.
I'm running 3 darq instances with 5 to 10 thousand small tasks per day (I/O-bound communication). I have 3 tasks.
This is my darq setup:

```python
darq = darq.Darq(
    redis_settings=darq.RedisSettings(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        database=settings.REDIS_DB,
    ),
    on_startup=startup,
    on_shutdown=shutdown,
    keep_result=0,
    max_jobs=100,
    queue_name='queue',
    job_timeout=3600,
)

# Task example
@darq.task(queue='queue')
async def send_message_task(message_id: str):
    pass

# And it is always called as
await send_message_task.delay(message['id'])
```
Yes, I thought it might be a blocking issue, but I would expect at least a log record about a new message after a restart. Also, because the redis and darq instances use the same amount of resources when "stuck" as when running normally, it must not be that case.
I see aioredis released a new 2.0 version with a backwards-incompatible API. Are there any plans to upgrade to the new version?
https://github.com/samuelcolvin/arq/pull/258
The newest version of arq supports aioredis 2.0 by using redis-py :)
I'm sorry for the late reply.
Yes, Darq will support the new redis-py. Until recently, in my personal opinion, aioredis 2.x was not yet production-ready.
Speaking of your issue: are you still facing the problem? It seems that your tasks are not blocking the loop, because the health check is working and "ongoing" tasks = 0.
After your worker gets stuck, can you check some darq keys in redis? For example, are there any arq:in-progress:* keys?
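That check can be sketched like this, assuming a redis-py-style client that exposes scan_iter (count_in_progress is a name I made up; run it against the same Redis your workers use):

```python
def count_in_progress(client, prefix: str = "arq:in-progress:") -> int:
    """Count arq's in-progress keys via an incremental SCAN.

    `client` is assumed to be a redis-py style client exposing
    scan_iter(match=...); SCAN avoids blocking Redis the way KEYS can.
    """
    return sum(1 for _ in client.scan_iter(match=prefix + "*"))

# Usage against a real server would look roughly like:
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   print(count_in_progress(r))
```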
Also try setting queue_read_limit using the formula "number of workers" * "max_jobs" (at least). In your case, I think, you can try queue_read_limit=300. This can improve overall performance.
Glad to hear that. It's great to have such a project. In about a year of using darq I've encountered this situation twice. Thanks for the advice, I'll check it next time.
@seedofjoy I've found the problem
When there are a lot of tasks in arq:in-progress:, new workers cannot start.
I have a problem from time to time where the queue gets stuck on some message without any progress until most of the messages are invalidated by timeout. Restarting or adding/removing clients manually doesn't help. The strange thing is that CPU/RAM usage stays the same. Maybe there is some kind of limitation when there are too many records in the queue? If anyone could help me at least figure out how to debug the problem: messages are not being processed, and the logger has no new records either, only such records from time to time.
P.S. Thanks for a great project. I haven't seen anything better for asyncio.