Closed: samiabdelhamid closed this issue 1 week ago
Hey @samiabdelhamid,
For Part 1: It seems you might be reaching the limit of threads available on your machine, which is why you can only support a maximum of 8 concurrent workers. Could you please share the configuration of the machine you're using?
For Part 2: When the workers pick up the job, is the data being populated? Or is it just the current_step variable that isn't being filled?
@rafaelsideguide Thanks for your reply
Part 1: I'm using an EC2 machine of type "t3.large". The t3.large instance has 2 vCPUs, and each vCPU can handle 2 threads, so a t3.large instance can handle 4 threads in total.
Part 2: No data is populated at all; the status just stays active with no change: {"status":"active","data":null,"partial_data":[]}
Hey @samiabdelhamid,
Given your t3.large instance can handle 4 threads, I recommend setting your worker limit to 4. This should keep things running smoothly without overloading your system.
Also, since we use Bull for queue management in Firecrawl, you might find this Bull GitHub discussion helpful for understanding how concurrency works with our setup.
Hope this helps! Let me know if there's anything else you need.
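Concretely, capping the workers at the machine's thread count would look something like this in the self-hosted docker-compose environment (NUM_WORKERS_PER_QUEUE is an assumed variable name here; check your compose file for the exact one your setup uses):

```yaml
# docker-compose sketch: cap workers at the 4 threads a t3.large provides.
# NUM_WORKERS_PER_QUEUE is an assumed variable name; verify against your setup.
services:
  worker:
    environment:
      - NUM_WORKERS_PER_QUEUE=4
```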
@rafaelsideguide Thank you for your reply. The issue in Part 1 was NUM_OF_QUEUES being set to 8. I followed your suggestion and changed the worker limit to 4 so things run smoothly, but the same issue from Part 2 remains: after a job gets a waiting status and then goes active, no data or step is returned.
@samiabdelhamid, recently (in PR #459) we switched from Bull to BullMQ, which offers better management of job concurrency. Could you update your repo, install the new packages, and test if the data problem is still happening?
Closing this one for now, as we've made several improvements to the concurrency system. Feel free to reopen if you continue to experience any problems.
The issue has 2 parts.

Part 1:
Case 1: Set the docker compose concurrency and workers both to 10, send 10 concurrent requests to /crawl, then send 10 concurrent status requests for the respective jobIds. Result: 8 jobs active with current_step: "scraping" + 2 waiting jobs.
Case 2: Set the docker compose concurrency and workers both to 12, send 10 concurrent requests to /crawl, then send 10 concurrent status requests for the respective jobIds. Result: 8 jobs active with current_step: "scraping" + 2 waiting jobs.
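The identical result in both cases is what you would expect if a second, lower limit wins over the configured worker count; as the thread later established, that lower limit was NUM_OF_QUEUES=8. A toy model (not Firecrawl code) of the effective cap:

```javascript
// Illustration of why 10 submitted jobs show up as 8 active + 2 waiting
// regardless of whether workers are configured to 10 or 12: the queue can
// only promote jobs to "active" up to the lowest applicable limit.
function snapshot(submitted, configuredWorkers, queueLimit) {
  const cap = Math.min(configuredWorkers, queueLimit);
  const active = Math.min(submitted, cap);
  const waiting = submitted - active;
  return { active, waiting };
}

console.log(snapshot(10, 10, 8)); // { active: 8, waiting: 2 }
console.log(snapshot(10, 12, 8)); // { active: 8, waiting: 2 } -- the lower limit still wins
```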
Part 2: The 2 waiting jobs turn to status 'active' without a current_step indefinitely; every resend of the status request returns {"status":"active","data":null,"partial_data":[]}.