Closed JesusCarvalho closed 2 years ago
Here is how to implement a spider using this gem: https://github.com/socketry/benchmark-http/blob/master/lib/benchmark/http/spider.rb
Do not mix threads and async code; it will not work correctly unless you know exactly how things are working.
Use async-container for parallelism (i.e. spin up several spiders and use an IO for coordination).
To issue multiple requests concurrently, you should use a barrier.
A barrier allows you to coordinate when work is completed, that is all. If you issue multiple requests and want to wait until they are all finished, use a barrier.
I'm sure you will have more questions but hopefully this is a start.
Thanks for the quick turnaround Sam. You are right, I do have more questions, but I'll study up and come back after I'm versed in all the code you sent my way.
You might enjoy this: https://github.com/socketry/async-container/blob/master/examples/queue/server.rb
I'm going to close this issue. If you have further questions or feedback, please feel free to start a discussion: https://github.com/socketry/async-http/discussions
With Ruby 3.1 and Async 2.x, threads are now supported, as well as Thread::Queue and so on. It might provide some advantage in your use case but still requires some level of care.
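For reference, a minimal producer/consumer sketch using only Ruby's standard `Thread::Queue`, `Mutex`, and plain threads (no async-http involved). The `run_pipeline` name, the worker count, and the placeholder "request" are assumptions for illustration, not something from this gem.

```ruby
# Hypothetical helper: one producer fills a queue, several workers drain it.
def run_pipeline(urls, workers: 3)
  queue = Thread::Queue.new
  results = {}
  lock = Mutex.new

  producer = Thread.new do
    urls.each { |url| queue << url }
    queue.close # Signals consumers that no more work is coming.
  end

  consumers = Array.new(workers) do
    Thread.new do
      # pop returns nil once the queue is closed and empty.
      while (url = queue.pop)
        response = "fetched #{url}" # Placeholder for a real request.
        # The mutex guards the shared hash; ||= keeps the first result
        # when the same URL appears more than once.
        lock.synchronize { results[url] ||= response }
      end
    end
  end

  producer.join
  consumers.each(&:join)
  results
end
```

Closing the queue is what lets the consumers exit cleanly, so no separate signaling between producer and consumers is needed in this sketch.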
First off, thanks for all the fantastic work. Gems like this one embody the best of the Ruby community.
I have questions regarding how to successfully implement a Producer/Consumer pattern with this gem. My questions stem from a lack of understanding about the scope and boundaries of the concurrency mechanisms. Essentially, how do I compose Ruby core and the standard library with this gem to achieve the following:
1] Thread-safe queue (holding URLs to hit)
2] Multiple consumers (i.e. async-http workers) using the above data structure
3] Thread-safe recording of results from each response (in case redundant endpoints are in the list)
4] A producer that refills the data structure from a text file
5] Coordination of the above in an idiomatic way (signaling between consumers and producer)
The program in question would fill a queue with URLs to hit which would then be consumed by a fixed number (thread pool?) of asynchronous worker tasks that would call the endpoint and record the response.
So far I've pieced together some candidates for achieving each of the above:
1] Thread-safe Queue
2] This gem (obviously)
3] YAMLStore
4] Producer MonitorMixin examples:
5] Coordination
Every time I try to combine the above to achieve the stated goal, I make a mess of things. I'm not sure of the "separation of concerns" and boundaries for the above pieces with respect to handling of concurrency. I'm also not sure if the above list of candidates is complete (e.g. do I need a mutex?). Any thoughts?
Bonus Question: Under "Multiple Requests" your documentation states: "To issue multiple requests concurrently, you should use a barrier." Based on my understanding of barriers, I wonder why asynchronous requests have to wait on each other at all?
Many thanks in advance, TJ