It should be up to the crawler developer whether to parallelize the process or not. For example, for some social networks it is not a good idea to scrape with many parallel threads.
So we need to come up with a proof of concept that allows parallelizing certain pieces with separate Sidekiq jobs. For the gallery crawler it makes sense to parallelize the scraping of each page, to make it faster:
parallelize do |context|
# some action
end
Passing code to this block should spawn a separate job that receives all the necessary context, for example the page URL and possibly cookies.
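A minimal sketch of how this could work: since a block itself cannot be serialized into a Sidekiq job, the hypothetical parallelize helper below takes a method name plus a plain context hash, and the worker re-runs that method in a separate process. ParallelWorker, the Parallelize module and GalleryCrawler are all made-up names for illustration, not an existing API.

require 'sidekiq'

# Hypothetical worker that re-runs a named crawler step with its context.
class ParallelWorker
  include Sidekiq::Worker

  def perform(crawler_class, method_name, context)
    crawler = Object.const_get(crawler_class).new
    crawler.public_send(method_name, context)
  end
end

module Parallelize
  # Instead of running the step inline, enqueue it as a separate Sidekiq job.
  # The context must be a plain hash (page url, cookies, ...) because Sidekiq
  # serializes job arguments as JSON.
  def parallelize(method_name, context)
    ParallelWorker.perform_async(self.class.name, method_name.to_s, context)
  end
end

class GalleryCrawler
  include Parallelize

  def crawl(page_urls)
    # Each page gets its own job instead of being scraped sequentially.
    page_urls.each { |url| parallelize(:scrape_page, 'url' => url) }
  end

  def scrape_page(context)
    # fetch and parse context['url'] here
  end
end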
The question is: how are we going to collect all the scraped data? We need to come up with some synchronization mechanism, or each worker should report its results separately.
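As a sketch of the second option (each worker reporting its results separately), workers could push whatever they scraped into a per-run Redis list, and a later step could read it back. The key names and ScrapePageWorker below are assumptions for illustration only.

require 'sidekiq'
require 'json'

class ScrapePageWorker
  include Sidekiq::Worker

  def perform(run_id, url)
    data = scrape(url) # whatever the crawler extracts from the page

    # Append this worker's result to a per-run list in Redis,
    # reusing Sidekiq's own connection pool.
    Sidekiq.redis do |redis|
      redis.rpush("crawler:#{run_id}:results", data.to_json)
    end
  end

  private

  def scrape(url)
    { 'url' => url, 'images' => [] } # placeholder
  end
end

# Once all jobs for a run have finished, the results can be read back:
def collect_results(run_id)
  Sidekiq.redis do |redis|
    redis.lrange("crawler:#{run_id}:results", 0, -1).map { |raw| JSON.parse(raw) }
  end
end

Knowing when all jobs of a run have finished is the synchronization part; a per-run counter decremented in Redis, or Sidekiq Pro's batch callbacks, could cover that.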