workarea-commerce / workarea

Workarea is an enterprise-grade Ruby on Rails commerce platform
https://www.workarea.com

Parallel Requests - rake workarea:cache:prime_images #188

Open: GesJeremie opened this issue 5 years ago

GesJeremie commented 5 years ago

Is your feature request related to a problem? Please describe.

I'm currently running the rake task workarea:cache:prime_images on a few thousand products, and I'm frustrated by how slow it is.

Describe the solution you'd like

The current code (workarea-core-3.4.16) is the following:

namespace :workarea do
  namespace :cache do
    desc 'Prime images cache'
    task prime_images: :environment do
      include Rails.application.routes.url_helpers
      include Workarea::Storefront::ProductsHelper
      include Workarea::Core::Engine.routes.url_helpers

      built_in_jobs = [:thumb, :gif, :jpg, :png, :strip, :convert, :optimized]

      jobs = Dragonfly.app(:workarea).processor_methods.reject do |job|
        built_in_jobs.include?(job)
      end

      Workarea::Catalog::Product.all.each_by(50) do |product|
        product.images.each do |image|
          jobs.each do |job|
            url = URI.join(
              "https://#{Workarea.config.host}",
              dynamic_product_image_url(
                image.product.slug,
                image.option,
                image.id,
                job,
                only_path: true
              )
            ).to_s

            begin
              `curl #{url}`
              puts "Downloaded image #{url}"
            rescue StandardError => e
              puts e.inspect
            end
          end
        end
      end
    end
  end
end

It's basically a loop that requests each URL through curl, waits for the result, and then moves on to the next record. The obvious optimization would be to run the curl requests in parallel.
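The quoted loop also hides failures. Backticks only raise when curl itself cannot be launched, so the `rescue` never sees HTTP errors (curl exits 0 on a 404 unless given `--fail`). "Downloaded image" is therefore printed either way, and the whole image body is captured into a Ruby String that is then thrown away. The following is a minimal sketch of a per-URL unit of work that a parallel version could call instead. It assumes plain `Net::HTTP` from the standard library is acceptable. `prime_url` and its timeout defaults are hypothetical names, not part of Workarea.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Workarea): request one URL, stream the
# response body to nowhere, and return the HTTP status code as an Integer.
# Network errors (timeouts, refused connections) propagate to the caller.
def prime_url(url, open_timeout: 5, read_timeout: 30)
  uri = URI(url)
  status = nil
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == 'https',
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    http.request_get(uri.request_uri) do |response|
      response.read_body { |_chunk| } # discard the image bytes as they arrive
      status = response.code.to_i
    end
  end
  status
end
```

Because the status code comes back to the caller, anything outside 200..299 can be logged as a failed prime, which the backtick version cannot do.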

In my side projects I usually use https://github.com/typhoeus/typhoeus and its Hydra "engine", but I'm pretty sure we can come up with some bash magic and call it a day.
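The "bash magic" route can also be driven from Ruby without a new gem. The approach is to spawn one curl process per URL and cap how many run at once, which is roughly what `xargs -P` does. The following is a minimal sketch, assuming a curl binary is on the PATH. `spawn_in_parallel` and its defaults are hypothetical. `--fail` makes curl exit non-zero on HTTP errors, and `--output` sends the image bytes to the null device instead of a Ruby String.

```ruby
# Hypothetical helper: run `command + [url]` for each URL as a separate child
# process, keeping at most `max_procs` alive at once (similar to `xargs -P`).
# Returns the URLs whose process exited with a non-zero status.
def spawn_in_parallel(urls, max_procs: 8,
                      command: ['curl', '--silent', '--show-error', '--fail',
                                '--output', File::NULL])
  running  = {} # pid => url, for the children started here
  failures = []

  # Wait for any one child to exit and record it if it failed.
  reap = lambda do
    pid, status = Process.wait2
    url = running.delete(pid)
    failures << url if url && !status.success?
  end

  urls.each do |url|
    reap.call while running.size >= max_procs # throttle: free a slot first
    running[Process.spawn(*command, url)] = url
  end
  reap.call until running.empty? # drain the remaining children

  failures
end
```

Processes are heavier than threads, but each curl runs independently, so one slow image only occupies a single slot.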

bencrouse commented 5 years ago

Thanks for the issue @GesJeremie.

I'm reluctant to add another lib just for this one use case, but two options come to mind for parallelizing:

eric-pigeon commented 5 years ago

concurrent-ruby is a dependency of ActiveSupport. There's already a CachedThreadPool available as Concurrent.global_io_executor, although a FixedThreadPool might be a better fit to throttle the number of active requests.