Run mosquito using multiple processes

wout commented 2 years ago

I've got 5 jobs in my app. One of them takes about 45 seconds to complete and it's holding up all the other ones. Is there a way to run mosquito with multiple processes?

The alternative I have now is to create a small separate app and run it from there. And then scale by creating multiple instances of that app. But if there's another way, that would be better of course. Something like https://github.com/jwoertink/lucky-cluster.

robacarp commented 2 years ago

@wout 45 seconds is a long time! What's it waiting on? Network?

I've been toying around with adding a spawn in a few different places. Ultimately I think mosquito would benefit from a proper multi-fiber architecture rather than simply wrapping each job-run in a spawn. I've been waiting to implement it for real until someone actually asked for it. Believe it or not, in over 4 years of working on mosquito nobody has ever seriously asked about parallelism.

Since the mosquito runner sleeps regularly, and especially if your task isn't CPU bound, you can try spawning multiple runners (see caveats below):

spawn { Mosquito::Runner.start }
Mosquito::Runner.start

However, if you're using any Periodic Jobs, this will get you into trouble because it'll run the periodic job scheduler twice as well.

If you take the path of running a completely secondary process (or container, etc) you can get around this by selectively toggling the cron scheduler. That code might look something like this, but the implementation is wildly dependent on your deployment topology:

Mosquito.configure do |settings|
  settings.redis_url = (ENV["REDIS_URL"]? || "redis://localhost:6379")
  settings.run_cron_scheduler = ENV["PRIMARY_WORKER"]? == "true"
end

Related: #18

wout commented 2 years ago

You're right, 45 seconds is long, but it's inevitable. It's a job rendering an image as an SVG file. Rendering the file itself takes 2-15 seconds, depending on the content. Then from that 40 MB SVG, three PNG files are generated at 4k, 2k and 1k, and finally everything is packaged into a ZIP.

It's all super CPU-intensive and I've managed to optimize a lot already. The original generator was written in JS and creating the SVG alone would take up to five minutes. I've ported it to Crystal and optimized the algorithm, bringing it down to 15 seconds in the worst-case scenario. At first, I used ImageMagick to go from SVG to PNG, but that took multiple minutes too. So I switched to Inkscape which is a lot faster. As it is right now, I don't see where I can shave off more.

All the other jobs are doing fine. The slowest ones finish in 3ms in development. So, this is a unique situation I think. Probably I'm better off moving this part to a separate app and running multiple instances of it to be able to keep up with demand.

robacarp commented 2 years ago

@wout sounds like a neat pipeline... and a brutal CPU-intensive situation. I'd love to browse the code if it's public somewhere.

I don't think any spawn would help, you need more CPUs running in parallel. If you boot a secondary app or container you can easily process only your image queue jobs with the run_from configuration parameter.
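To illustrate why spawn helps with waiting but not with CPU-bound work, here is a minimal sketch in plain Crystal with no mosquito involved. The sleep call is only a stand-in for a job that waits on the network, and the timings are illustrative. It assumes the default single-threaded runtime, where fibers take turns on one thread: two waiting fibers overlap, but two fibers doing heavy computation would still run one after the other.

```crystal
# Two fibers that only wait can overlap, because a sleeping fiber
# yields control to the scheduler.
done = Channel(Nil).new

elapsed = Time.measure do
  2.times do
    spawn do
      sleep 200.milliseconds # stand-in for a job waiting on I/O
      done.send nil
    end
  end

  # Block the main fiber until both spawned fibers have reported back.
  2.times { done.receive }
end

puts "two waiting fibers finished in about #{elapsed.total_milliseconds.round.to_i}ms"
```

If the sleep were replaced by a tight computation loop, each fiber would hold the thread until it finished, which is why separate processes (one per core) are the more reliable route for the image job.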

wout commented 2 years ago

> @wout sounds like a neat pipeline... and a brutal CPU-intensive situation. I'd love to browse the code if it's public somewhere.

I'm going to open-source most of the code once the project is finished. The generator will be stored on the blockchain, so that'll be available sooner. We're launching Wednesday: https://dendrorithms.com.

> I don't think any spawn would help, you need more CPUs running in parallel. If you boot a secondary app or container you can easily process only your image queue jobs with the run_from configuration parameter.

Don't think so either. My idea now is to move this job to a separate Lucky app, a tiny one, which is connected to the same database. And have it run periodically to see if new files need to be generated. I already have a locking mechanism in place, so jobs wouldn't be started twice. And then I can run four or five instances of that app in parallel, one for every available core on the server.

Anyway, thanks for your input!

wout commented 2 years ago

@robacarp In the meantime we're up and running. I created different systemd services for different tasks, and it's working pretty great. At the moment I'm skipping jobs based on an environment variable, but I believe that's what the run_from option is supposed to do, isn't it?

For example, I've got four jobs running in one service under the bookkeeper role. Then I have one or multiple services for the generator, uploader and minter roles, which are all more resource-intensive. They all run independently and are invoked by changing the status of a record in the database.

I'm now trying to figure out how the run_from option works, but it's not completely clear to me. Can I define a queue for every periodic job? Can I do that by overwriting the job_type class method?

robacarp commented 2 years ago

Glad to hear you got it running. run_from is built for that purpose, yes. I think you have the usage pattern correct -- job_type is intended to be something like "emailer", "generator", or "minter". That method name might need to be changed so that it's more obvious what's happening. The queue name is built off of the job type, and then the run_from configuration parameter instructs the runner which queues should be watched.

I think it should look something like this:

class GeneratorJob < Mosquito::QueuedJob
  # The queue name is built from the job type.
  def self.job_type
    "generator"
  end

  # ...etc
end

run_queues = ENV["run_queues"]? || "generator, uploader"

Mosquito.configure do |settings|
  settings.run_from = run_queues.split ", "
end
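As a side note on the snippet above: splitting on ", " expects exactly one comma followed by one space between names. A slightly more forgiving way to read the list could look like the sketch below. parse_queue_list is a hypothetical helper, not part of mosquito, and the fallback value is only illustrative.

```crystal
# Hypothetical helper (not a mosquito API): turns a comma-separated
# string into a clean list of queue names, ignoring stray whitespace
# and empty entries.
def parse_queue_list(raw : String) : Array(String)
  raw.split(',').map(&.strip).reject(&.empty?)
end

# "run_queues" is the variable name used in the snippet above.
queues = parse_queue_list(ENV["run_queues"]? || "generator, uploader")
puts queues.inspect
```

The resulting array could then be assigned to settings.run_from in place of the split call, so that each process watches only the queues named in its environment.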

The documentation is lacking here, both in the source code and on the docs site. Setting the queue name by overriding a method might also benefit from a small macro, or just a clearer method name. Hopefully this helps.

wout commented 2 years ago

Awesome! Thanks, that explains a lot. I expected a macro there, something like queue "generator", which is why I was a bit confused. I'm going to fix this for the next release using the job_type method for now.

mosquito-cr / mosquito

Run mosquito using multiple processes #80