mosquito-cr / mosquito

A background task runner for crystal applications supporting periodic (CRON) and manually queued jobs
MIT License

High CPU usage #126

Closed by mamantoha 9 months ago

mamantoha commented 10 months ago

Hi @robacarp. An issue appeared on the shards.info production server after commit 51904a05674757410268d183dd78fb2259ddbad7: CPU usage stays at 100%. Here's a screenshot from DigitalOcean for reference:

image

Upon reverting this commit, everything returns to normal:

image image

I apologize for not being able to provide more details at this moment.

robacarp commented 10 months ago

@mamantoha Thanks for trying out the new concurrent executor feature!

I'd be happy to help you sort this out. I've run shards.info locally with that commit and had no issue, but that doesn't mean nothing is broken. I think the log configuration for shards.info puts mosquito logs in a separate file. Can you share the logs from the worker while it's pegging the CPU at 100%?

robacarp commented 10 months ago

It’s possible that 100% CPU usage is actually Mosquito doing what it was designed to do. How long did the CPU burst last? I believe I’m able to do a full update scrape in just a few minutes on my laptop.

Shards.info is a network-bound workflow, so running multiple threads lets the scrape be parallelized and finish much faster than it would single-threaded.

I haven’t added a configuration variable for it yet, but you can monkey-patch to change the number of executors that get spawned.
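
A minimal sketch of what such a monkey patch could look like in Crystal. The `Overseer` class below is a hypothetical stand-in written for this example, not Mosquito's actual source; only the `executor_count` getter name comes from this thread. The idea is to reopen the class and override the getter:

```crystal
# Hypothetical stand-in for the library class that spawns executors.
# This is NOT Mosquito's real implementation, only an illustration.
class Overseer
  getter executor_count = 3

  # Builds one label per executor, reading the count through the getter.
  def spawn_executors : Array(String)
    Array.new(executor_count) { |i| "executor-#{i}" }
  end
end

# The monkey patch: reopen the class and redefine the getter.
class Overseer
  def executor_count
    1
  end
end

executors = Overseer.new.spawn_executors
puts executors.size
```

This only affects code that reads the value through the getter; anything that reads the instance variable directly would still see the original default.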

mamantoha commented 10 months ago

The CPU usage has consistently remained at 100% (Jan 3 - Jan 11), and it returns to normal after reverting to the previous mosquito commit.

image

I will try to reproduce the issue on my Linux laptop over the weekend.

robacarp commented 10 months ago

Ah yeah, I didn't think to look at the graph. If you can send me a database dump I'm happy to run it locally and see where the problem is too.

mamantoha commented 10 months ago

@robacarp I successfully replicated the issue on my Linux laptop. Where can I send you a database dump?

Setting getter executor_count = 1 didn't help.

image

It appears that the process consumes 100% CPU even when no jobs are being executed.

mamantoha commented 10 months ago

On macOS, the CPU usage is approximately 10%.

image

On commit e6b4b0a83e16b626934c10e9f64130fd4597d580, it is around 0%.

robacarp commented 10 months ago

Oh interesting, thank you for the context. I'll shell into a linux VM somewhere and see if I can replicate.

I doubt the database is important.

robacarp commented 9 months ago

@mamantoha I've sorted it out. I spent way too much time assuming the problem was in the new executor architecture but that is completely unrelated. The issue is here, this line:

    def self.start(spin = true)
      Log.notice { "Mosquito is buzzing..." }
      instance.run

      while spin && keep_running
+        Fiber.yield
      end
    end

I made the mistake of assuming that Fiber.yield was actually communicating to the scheduler that it should transfer control to another fiber, but that is apparently not the case. I'll make a sleep 2 patch shortly, and put in an issue for a smarter Runner one-shot class that doesn't just sleep.
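
A minimal sketch of why the loop spins, assuming the usual behavior of Crystal's scheduler: `Fiber.yield` does hand control to other runnable fibers, but the calling fiber stays runnable, so when nothing else is waiting it resumes immediately. `sleep` instead parks the fiber until the timer fires. The `wakeups` helper and the timing values are made up for this illustration:

```crystal
# Counts how many times a waiting loop wakes up during a fixed window.
# `wakeups` is a hypothetical helper for this example only.
def wakeups(window : Time::Span, &) : Int32
  count = 0
  deadline = Time.monotonic + window
  while Time.monotonic < deadline
    yield # run the waiting strategy under test
    count += 1
  end
  count
end

# Busy loop: the fiber is rescheduled right away on every pass.
spinning = wakeups(200.milliseconds) { Fiber.yield }

# Sleeping loop: the fiber is parked between passes.
sleeping = wakeups(200.milliseconds) { sleep 50.milliseconds }

puts "Fiber.yield loop woke #{spinning} times"
puts "sleep loop woke #{sleeping} times"
```

The exact counts depend on the machine, but the `Fiber.yield` loop should wake far more often than the sleeping loop, which is consistent with a pegged CPU while no jobs are running.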

Sorry for the delay in sorting this out!

mamantoha commented 9 months ago

Thanks @robacarp