Closed by mamantoha 9 months ago
@mamantoha Thanks for trying out the new concurrent executor feature!
I'd be happy to help you sort this out. I've run shards.info locally with that commit and had no issues, but that doesn't mean nothing is broken. I think the log configuration for shards.info puts the mosquito logs in a separate file -- can you share the logs from the worker while it's pegging the CPU at 100%?
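If it helps, routing the mosquito logs to their own file with Crystal's standard Log module looks roughly like this (a sketch only; the file path and the "mosquito" source name are assumptions about the shards.info setup):

```crystal
# Minimal sketch: send everything logged under the "mosquito" source to its
# own file via Crystal's standard Log module. The path is hypothetical.
require "log"

mosquito_log = File.new("/var/log/shards_info/mosquito.log", "a")
Log.setup("mosquito.*", :info, Log::IOBackend.new(mosquito_log))
```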
It’s possible that 100% CPU usage is actually Mosquito doing what it was designed to do. How long did the CPU burst last? I believe I’m able to do a full update scrape in just a few minutes on my laptop.
Shards.info is a network-bound workload, so running multiple threads allows the scrape to be parallelized and to finish much faster than it would single-threaded.
I haven't added a configuration variable for it yet, but you can monkey patch to change the number of executors that get spawned.
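Something roughly like this should do it (a sketch only; the class and getter names are assumptions based on mosquito's `executor_count` getter):

```crystal
# Rough monkey patch sketch: reopen the runner and override the getter that
# controls how many executors are spawned. The class and method names are
# assumed from mosquito's `executor_count` getter.
class Mosquito::Runner
  def executor_count
    2 # spawn two executors instead of the default
  end
end
```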
The CPU usage has consistently remained at 100% (Jan 3 - Jan 11), and it returns to normal after reverting to the previous mosquito commit.
I will try to reproduce the issue on my Linux laptop over the weekend.
Ah yeah, I didn't think to look at the graph. If you can send me a database dump I'm happy to run it locally and see where the problem is too.
@robacarp I successfully replicated the issue on my Linux laptop. Where can I send you a database dump?
Setting `getter executor_count = 1` didn't help.
It appears that the process consumes 100% CPU even when no jobs are being executed.
On macOS, the CPU usage is approximately 10%.
On commit e6b4b0a83e16b626934c10e9f64130fd4597d580 it is around 0%.
Oh interesting, thank you for the context. I'll shell into a linux VM somewhere and see if I can replicate.
I doubt the database is important.
@mamantoha I've sorted it out. I spent way too much time assuming the problem was in the new executor architecture, but that turned out to be completely unrelated. The issue is here, on this line:
```crystal
def self.start(spin = true)
  Log.notice { "Mosquito is buzzing..." }
  instance.run

  while spin && keep_running
    Fiber.yield # <- this line
  end
end
```
I made the mistake of assuming that Fiber.yield actually tells the scheduler to transfer control to another fiber, but that is apparently not the case. I'll push a `sleep 2` patch shortly and open an issue for a smarter Runner one-shot class that doesn't just sleep.
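The fix will look roughly like this (a sketch; the actual patch may differ):

```crystal
# Sketch of the fix: sleep between checks instead of busy-spinning on
# Fiber.yield, which kept a core at 100% even while idle.
def self.start(spin = true)
  Log.notice { "Mosquito is buzzing..." }
  instance.run

  while spin && keep_running
    sleep 2.seconds
  end
end
```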
Sorry for the delay in sorting this out!
Thanks @robacarp
Hi @robacarp. An issue arose after commit 51904a05674757410268d183dd78fb2259ddbad7 in shards.info production: the CPU usage remains consistently at 100%. Here's a screenshot from DigitalOcean for reference:
Upon reverting this commit, everything returns to normal:
I apologize for not being able to provide more details at this moment.