yeti-platform / yeti

Your Everyday Threat Intelligence
https://yeti-platform.io/
Apache License 2.0

Multiple Dataflows don't update #287

Closed · chenerlich closed this issue 6 years ago

chenerlich commented 6 years ago

Description

I integrated the platform about a month ago, and now 99% of the updated data shown in "Browse" is phishing. When I look at the dataflows table, I see that while some feeds get updated, others don't. I restarted the system/workers/schedulers a few times, and pulled the repository to its latest state and ran "syncdb". No change for those dataflows. Attached is a screenshot of a few of them:

[Screenshot "capture"; today is 28/08/2018]

tomchop commented 6 years ago

Yeah, this seems to be Mongo & celery not getting along well and deadlocking at some point, especially on smaller servers. What kind of specs are you running Yeti on?

For the time being, you can "unlock" feeds using this mongo script:

use yeti;
db.schedule_entry.update({lock: true}, {$set: {lock: false, status: "Unlocked..."}}, {multi: true});
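To make the effect of that update concrete, here is a rough in-memory illustration in plain Python (the dict fields mirror the `schedule_entry` documents in the script above; this is a model, not Yeti's actual code):

```python
# Minimal sketch of what the mongo update does: every entry with
# lock == True is reset to unlocked, regardless of which feed it is.
def unlock_all(entries):
    """Mimic db.schedule_entry.update({lock: true},
    {$set: {lock: false, status: "Unlocked..."}}, {multi: true})."""
    for entry in entries:
        if entry.get("lock"):
            entry["lock"] = False
            entry["status"] = "Unlocked..."
    return entries

feeds = [
    {"name": "FeedA", "lock": True, "status": "Running"},
    {"name": "FeedB", "lock": False, "status": "OK"},
]
unlock_all(feeds)
# feeds[0] is now {"name": "FeedA", "lock": False, "status": "Unlocked..."}
```

The `{multi: true}` option is what makes the real update apply to every locked entry rather than just the first match.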
chenerlich commented 6 years ago

@tomchop 4 CPUs, 16 GB RAM.

What should I upgrade?

tomchop commented 6 years ago

I guess more CPUs / cores couldn't hurt. Can you show me how the services are started? The default is to start 8 processes for feeds and 10 for analytics, so you have 18+ processes hitting mongodb simultaneously on a 4-CPU machine.

tomchop commented 6 years ago

You could also try decreasing the number of workers (check the systemd scripts) in each service to something better suited to your instance.
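One way to lower the worker count without editing the shipped unit files is a systemd drop-in override. A sketch, assuming a celery-based feeds unit; the unit name `yeti_feeds.service`, the binary path, and the `core.config` module path below are all placeholders, so check the actual `ExecStart` line of the services on your own install:

```ini
# /etc/systemd/system/yeti_feeds.service.d/override.conf
# (hypothetical unit name -- verify which yeti services your install runs)
[Service]
# An empty ExecStart= clears the original command before replacing it
ExecStart=
# Drop --concurrency from the default 8 to match a 4-CPU box
ExecStart=/usr/bin/celery -A core.config worker -Q feeds --concurrency=4
```

After adding the drop-in, reload and restart with `systemctl daemon-reload && systemctl restart yeti_feeds.service`.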

chenerlich commented 6 years ago

@tomchop Changed specs to 8 CPUs, 32 GB RAM. It looks like this:

[Screenshots "capture" and "capture2"]

Is the mongo script still relevant?

tomchop commented 6 years ago

TLDR: Yes. I would: a) stop all feeds, b) run the script, c) relaunch the feeds.

The longer answer is that we store the state of the feeds in the db so that a feed is not launched a second time while it's already running. If the workers deadlock, the feed stays marked as "running" in the database without actually running. Worst case it keeps a whole worker busy; best case it simply never runs again until it is unlocked. This is a bit hard to reproduce, but after the time we've spent on this we're pretty sure it comes from Celery and Mongo not working together nicely.
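That locking scheme can be modeled in a few lines of plain Python (names are illustrative, not Yeti's actual code; a real deadlock hangs rather than raises, but an unhandled failure between lock and unlock has the same visible effect):

```python
class ScheduleEntry:
    """Illustrative stand-in for a row in the schedule_entry collection."""
    def __init__(self, name):
        self.name = name
        self.lock = False
        self.status = "OK"

def try_launch(entry, run):
    """Launch the feed only if it isn't already marked as running."""
    if entry.lock:
        return False          # looks busy, so the scheduler skips it
    entry.lock = True         # mark as running (persisted in the db)
    entry.status = "Running"
    try:
        run()
    except Exception:
        pass                  # a crash/deadlock here never reaches the unlock
    else:
        entry.lock = False    # normal path: release the lock
        entry.status = "OK"
    return True
```

Once a run dies while holding the lock, every later `try_launch` returns False, so the feed never updates again; resetting `lock` to False by hand (the mongo script above) is exactly what gets it going again.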

chenerlich commented 6 years ago

Worked great. Thanks!

sebdraven commented 6 years ago

Perfect! I've closed this issue!

jalbrizio commented 5 years ago

thanks for this, I ran into the same issue