smicallef / spiderfoot

SpiderFoot automates OSINT for threat intelligence and mapping your attack surface.
http://www.spiderfoot.net
MIT License

"By required data" options not honored #302

Closed: LoZio closed this issue 5 years ago

LoZio commented 5 years ago

I'm trying to start a scan without darknet information. I deselected this: [image] But in the log I see this: [image] And there are a lot of results here: [image]

bcoles commented 5 years ago

Presumably, scanning By Required Data chooses which modules to load based on the selected data types, rather than filtering results. Any results a loaded module returns are still reported, even if they don't match the selected data types.

In this instance, the sfp_darksearch module also produces SEARCH_ENGINE_WEB_CONTENT, so it is loaded as well. Most of the darknet-related modules produce the same events.

    def producedEvents(self):
        return ['DARKNET_MENTION_URL', 'DARKNET_MENTION_CONTENT', 'SEARCH_ENGINE_WEB_CONTENT']
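
To illustrate why a module like sfp_darksearch still gets loaded, here is a minimal sketch of selection by produced event type. The `MODULES` table, the `sfp_example_dns` entry and the `select_modules` helper are hypothetical and are not SpiderFoot's actual implementation; only the sfp_darksearch event names come from the module above.

```python
# Hypothetical sketch: not SpiderFoot's actual selection code.
# Maps a module name to the event types it produces.
MODULES = {
    "sfp_darksearch": [
        "DARKNET_MENTION_URL",
        "DARKNET_MENTION_CONTENT",
        "SEARCH_ENGINE_WEB_CONTENT",
    ],
    "sfp_example_dns": ["IP_ADDRESS"],  # made-up module, for contrast only
}


def select_modules(modules, wanted_types):
    """Return the names of modules producing at least one wanted event type."""
    wanted = set(wanted_types)
    return sorted(
        name for name, produced in modules.items() if wanted & set(produced)
    )


# The darknet types are deselected, but SEARCH_ENGINE_WEB_CONTENT is still
# wanted, so sfp_darksearch is selected and all of its events get reported.
selected = select_modules(MODULES, ["SEARCH_ENGINE_WEB_CONTENT"])
print(selected)
```

Under this model, deselecting a data type only removes a module when none of its other produced types is still selected.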

If you're not interested in Darknet data, you may wish to disable modules which return darknet information.

    $ grep -rn producedEvent modules/ -A 3 | grep -E 'DARKNET_MENTION_URL|DARKNET_MENTION_CONTENT'
    modules/sfp_darksearch.py-50-        return ['DARKNET_MENTION_URL', 'DARKNET_MENTION_CONTENT', 'SEARCH_ENGINE_WEB_CONTENT']
    modules/sfp_onioncity.py-51-        return ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT",
    modules/sfp_onionsearchengine.py-52-        return ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"]
    modules/sfp_ahmia.py-52-        return ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"]
    modules/sfp_intelx.py-74-        return ["LEAKSITE_URL", "DARKNET_MENTION_URL"]
    modules/sfp_torch.py-51-        return ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"]
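
The same check can be sketched in Python to build a list of modules to disable. The event lists below are copied from the grep output above (the sfp_onioncity line is cut off there, so only its visible event types are included); the `PRODUCED` table and the `modules_to_disable` helper are hypothetical names used for illustration.

```python
# Event lists copied from the grep output above. The sfp_onioncity entry is
# truncated in that output, so only the visible event types are listed.
PRODUCED = {
    "sfp_darksearch": ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"],
    "sfp_onioncity": ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT"],
    "sfp_onionsearchengine": ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"],
    "sfp_ahmia": ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"],
    "sfp_intelx": ["LEAKSITE_URL", "DARKNET_MENTION_URL"],
    "sfp_torch": ["DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT", "SEARCH_ENGINE_WEB_CONTENT"],
}

DARKNET_TYPES = {"DARKNET_MENTION_URL", "DARKNET_MENTION_CONTENT"}


def modules_to_disable(produced, unwanted):
    """Return the modules that produce any of the unwanted event types."""
    return sorted(
        name for name, events in produced.items() if unwanted.intersection(events)
    )


to_disable = modules_to_disable(PRODUCED, DARKNET_TYPES)
print(",".join(to_disable))
```

Note that disabling a whole module also drops its other event types; for example, sfp_intelx would no longer produce LEAKSITE_URL.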

SpiderFoot uses a modular architecture, which allows granular configuration of which modules to load and of each module's settings. You can learn more about the module architecture here:

If you're concerned about the retrieval of darknet content over Tor and want to disable it, each of the darknet-related modules exposes a fetchlinks Boolean option, which can be turned off.

    # Option descriptions
    optdescs = {
        'fetchlinks': "Fetch the darknet pages (via TOR, if enabled) to verify they mention your target.",
        'max_pages': "Maximum number of pages of results to fetch."
    }
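
As a rough sketch of what turning the option off amounts to, the snippet below merges a user override into a module's default options. The default values and the `apply_overrides` helper are assumptions made for illustration, not SpiderFoot's API; only the option names `fetchlinks` and `max_pages` come from the module above.

```python
# Hypothetical sketch: not SpiderFoot's actual option handling.
# The default values are assumptions; only the option names are from the module.
DEFAULT_OPTS = {"fetchlinks": True, "max_pages": 20}


def apply_overrides(defaults, overrides):
    """Return a copy of the defaults with user overrides applied."""
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject misspelled option names instead of silently ignoring them.
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    return {**defaults, **overrides}


# Turn off page retrieval while leaving the other settings untouched.
opts = apply_overrides(DEFAULT_OPTS, {"fetchlinks": False})
print(opts)
```

With fetchlinks off, a module would presumably still query the darknet search engine but skip fetching the linked pages themselves.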

LoZio commented 5 years ago

Thank you for the explanation, it is clear now. This makes me vote for #281, since I chose the "required data" method precisely so that I would not have to configure the set of modules I need each time.