eidolonpg commented 8 years ago

The current user scan implementation is very slow because it uses the HaveIBeenPwned API for things it probably shouldn't. The API is great, but we're polling emails that might not even exist and using it as a gateway to identify other emails.

Are there any other approaches for testing emails to find valid ones? If we can identify a valid pattern without using HIBP, we can use HIBP on higher-probability emails, where it has drastically more value. In short, we should narrow its use as closely as possible to emails we know exist and might have been compromised.

eidolonpg commented 8 years ago

I currently have a note in the Battalion script and in the README that this scan is slow, but we should probably address this sooner rather than later. With the number of possible emails we generate, we're looking at something like 23 minutes minimum.

theabraxas commented 8 years ago

https://hunter.io/

Free API, should be sufficient to do target site lookups :) We could also incorporate the email scraping, but at a later time: names that match our LinkedIn scrape and were also detected could be used to raise the 'relevancy' of an email address. We could also gather the obvious ones like 'support', 'info', 'orders', etc., which won't be grabbed by the dork.

theabraxas commented 8 years ago

Hunter.io integration should look like this.

curl "https://api.hunter.io/v2/domain-search?domain=${DOMAINNAME}&api_key=${APIKEY}"

Output is JSON and starts:

{
  "data" : {
    "domain" : "DOMAIN.COM",
    "webmail" : true/false,
    "pattern" : "{first}.{last}",
    "emails" : [
      {
        "value" : "FIRST.NAME@DOMAIN.COM",
        "type" : "personal",
        "confidence" : 92,
        "sources" : [

The emails entry then repeats for each email found.

To start with, we just want to extract the [data][pattern] item and use that to change our email list. Once that integration is complete, we can create a new email scavenging option to find other emails and incorporate them into our scan.
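Pulling the pattern out of the response could look roughly like the sketch below. It is only an illustration: extract_pattern is a hypothetical helper name, and it uses sed to avoid extra dependencies. A real JSON parser would be more robust if one is available on the scan host.

```shell
#!/bin/sh
# Hypothetical helper (not part of Battalion): read a Hunter.io
# domain-search response on stdin and print the value of the first
# line that carries a "pattern" key. Prints nothing if the value is
# not a quoted string (for example null).
extract_pattern() {
    sed -n 's/.*"pattern"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example with a trimmed-down response body (made-up domain).
SAMPLE='{
  "data" : {
    "domain" : "example.com",
    "pattern" : "{first}.{last}",
    "emails" : []
  }
}'
printf '%s\n' "$SAMPLE" | extract_pattern
```

For the sample body this prints {first}.{last}, which can then drive the email list generation.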

Files to edit:

battalion/battalion.sh

In the 'Optional Parameters' section of usage (~line 23), add:

echo " --hunterio <api key> Sets a Hunter.io API Key and adds Hunter.io to the domain scan."

Add to while loop ~ line 102

        --hunterio)
            HUNTERIO_ENABLED=true
            HUNTERIO_API_KEY="$2"
            shift
            ;;
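One thing the case branch above does not cover is --hunterio being passed without a key. A minimal sketch of the loop with that guard follows. parse_args is a hypothetical wrapper used here only so the loop can be shown on its own, and the real loop in battalion.sh may shift differently.

```shell
#!/bin/sh
# Sketch only: option loop with a guard for a missing API key.
# parse_args is a hypothetical wrapper, not a function in battalion.sh.
parse_args() {
    HUNTERIO_ENABLED=false
    HUNTERIO_API_KEY=""
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --hunterio)
                # Reject a missing value or one that looks like another flag.
                case "${2:-}" in
                    ""|--*)
                        echo "--hunterio requires an API key" >&2
                        return 1
                        ;;
                esac
                HUNTERIO_ENABLED=true
                HUNTERIO_API_KEY="$2"
                shift
                ;;
        esac
        shift
    done
}
```

Calling parse_args --hunterio abc123 sets both variables, while parse_args --hunterio alone returns non-zero and leaves the scan disabled.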

Add to 'exports' ~line 133

export HUNTERIO_ENABLED=${HUNTERIO_ENABLED:-false}
export HUNTERIO_API_KEY

Add to subdirectory structure ~line 181

export HUNTERIO_DIRECTORY=$SCAN_DIRECTORY/hunterio

Add to subdirectory structure ~line 196

build_dir "${HUNTERIO_DIRECTORY}"

battalion/user-scan/user-scan.sh

Replace lines 17-20

touch "$COMPROMISED_STYLE"
"$SCRIPT_DIRECTORY/user-scan/scripts/hunterio.sh" \
    < "$POSSIBLE_EMAILS" \
    > "$COMPROMISED_STYLE"

battalion/user-scan/scripts/hunterio.sh

New File

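SCAN_DIRECTORY="${1}"
HUNTERIO_API_KEY="${2}"
# Assumption: the target domain arrives as a third argument.
# The first draft had "$?" here, which is the last exit status.
TARGETCOMPANY="${3}"
HUNTERIO_REQUEST="https://api.hunter.io/v2/domain-search?"
HUNTERIO_PARAMS="domain=${TARGETCOMPANY}&api_key=${HUNTERIO_API_KEY}"

# -w appends a blank line and "<status code>" after the response body.
RESULT=$(curl -s -w "\n\n<%{http_code}>" "${HUNTERIO_REQUEST}${HUNTERIO_PARAMS}")
# Quote $RESULT so its line breaks survive for tail and head.
STATUS_CODE=$(echo "$RESULT" | tail -n 1)
BODY=$(echo "$RESULT" | head -n -2)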

[HANDLE REST OF OUTPUT]

We'll need to make sure the emails get regenerated after this and then properly fed into the HIBP scan.
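As a rough idea of what the output handling could start with, the sketch below checks the status trailer before anything downstream uses the body. handle_result is a hypothetical name, and it assumes the raw curl output ends with a blank line followed by the status in angle brackets, as in the draft above. It removes the trailer with sed rather than head -n -2 so it does not depend on GNU head.

```shell
#!/bin/sh
# Hypothetical helper: take the raw curl output (body, blank line,
# "<status>" trailer) as $1 and print the body only when the status
# is 200. Returns non-zero for any other status.
handle_result() {
    hr_result="$1"
    hr_status=$(printf '%s\n' "$hr_result" | tail -n 1)
    if [ "$hr_status" != "<200>" ]; then
        echo "Hunter.io request failed with status ${hr_status}" >&2
        return 1
    fi
    # Drop the trailer line, then the blank separator line before it.
    printf '%s\n' "$hr_result" | sed '$d' | sed '$d'
}
```

A caller could then run BODY=$(handle_result "$RESULT") and skip straight to the existing email list when that fails.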

eidolonpg commented 8 years ago

To complete this change we'll probably want to update our pattern syntax to match Hunter's. This will make matching easier, and I like theirs better.

For reference:

{first} = first name
{last} = last name
{f} = first name first letter
{l} = last name first letter

We'll see if that suffices for now.
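To make the four tokens concrete, here is a minimal sketch of expanding such a pattern into an address. expand_pattern is a hypothetical helper, the names and domain are made up, and it assumes names contain no "/" or "&" characters, since those would confuse the sed replacements.

```shell
#!/bin/sh
# Hypothetical helper: expand a Hunter-style pattern into an address.
# usage: expand_pattern PATTERN FIRST LAST DOMAIN
expand_pattern() {
    ep_first=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    ep_last=$(printf '%s' "$3" | tr '[:upper:]' '[:lower:]')
    ep_f=$(printf '%s' "$ep_first" | cut -c1)
    ep_l=$(printf '%s' "$ep_last" | cut -c1)
    # Assumes names hold no "/" or "&", which sed would misread.
    ep_local=$(printf '%s\n' "$1" | sed \
        -e "s/{first}/${ep_first}/g" \
        -e "s/{last}/${ep_last}/g" \
        -e "s/{f}/${ep_f}/g" \
        -e "s/{l}/${ep_l}/g")
    printf '%s@%s\n' "$ep_local" "$4"
}

expand_pattern '{first}.{last}' Jane Doe example.com   # jane.doe@example.com
expand_pattern '{f}{last}' Jane Doe example.com        # jdoe@example.com
```

Lowercasing the names first keeps the generated addresses consistent regardless of how the scraped names are capitalized.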

eidolonpg commented 7 years ago

See #29

theabraxas / Battalion

User scan is very slow - possible workarounds or alternatives? #8
