Closed: eidolonpg closed this 7 years ago
I currently have a comment in the Battalion script and in the README noting that this scan is slow, but we should probably address it sooner rather than later. With the number of possible emails we generate, we're looking at something like 23 minutes minimum.
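As a rough sanity check on that kind of figure, here's a back-of-the-envelope sketch. It assumes one request roughly every 1.5 seconds and 920 candidate emails; both numbers are illustrative assumptions rather than measurements, and `estimate_scan_minutes` is a hypothetical helper, not something in Battalion.

```shell
# Hypothetical helper: estimate the minimum scan time in whole minutes.
# $1 = number of candidate emails (assumed value in the example below)
# $2 = delay between requests in milliseconds (assumed rate limit)
estimate_scan_minutes() {
  # Round up to the next whole minute using integer arithmetic.
  echo $(( ($1 * $2 + 59999) / 60000 ))
}

estimate_scan_minutes 920 1500   # prints 23
```

The time grows linearly with the number of candidates, so cutting the candidate list is the only real lever while the per-request delay stays fixed.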
Free API, should be sufficient to do target site lookups :) We could also incorporate the email scraping, but at a later time. Names that match our LinkedIn scrape and were also detected could up the 'relevancy' of the email address, and we could also gather the obvious ones like 'support', 'info', 'orders', etc., which won't be grabbed by the dork.
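For the "obvious" addresses, a minimal sketch could look like the following. The role list is just the three examples mentioned above, and `role_emails` is a hypothetical name, not an existing Battalion function.

```shell
# Hypothetical helper: print common role-based addresses for a domain.
# $1 = target domain, e.g. example.com
role_emails() {
  for role in support info orders; do
    printf '%s@%s\n' "$role" "$1"
  done
}

role_emails example.com
```

The output is one address per line, so it could be appended to whatever candidate list the scan already reads.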
Hunter.io integration should look like this.
curl "https://api.hunter.io/v2/domain-search?domain=${DOMAINNAME}&api_key=${APIKEY}"
Output is JSON and starts:
{
"data" : {
"domain" : "DOMAIN.COM",
"webmail" : true/false,
"pattern" : "{first}.{last}",
"emails" : [
{
"value" : "FIRST.NAME@DOMAIN.COM",
"type" : "personal",
"confidence" : 92,
"sources" : [
The emails entry then repeats for each email found.
To start with, we just want to extract the [data][pattern] item and use it to change our email list. Once that integration is complete, we can create a new email scavenging option to find other emails and incorporate them into our scan.
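Pulling out the pattern could be sketched like this. It is a dependency-free approach using sed that assumes the response contains a single quoted "pattern" value; a real JSON parser would be more robust. `extract_pattern` is a hypothetical name.

```shell
# Hypothetical helper: read a domain-search response on stdin and print
# the value of the first "pattern" key. Prints nothing if the key is
# missing or its value is not a quoted string.
extract_pattern() {
  sed -n 's/.*"pattern"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example with a trimmed-down response shaped like the one above.
printf '{\n"data" : {\n"domain" : "example.com",\n"pattern" : "{first}.{last}"\n}\n}\n' \
  | extract_pattern   # prints {first}.{last}
```

An empty result would be the signal to fall back to the existing email generation.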
Files to edit:
battalion/battalion.sh
echo " --hunterio <api key> Sets a Hunter.io API Key and adds Hunter.io to the domain scan."
--hunterio)
HUNTERIO_ENABLED=true
HUNTERIO_API_KEY="$2"
shift
;;
export HUNTERIO_ENABLED=${HUNTERIO_ENABLED:-false}
export HUNTERIO_API_KEY
export HUNTERIO_DIRECTORY=$SCAN_DIRECTORY/hunterio
build_dir "${HUNTERIO_DIRECTORY}"
battalion/user-scan/user-scan.sh
touch "$COMPROMISED_STYLE"
"$SCRIPT_DIRECTORY/user-scan/scripts/hunterio.sh" \
< "$POSSIBLE_EMAILS" \
> "$COMPROMISED_STYLE"
battalion/user-scan/scripts/hunterio.sh
SCAN_DIRECTORY="${1}"
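TARGETCOMPANY="$?" # FIXME: "$?" is the previous command's exit status, not a domain; the target domain needs to be passed in here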
HUNTERIO_API_KEY="${2}"
HUNTERIO_REQUEST="https://api.hunter.io/v2/domain-search?"
HUNTERIO_PARAMS="domain=${TARGETCOMPANY}&api_key=${HUNTERIO_API_KEY}"
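# Append the HTTP status as a final "<code>" line, separated by a blank line
RESULT=$(curl -s -w "\n\n<%{http_code}>" "${HUNTERIO_REQUEST}${HUNTERIO_PARAMS}")
# Quote "$RESULT" so the newlines survive and tail/head see separate lines
STATUS_CODE=$(echo "$RESULT" | tail -n 1)
BODY=$(echo "$RESULT" | head -n -2)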
[HANDLE REST OF OUTPUT]
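One possible way to handle the rest of the output is sketched below. It assumes the response ends with a blank line followed by a `<code>` status line, as produced by the `-w` format above. The function names are hypothetical, and `sed '$d'` is used twice instead of `head -n -2` only to avoid depending on GNU head.

```shell
# Hypothetical helpers; names are illustrative, not part of Battalion.

# Print the HTTP status from a response whose last line is "<code>".
response_status() {
  printf '%s\n' "$1" | tail -n 1 | tr -d '<>'
}

# Print the body: everything except the "<code>" line and the blank
# separator line before it.
response_body() {
  printf '%s\n' "$1" | sed '$d' | sed '$d'
}

# Print the body on HTTP 200; otherwise report the status and fail.
handle_response() {
  http_status=$(response_status "$1")
  if [ "$http_status" = "200" ]; then
    response_body "$1"
  else
    echo "Hunter.io request failed with HTTP status ${http_status}" >&2
    return 1
  fi
}
```

Failing loudly on a non-200 status lets the caller skip the Hunter.io step and carry on with the existing email list.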
We'll need to make sure the emails get regenerated after this and then properly fed into the HIBP scan.
To complete this change we'll probably want to update our pattern syntax to match Hunter's. This will make matching easier, and I like theirs better.
For reference:
{first} = first name
{last} = last name
{f} = first name first letter
{l} = last name first letter
We'll see if that suffices for now.
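Expanding a pattern in that syntax could be sketched as follows. It assumes names are plain ASCII with no `/`, `&`, or backslash characters, since they are substituted through sed. `expand_pattern` is a hypothetical name.

```shell
# Hypothetical helper: expand a Hunter-style pattern into an address.
# $1 = pattern, $2 = first name, $3 = last name, $4 = domain
# The body runs in a subshell, so its variables do not leak to the caller.
expand_pattern() (
  pattern=$1 first=$2 last=$3 domain=$4
  f=$(printf '%s\n' "$first" | cut -c1)   # first name first letter
  l=$(printf '%s\n' "$last" | cut -c1)    # last name first letter
  local_part=$(printf '%s\n' "$pattern" | sed \
    -e "s/{first}/$first/g" \
    -e "s/{last}/$last/g" \
    -e "s/{f}/$f/g" \
    -e "s/{l}/$l/g")
  printf '%s@%s\n' "$local_part" "$domain"
)

expand_pattern '{first}.{last}' jane doe example.com   # prints jane.doe@example.com
expand_pattern '{f}{last}' jane doe example.com        # prints jdoe@example.com
```

Running this over the scraped name list with the pattern Hunter.io returns would give one candidate per person instead of every permutation.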
See #29
The current user scan implementation is very slow because it uses the HaveIBeenPwned API for things that it probably shouldn't. The API is great, but we're using it to poll emails that might not even exist and as a gateway to identify other emails.
Are there any other approaches for testing emails to find valid ones? If we can identify a valid pattern without using HIBP, we can use HIBP on higher-probability emails, where it has drastically more value. Pretty much, we should narrow its use as closely as possible to emails we know exist and might have been compromised.