zerohour-phishing-detection / zpd-server

Code and test data for anti-phishing tool: A decision-support tool for experimentation on zero-hour phishing detection
Creative Commons Attribution 4.0 International

re-add clearbit #19

Open TPGamesNL opened 6 months ago

TPGamesNL commented 6 months ago

Removed in https://github.com/zerohour-phishing-detection/zpd-server/pull/9. We should look into why it was used, and add it back.

TPGamesNL commented 6 months ago

Will delay this change. Progress below.

The reason for the delay is that this check is extremely easy to subvert: Clearbit looks for `<meta property="og:logo" content="your_logo.png">`, which declares a logo that is not visible to the user. An attacker can therefore simply put this meta tag in their HTML, with a logo that reverse-searches to their own website, which the server will then detect and mark as legitimate.
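
To make the subversion described above concrete, here is a minimal sketch of how a page's declared logo can be read straight from its HTML. It uses only the standard-library `html.parser`; the names `OgLogoParser`, `declared_logo`, and `PHISHING_PAGE`, and the `attacker.example` URL, are made up for illustration and are not part of zpd-server or of Clearbit's actual pipeline.

```python
from html.parser import HTMLParser
from typing import Optional


class OgLogoParser(HTMLParser):
    """Collects the content of the first <meta property="og:logo"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.logo: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only the first og:logo declaration is kept.
        if tag != "meta" or self.logo is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("property") == "og:logo":
            self.logo = attr_map.get("content")


def declared_logo(html: str) -> Optional[str]:
    """Return the logo URL a page declares via og:logo, or None if absent."""
    parser = OgLogoParser()
    parser.feed(html)
    return parser.logo


# Hypothetical phishing page: the visible <img> imitates a bank, while the
# invisible meta tag declares a logo of the attacker's own choosing.
PHISHING_PAGE = """
<html><head>
<meta property="og:logo" content="https://attacker.example/own_logo.png">
</head><body><img src="bank_logo.png"><form>...</form></body></html>
"""

print(declared_logo(PHISHING_PAGE))
```

The declared logo is entirely under the page author's control and is unrelated to what the visitor sees, so a check that trusts it can be steered by the attacker.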

```diff
diff --git a/methods/dst.py b/methods/dst.py
index a865b21..300d062 100644
--- a/methods/dst.py
+++ b/methods/dst.py
@@ -16,6 +16,7 @@ from parsing import Parsing
 from search_engines.image.google import GoogleReverseImageSearchEngine
 from search_engines.text.google import GoogleTextSearchEngine
 from utils import domains
+from utils.clearbit import get_logo
 from utils.logging import main_logger
 from utils.logo_finder import LogoFinder
 from utils.result import ResultType
@@ -87,7 +88,7 @@ class DST(DetectionMethod):
                     f"[RESULT] Not phishing, for url {url}, due to registered domain validation"
                 )
 
-                return ResultType.LEGITIMATE
+                # return ResultType.LEGITIMATE
 
         with TimeIt("image-only reverse page search"):
             logo_finder = LogoFinder(
@@ -110,7 +111,24 @@ class DST(DetectionMethod):
                     f"[RESULT] Not phishing, for url {url}, due to registered domain validation"
                 )
 
-                return ResultType.LEGITIMATE
+                # return ResultType.LEGITIMATE
+
+        with TimeIt("clearbit reverse logo search"):
+            logo = get_logo(url_registered_domain)
+            if logo is not None:
+                revimg = GoogleReverseImageSearchEngine()
+
+                url_list_clearbit = list(itertools.islice(revimg.query(logo), 7))
+                print(url_list_clearbit)
+
+                # Handle results
+                if asyncio.run(check_search_results(url_registered_domain, url_list_clearbit)):
+                    logger.info(
+                        f"[RESULT] Not phishing, for url {url}, due to registered domain validation"
+                    )
+
+                    return ResultType.LEGITIMATE
+
 
         # No match through images, go on to image comparison per URL
         with TimeIt("image comparisons"):
diff --git a/utils/clearbit.py b/utils/clearbit.py
new file mode 100644
index 0000000..b064748
--- /dev/null
+++ b/utils/clearbit.py
@@ -0,0 +1,12 @@
+from urllib.error import HTTPError
+
+import numpy as np
+from skimage.io import imread
+
+
+def get_logo(domain: str) -> np.ndarray | None:
+    url = f'https://logo.clearbit.com/{domain}'
+    try:
+        return imread(url)
+    except HTTPError:
+        return None
```

TPGamesNL commented 6 months ago

Supervisor result: yes, but as a toggleable option.
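
One way the toggle could look is sketched below. This is only an illustration under assumptions: `DetectionSettings`, its `use_clearbit` field, and `clearbit_says_legitimate` are hypothetical names, not existing zpd-server code, and defaulting the option to off is a choice made here because of the subversion risk discussed earlier in this issue. The logo fetcher and the match check are passed in as callables so the sketch stays independent of the real `get_logo` and reverse image search.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class DetectionSettings:
    """Hypothetical settings object; the field name is an assumption."""

    # Assumed default: off, since the check is easy for an attacker to subvert.
    use_clearbit: bool = False


def clearbit_says_legitimate(
    domain: str,
    settings: DetectionSettings,
    fetch_logo: Callable[[str], Optional[object]],
    logo_matches_domain: Callable[[object, str], bool],
) -> bool:
    """Run the Clearbit-based check only when the option is enabled."""
    if not settings.use_clearbit:
        # Option disabled: skip the check without fetching anything.
        return False
    logo = fetch_logo(domain)
    if logo is None:
        # No logo available for this domain, so the check cannot vouch for it.
        return False
    return logo_matches_domain(logo, domain)
```

With the option disabled the function returns `False` without calling `fetch_logo`, so the rest of the detection flow decides the result on its own.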