zerohour-phishing-detection / zpd-server

Code and test data for anti-phishing tool: A decision-support tool for experimentation on zero-hour phishing detection
Creative Commons Attribution 4.0 International

Google reverse image search blocked by consent request #7

Closed TPGamesNL closed 3 months ago

TPGamesNL commented 4 months ago
2024-02-06 22:48:05,730  [/home/teun/phishing/zdp-server/engines/google.py:259]  INFO: Starting browser session
2024-02-06 22:48:05,730  [/home/teun/phishing/zdp-server/engines/google.py:77]  INFO: Sending request to: http://www.google.com/search?q=Log+in+op+je+PayPal-rekening
2024-02-06 22:48:05,730  [/home/teun/phishing/zdp-server/engines/google.py:84]  INFO: Sending get request, attempt: 0
2024-02-06 22:48:06,996  [/home/teun/phishing/zdp-server/engines/google.py:92]  INFO: Status code: 200
2024-02-06 22:48:06,997  [/home/teun/phishing/zdp-server/engines/google.py:93]  INFO: Res body: <HTML url='https://consent.google.com/ml?continue=https://www.google.com/search%3Fq%3DLog%2Bin%2Bop%2Bje%2BPayPal-rekening%26gws_rd%3Dssl&gl=NL&m=0&pc=srp&uxe=none&cm=2&hl=nl&src=1'>

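The last line is logged by the code self.main_logger.info(f"Res body: {r.html}"). It shows that the request ended up on consent.google.com instead of on the search results page.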
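
The log above can be turned into an explicit check. The following is only a minimal sketch, not code from the repository: the helper names `is_consent_redirect` and `original_target` are hypothetical, and it assumes the final URL of the response is available as a string (with requests, for example, via `response.url`).

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Host name taken from the logged response URL.
CONSENT_HOST = "consent.google.com"


def is_consent_redirect(final_url: str) -> bool:
    """Return True if the final response URL points at the consent page."""
    return urlparse(final_url).hostname == CONSENT_HOST


def original_target(final_url: str) -> Optional[str]:
    """Return the URL stored in the 'continue' query parameter, if any."""
    values = parse_qs(urlparse(final_url).query).get("continue")
    return values[0] if values else None
```

Logging a warning when `is_consent_redirect` returns True would make this failure mode easier to spot than reading the raw response body.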

TPGamesNL commented 4 months ago

two main ways:

This, of course, isn't great and isn't very stable. We should seriously consider alternatives for these searches, such as a proper API.

TPGamesNL commented 4 months ago

For normal (text) search, the issue was simply that Google renewed their interface and thereby changed the names of the CSS classes (which are obfuscated, so they probably change with every update). These class names are used to identify the relevant links. While there is a cookie popup on this page as well, it is overlaid on top of the regular search page, so the HTML of the search results is still present. This can easily be fixed by updating the CSS classes used in the code. Of course, this is a short-term solution, as the interface can change at any time and break this software again.
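
To illustrate why the class names are fragile, here is a rough sketch of link extraction that does not use CSS classes at all. It is an assumption-laden heuristic, not what the repository does: `ResultLinkParser` and `extract_result_links` are hypothetical names, and the filter simply keeps absolute http(s) links that do not point back at a google.com host, so it may also pick up ads or other non-result links.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ResultLinkParser(HTMLParser):
    """Collect outbound links without relying on CSS class names."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        parsed = urlparse(href)
        host = parsed.hostname
        # Skip relative links and anything that is not plain http(s).
        if parsed.scheme not in ("http", "https") or not host:
            return
        # Skip links that point back at Google itself.
        if host == "google.com" or host.endswith(".google.com"):
            return
        # Keep the first occurrence of each link, in page order.
        if href not in self.links:
            self.links.append(href)


def extract_result_links(html: str) -> list:
    """Return the de-duplicated outbound links found in the given HTML."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.links
```

Such a filter trades precision for stability: it survives class renames, but needs extra filtering before the links are used as search results.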

For reverse image search, the issue is two-fold.

  1. The cookie popup. Here, it is actually a redirect to a full-page cookie consent request, not just an overlay, so the reverse image search results cannot be retrieved without accepting cookies in some way (see comment above).
  2. Updated UI. As with the text search, they've updated the interface, and thus the CSS classes changed too. On top of that, the URL has changed (everything now goes via Google Lens instead of via Google Images). This isn't too hard to deal with in the short term.

TPGamesNL commented 4 months ago

Progress tracked in branch fix-google-search, and in PR #9

TPGamesNL commented 4 months ago

To get Google Lens to give good results again, try scaling the image and check the performance per image size.
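
A small sketch of how the candidate sizes for such an experiment could be computed. Everything here is an assumption for illustration: the function names are hypothetical, and the maximum side lengths 256, 512 and 1024 are arbitrary example values, not sizes known to work well with Google Lens.

```python
# Hypothetical maximum side lengths (in pixels) to compare.
CANDIDATE_MAX_SIDES = (256, 512, 1024)


def scaled_size(width: int, height: int, max_side: int) -> tuple:
    """Scale (width, height) so the longer side is at most max_side.

    The aspect ratio is preserved and images are never upscaled.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return new_width, new_height


def candidate_sizes(width: int, height: int, max_sides=CANDIDATE_MAX_SIDES) -> list:
    """Return one scaled size per candidate maximum side length."""
    return [scaled_size(width, height, side) for side in max_sides]
```

Each resulting size could then be applied with an image library (for example Pillow's `Image.resize`) before uploading, recording the result quality per size.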

TPGamesNL commented 4 months ago

Finally fixed this. The code now sends a cookie request via requests, then uses those cookies for the reverse image search. However, it can't call html render, because for some reason that still opens the cookie popup.
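
The general shape of that fix, as a hedged sketch rather than the actual code from the branch: the function name `search_with_consent_cookies` is hypothetical, and the consent URL and form fields are left as parameters because their real values are not given in this thread. The `session` argument is assumed to be a `requests.Session` (or any object with compatible `post` and `get` methods), which stores cookies from earlier responses and sends them on later requests.

```python
def search_with_consent_cookies(session, consent_url, consent_form,
                                search_url, search_params):
    """Accept cookies once, then search with the same session.

    session: a requests.Session-like object (assumption).
    consent_url / consent_form: placeholders for the consent request;
    the real values are not given in this thread.
    """
    # Step 1: send the consent request. The session keeps any cookies
    # that the response sets.
    consent_response = session.post(consent_url, data=consent_form)
    consent_response.raise_for_status()

    # Step 2: run the search through the same session, so the stored
    # cookies are sent along automatically.
    search_response = session.get(search_url, params=search_params)
    search_response.raise_for_status()

    # No JavaScript rendering here: the raw response is returned as-is.
    return search_response
```

In line with the comment above, the sketch returns the plain response and leaves out any rendering step.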