zerohour-phishing-detection / zpd-server

Code and test data for anti-phishing tool: A decision-support tool for experimentation on zero-hour phishing detection
Creative Commons Attribution 4.0 International

Logo detection bad rates #8

Open TPGamesNL opened 7 months ago

TPGamesNL commented 7 months ago

The classifier that decides whether a given portion of a screenshot is a logo doesn't perform well.

Confusion matrix: image (apologies for the bad text coloring, vscode is nice like that sometimes)

Text version: confusion matrix of the Decision Tree optimized for recall_score, on the test data:

       pred_neg  pred_pos
neg       35977        43
pos         168       161

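It's great at determining that something isn't a logo (35977 of 36020 negatives are classified correctly), but if you feed it an actual logo it's essentially a 50/50: only 161 of 329 positives are detected, a recall of about 49%.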
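For reference, a minimal sketch of the metrics implied by the matrix above, assuming the usual layout (rows are the actual class, columns the predicted class). The variable names are my own and not taken from the repo:

```python
# Counts copied from the confusion matrix above.
# Assumption: rows = actual class, columns = predicted class.
tn, fp = 35977, 43   # actual non-logos
fn, tp = 168, 161    # actual logos

recall = tp / (tp + fn)          # share of real logos that get detected
precision = tp / (tp + fp)       # share of "logo" predictions that are correct
specificity = tn / (tn + fp)     # share of non-logos correctly rejected

print(f"recall:      {recall:.3f}")       # 0.489
print(f"precision:   {precision:.3f}")    # 0.789
print(f"specificity: {specificity:.4f}")  # 0.9988
```

So the high overall accuracy comes almost entirely from the negative class, which outnumbers the positives by roughly 110 to 1.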

TPGamesNL commented 7 months ago

Side note: the way the classifier is used is interesting. The output for a given portion is a probability (in [0, 1], I think).

They take all the images extracted from the screenshot and calculate the logo probability for each. They then take the 3 with the highest probability and use those for reverse image search.
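A rough sketch of that selection step as I understand it, not the actual code from the repo. The function name `top_k_logo_candidates` and the example probabilities are made up for illustration:

```python
import heapq

def top_k_logo_candidates(probabilities, k=3):
    """Return the indices of the k regions with the highest logo probability.

    `probabilities` is assumed to hold one classifier output per extracted
    region, in extraction order (hypothetical interface).
    """
    return heapq.nlargest(k, range(len(probabilities)),
                          key=probabilities.__getitem__)

# Made-up probabilities for five extracted regions.
probs = [0.02, 0.91, 0.40, 0.77, 0.05]
print(top_k_logo_candidates(probs))  # [1, 3, 2]
```

Note that with this approach the absolute probability doesn't matter much, only the ranking within a single screenshot: region 2 gets picked here even though it scores 0.40.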