lindsey98 / PhishIntention

PhishIntention: Phishing detection through webpage intention
MIT License

How to add new target into target list #10

Open imethanlee opened 2 years ago

imethanlee commented 2 years ago

Hi,

I have two questions related to the target list of logos/brands.

  1. How do you obtain the .png logo files for these brands from scratch?
  2. If I introduce new brands/logos into PhishIntention, do I just need to copy the new .png files to 'expand_targetlist'? (E.g., I have 3 new .png files of the DBS logo; would moving these 3 files to '.../expand_targetlist/dbs/' be sufficient?)

Tks.

[UPDATE]: If we do introduce new brands/logos, do we need to modify the protected domains in some files (if such files exist)?

lindsey98 commented 2 years ago
  1. We manually collect the logos from Google Images search results and clean them by hand.
  2. To add a new brand:
    • Step 1: Create a new folder called expand_targetlist/dbs/ and move the .png files into it, as you have done.
    • Step 2: Modify the file domain_map.pkl (it should be under the same parent directory as expand_targetlist/). It is a dictionary that maps each brand to its domain(s).
      import pickle

      # Load the existing brand-to-domain(s) mapping
      with open('.../src/phishpedia_siamese/domain_map.pkl', 'rb') as handle:
          domain_map = pickle.load(handle)

      # Register the new brand only if it is not already present
      if 'dbs' not in domain_map:
          domain_map['dbs'] = ['dbs']

      # Write the updated mapping back to disk
      with open('.../src/phishpedia_siamese/domain_map.pkl', 'wb') as handle:
          pickle.dump(domain_map, handle)
    • Step 3: Set reload_targetlist=True the FIRST time you call the load_config() function. You can set it to False thereafter.
      AWL_MODEL, CRP_CLASSIFIER, CRP_LOCATOR_MODEL, SIAMESE_MODEL, OCR_MODEL, SIAMESE_THRE, LOGO_FEATS, LOGO_FILES, DOMAIN_MAP_PATH = load_config(cfg_path, reload_targetlist=True)
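
For repeated additions, Step 2 could be wrapped in a small helper. The following is only a sketch: the function name add_brand and its parameters are hypothetical and not part of PhishIntention, and it assumes domain_map.pkl is a plain pickled dict mapping a brand name to a list of domain strings, as described above.

```python
import os
import pickle
import tempfile


def add_brand(domain_map_path, brand, domains):
    """Hypothetical helper: add `brand` to the pickled brand -> domains dict.

    An existing entry is left untouched; the file is rewritten only when
    a new brand is actually added. Returns the resulting dict.
    """
    with open(domain_map_path, "rb") as handle:
        domain_map = pickle.load(handle)

    if brand not in domain_map:
        domain_map[brand] = list(domains)
        with open(domain_map_path, "wb") as handle:
            pickle.dump(domain_map, handle)

    return domain_map


if __name__ == "__main__":
    # Demo on a throwaway file, so the real domain_map.pkl is not modified.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "domain_map.pkl")
        with open(path, "wb") as handle:
            pickle.dump({"paypal": ["paypal"]}, handle)
        print(add_brand(path, "dbs", ["dbs"]))
```

After the map has been updated, Step 3 (calling load_config() once with reload_targetlist=True) is still needed.
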
imethanlee commented 2 years ago

Thanks for the reply!

imethanlee commented 2 years ago

Regarding https://github.com/lindsey98/PhishIntention/issues/10#issuecomment-1140932139: say we create a new folder called '.../expand_targetlist/Overseas Chinese Banking Corporation' for the logo images. Do we need to use exactly the same name, 'Overseas Chinese Banking Corporation', as the key in 'domain_map.pkl'?

domain_map['Overseas Chinese Banking Corporation'] = ['ocbc.com']

Or can 'ocbc' work as the key if my logo folder name is '.../expand_targetlist/Overseas Chinese Banking Corporation'?

domain_map['ocbc'] = ['ocbc.com']

lindsey98 commented 2 years ago

Hi, the folder name and the key need to be the same. Also, it should be domain_map['Overseas Chinese Banking Corporation'] = ['ocbc']: you only need to put the domain, not domain.tld.
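
One way to catch both mistakes before reloading the target list is a quick consistency check. This is a rough sketch rather than anything shipped with PhishIntention: find_mismatches is a hypothetical name, and the check assumes each brand is one sub-folder of expand_targetlist/ and that a value containing a dot is probably a domain.tld entry worth reviewing.

```python
import os
import tempfile


def find_mismatches(targetlist_dir, domain_map):
    """Hypothetical check of logo folders against domain_map keys.

    Returns three sorted lists:
      - folders that have no matching key in domain_map,
      - keys that have no matching folder,
      - (brand, domain) pairs whose domain contains a dot.
    """
    folders = {
        name
        for name in os.listdir(targetlist_dir)
        if os.path.isdir(os.path.join(targetlist_dir, name))
    }
    keys = set(domain_map)

    folders_without_key = sorted(folders - keys)
    keys_without_folder = sorted(keys - folders)
    dotted_domains = sorted(
        (brand, domain)
        for brand, domains in domain_map.items()
        for domain in domains
        if "." in domain
    )
    return folders_without_key, keys_without_folder, dotted_domains


if __name__ == "__main__":
    # Demo on a throwaway directory that mimics expand_targetlist/.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "Overseas Chinese Banking Corporation"))
        print(find_mismatches(tmp, {"ocbc": ["ocbc.com"]}))
```

With the folder and map from the question above, the check would report the folder as having no key, 'ocbc' as having no folder, and 'ocbc.com' as a dotted domain.
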

imethanlee commented 2 years ago

I see. Tks!