lindsey98 / Phishpedia

Official Implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" USENIX'21

About the domain #16

Closed: Fujiaoji closed this issue 1 year ago.

Fujiaoji commented 1 year ago

Hi,

Hope you are doing well.

I have run your Phishpedia code, but I now see that you have updated it.

When I check the "phishpedia_classifier_logo" function and run the code on some benign websites, I find that "matched_domain" never includes the top-level domain (".com", ".cn", etc.), whereas "tldextract.extract(url).domain + '.' + tldextract.extract(url).suffix" does include it. So I am wondering whether this is a bug or whether domain.pkl also needs to be updated. I am not sure; I just want to raise the question.
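The mismatch can be reproduced without the real data. In this minimal sketch, `split_host` is a hypothetical offline stand-in for `tldextract.extract` (it only knows a handful of suffixes, so real code should keep using tldextract), and `brand_domains` is a made-up entry stored without its TLD, mirroring what is observed in "matched_domain" above.

```python
from urllib.parse import urlparse

# Hypothetical, tiny stand-in for tldextract's public-suffix handling so the
# example runs offline. Real code should call tldextract.extract(url).
SUFFIXES = ("com.cn", "co.uk", "com", "cn", "org")

def split_host(url: str) -> tuple:
    """Return (domain, suffix) for the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    # Try longer suffixes first so "com.cn" wins over "cn".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if host.endswith("." + suffix):
            labels = host[: -(len(suffix) + 1)].split(".")
            return labels[-1], suffix
    return host, ""

# Hypothetical domain-list entry stored without its TLD.
brand_domains = {"Google": ["google"]}

domain, suffix = split_host("https://www.google.com/search")
full = domain + "." + suffix                # "google.com"
print(full in brand_domains["Google"])      # False: the TLD makes the strings differ
print(domain in brand_domains["Google"])    # True: comparing the bare label matches
```

Both sides of the comparison have to use the same form (either both with the TLD or both without it); mixing the two makes every legitimate domain look unknown.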

Also, I want to make sure I understand correctly: in the "phishpedia_classifier_logo" function, if the predicted target brand is not None and the domain is in the maintained domain list, then the page should be classified as benign, and the output pred_target is None rather than the actual brand?
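A minimal sketch of the rule as it is described in the question (the thread does not confirm it, so treat it as an illustration rather than Phishpedia's actual code). `classify_logo_match` and `brand_domains` are hypothetical names; `brand_domains` is assumed to map each brand to the domains it legitimately uses, and `url_domain` must be in the same form as those entries (with or without the TLD).

```python
from typing import Dict, List, Optional, Tuple

def classify_logo_match(
    pred_brand: Optional[str],
    url_domain: str,
    brand_domains: Dict[str, List[str]],
) -> Tuple[bool, Optional[str]]:
    """Return (is_phishing, pred_target) under the rule described above."""
    if pred_brand is None:
        # No reference logo matched: there is no brand to report.
        return False, None
    if url_domain in brand_domains.get(pred_brand, []):
        # The page is served from one of the brand's own domains: benign,
        # so the brand is not reported as a phishing target.
        return False, None
    # The logo matches a brand, but the domain is not one of its own: flag it.
    return True, pred_brand

brand_domains = {"Google": ["google"]}  # hypothetical entry
print(classify_logo_match("Google", "google", brand_domains))         # (False, None)
print(classify_logo_match("Google", "g00gle-verify", brand_domains))  # (True, 'Google')
```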

Thanks

Best

Fujiao

lindsey98 commented 1 year ago

Hi fujiao,

Yes, I updated the domain-matching function to include the top-level domain as well. Please try replacing the old domain.pkl with the new one: https://drive.google.com/file/d/1nTIC6311dvdY4cGsrI4c3WMndSauuHSm/view?usp=sharing. Thanks!

Best, Ruofan


Fujiaoji commented 1 year ago

Oh, got it. Thanks

I looked at the two domain.pkl files and found that the former one produces this, while the new one contains many bytes in a "cert_der" field. Is there a good way to open the new domain.pkl? And what does "cert_der" mean? Phishpedia still uses the same code to open this file, so I am wondering whether there is updated code for it. Thanks

[Two screenshots, 2023-07-08, comparing the contents of the two domain.pkl files]
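The thread does not describe the layout of the new file, so here is a structure-agnostic sketch for peeking inside a pickle: it walks nested dicts and shows raw bytes values (such as a "cert_der" field) only by their length. The name "cert_der" suggests a DER-encoded certificate, but that is an assumption. `describe` and `inspect_pickle` are hypothetical helpers, and `pickle.load` can execute arbitrary code, so only load files from a trusted source.

```python
import pickle
from typing import Any, List

def describe(obj: Any, max_items: int = 5, indent: int = 0) -> List[str]:
    """Summarize nested dicts, showing bytes values by length instead of content."""
    pad = "  " * indent
    if isinstance(obj, (bytes, bytearray)):
        return [f"{pad}<{len(obj)} bytes>"]
    if isinstance(obj, dict):
        lines = [f"{pad}dict ({len(obj)} keys)"]
        # Only show the first few entries to keep the output readable.
        for key, value in list(obj.items())[:max_items]:
            lines.append(f"{pad}{key!r}:")
            lines.extend(describe(value, max_items, indent + 1))
        return lines
    # Anything else: a truncated repr is enough to see what it is.
    return [f"{pad}{repr(obj)[:80]}"]

def inspect_pickle(path: str) -> None:
    # Only unpickle files you trust: pickle.load can run arbitrary code.
    with open(path, "rb") as f:
        print("\n".join(describe(pickle.load(f))))

# Usage (the path is an assumption): inspect_pickle("domain.pkl")
```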

lindsey98 commented 1 year ago

Hi Fujiao, sorry, I think the former one is correct; I uploaded the wrong file.

lindsey98 commented 1 year ago

I changed the code to compare "matched_domain" against "tldextract.extract(url).domain" only. Thanks

Fujiaoji commented 1 year ago

Okay, got it. I would like to double-check: PhishIntention and Phishpedia use the same maintained domain.pkl, is that right? Thanks

lindsey98 commented 1 year ago

Hi Fujiao, yes, they do.


Fujiaoji commented 1 year ago

Okay, that helps me a lot. Thanks!