lindsey98 / Phishpedia

Official implementation of "Phishpedia: A Hybrid Deep Learning Based Approach to Visually Identify Phishing Webpages" (USENIX Security '21)

Model evaluation issues #19

Closed nTjing closed 10 months ago

nTjing commented 1 year ago

Hi Lin, I would like to evaluate the Phishpedia model on the dataset you provided. Could you tell me whether pipeline_eval.py is the script used to evaluate the whole model (Phishpedia), including accuracy and recall?

lindsey98 commented 1 year ago

Hi, I realize I did not include the evaluation .py file. Basically, you can run phishpedia_main.py to log the results to a txt file. Then you can look at two columns:

   - "phish" is the phishing prediction: 0 for benign, 1 for phishing.
   - "prediction" gives the brand that the phishing webpage is targeting; the default is None.

If you want to compute classification metrics (such as precision and recall), use the "phish" column. If you want to compute the phishing identification accuracy, i.e. (number of phishing pages whose target is correctly identified) / (number of phishing pages reported), use the "prediction" column. Thanks
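For reference, here is a minimal sketch of how those two columns could be turned into metrics. It assumes the logged txt file is tab-separated with columns named "phish" and "prediction", and that you add your own ground-truth columns; the "true_label" and "true_brand" names below are placeholders, not part of Phishpedia's output, so adjust them to match your actual log format.

```python
import pandas as pd

# Load the result log produced by phishpedia_main.py.
# Assumption: tab-separated file with "phish" and "prediction" columns;
# "true_label" and "true_brand" are ground-truth columns you add yourself.
results = pd.read_csv("results.txt", sep="\t")

predicted_phish = results["phish"].astype(int)
ground_truth = results["true_label"].astype(int)  # 1 = phishing, 0 = benign

tp = ((predicted_phish == 1) & (ground_truth == 1)).sum()
fp = ((predicted_phish == 1) & (ground_truth == 0)).sum()
fn = ((predicted_phish == 0) & (ground_truth == 1)).sum()

precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

# Identification accuracy: among sites reported as phishing, how many
# have the correct target brand in the "prediction" column.
reported = results[predicted_phish == 1]
correct_target = (reported["prediction"] == reported["true_brand"]).sum()
identification_acc = correct_target / len(reported) if len(reported) else 0.0

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"identification accuracy={identification_acc:.3f}")
```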

nTjing commented 1 year ago

Thank you for your prompt response!

nTjing commented 1 year ago

I would like to ask: if I put the dataset I want to test under the path "phishpedia/datasets/test_sites", can I then run phishpedia_main.py to log the results to a txt file?

lindsey98 commented 1 year ago

Hi, yes, that will work. Just run the following command; run.py calls the runit() function in phishpedia_main.py, and the --folder flag specifies where the test folder is:

python run.py --folder phishpedia/datasets/test_sites
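Before running, a minimal sketch for sanity-checking the dataset layout, assuming each subfolder under test_sites contains a shot.png screenshot and an info.txt with the site URL (these filenames are an assumption based on the repo's usual dataset convention; adjust them if your dataset differs):

```python
from pathlib import Path

# Quick sanity check of the test dataset before running the pipeline.
# Assumption: each site subfolder holds shot.png and info.txt;
# rename these if your dataset uses different filenames.
test_root = Path("phishpedia/datasets/test_sites")

for site_dir in sorted(p for p in test_root.iterdir() if p.is_dir()):
    missing = [f for f in ("shot.png", "info.txt") if not (site_dir / f).exists()]
    if missing:
        print(f"{site_dir.name}: missing {', '.join(missing)}")
```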

nTjing commented 1 year ago

Thanks!