Open imethanlee opened 2 years ago
Hi, I think this happens because some of the benign websites do not contain an info.txt file. In that case, kindly replace https://github.com/lindsey98/PhishIntention/blob/main/phishintention/phishintention_main.py#L169-L172 with the following code:
info_path = os.path.join(full_path, 'info.txt')
if not os.path.exists(screenshot_path):  # screenshot does not exist
    continue
try:
    with open(info_path, encoding='ISO-8859-1') as f:
        url = f.read()
except OSError:  # info.txt is missing or unreadable
    url = 'https://www' + item
By the way, for the ROC curve we did not run the Step 4: Dynamic analysis part. Here is the code we used: https://github.com/lindsey98/PhishIntention/blob/main/phishintention/src/pipeline_eval.py#L20
Hi, I ran 'run.py' with the newest version of the code on the benign_25k dataset. This time it generated 25184 results, which is slightly fewer than 25400. Is this the expected outcome?
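For anyone checking a similar gap: the replacement code above skips every folder that has no screenshot, so a result count below the number of input folders may simply reflect missing screenshots. Here is a minimal sketch for counting those folders. The function name `find_skipped_folders` and the default file name `shot.png` are assumptions for illustration, not taken from this thread, so adjust them to match your dataset.

```python
import os


def find_skipped_folders(directory, screenshot_name="shot.png"):
    """Return (total_folders, folders_without_screenshot) for a dataset directory.

    screenshot_name is an assumption; set it to the file name your dataset uses.
    """
    total = 0
    skipped = []
    for item in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, item)
        if not os.path.isdir(full_path):
            continue  # ignore stray files at the top level
        total += 1
        # Same check as the snippet above: no screenshot means the folder is skipped
        if not os.path.exists(os.path.join(full_path, screenshot_name)):
            skipped.append(item)
    return total, skipped
```

Calling `total, skipped = find_skipped_folders("path/to/benign_25k")` (hypothetical path) and comparing `len(skipped)` with the gap (25400 - 25184 = 216) would show whether missing screenshots account for the difference. If the numbers do not match, something else is dropping results.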
Problem solved.
Hi,
I ran PhishIntention on the 25K benign webpage dataset, which contains 25400 benign webpages. However, the output results file contains results for only 21813 webpages. I ran it several times, but the number of results stayed the same. Is this the expected outcome, or might something have gone wrong?
P.S. The numbers match when I test the algorithm on the 25K CRP phishing webpage dataset (25403 input webpages, 25403 output results).
Thanks in advance.