lindsey98 / PhishIntention

PhishIntention: Phishing detection through webpage intention
MIT License

Since the dynamic analysis may involve clicking a link and entering a new webpage, do we also need to crawl the new webpages before running the test code? #6

Open imethanlee opened 2 years ago

imethanlee commented 2 years ago

Following up on https://github.com/lindsey98/PhishIntention/issues/2#issuecomment-1124542367: if we need to crawl these potential webpages first, it seems hard to crawl the pages that the landing webpage may link to, since we do not know where to click or what the new URLs are.

lindsey98 commented 2 years ago

No, we don't need to crawl the linked pages in advance. The dynamic analysis decides where to click and interacts with the webpage on the fly. We just need to make sure that the page is still alive.

imethanlee commented 2 years ago

Oh, perhaps I can make it clearer. In 'Algorithm 1' of the paper, the dynamic analysis (line 15) is a recursive operation: after entering the linked page, it goes through the whole of 'Algorithm 1' again. In that case, I think the linked page also requires a webpage screenshot and HTML code, just as the landing webpage does?

I have two different understandings here:

  1. Only the landing webpage requires its screenshot and HTML code to be crawled in advance. The dynamic analysis automatically crawls the screenshots and HTML code of the linked pages.
  2. The algorithm does not require screenshots or HTML code for the linked pages.

Is either of them correct?

lindsey98 commented 2 years ago

The 1st is correct.
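
To illustrate what the first understanding implies, here is a minimal sketch of the control flow, not the actual PhishIntention implementation. All names (`Page`, `FakeBrowser`, `has_credential_form`, `analyze`) are hypothetical, the in-memory `FakeBrowser` stands in for a live browser session, and the password-field check is only a placeholder for the real per-page analysis. The point is that the caller supplies just the pre-crawled landing page, while screenshots and HTML of linked pages are captured during the recursion.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Page:
    url: str
    screenshot: bytes
    html: str


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a live browser; maps URL -> (html, outgoing links)."""
    site: Dict[str, Tuple[str, List[str]]]
    fetched: List[str] = field(default_factory=list)

    def links(self, page: Page) -> List[str]:
        return list(self.site[page.url][1])

    def click_and_capture(self, url: str) -> Optional[Page]:
        # A dead link yields None: the linked page must still be alive.
        if url not in self.site:
            return None
        self.fetched.append(url)
        html, _ = self.site[url]
        # Screenshot and HTML are captured here, on the fly.
        return Page(url=url, screenshot=b"png:" + url.encode(), html=html)


def has_credential_form(page: Page) -> bool:
    # Placeholder check (assumption); the real system uses its own per-page analysis.
    return 'type="password"' in page.html


def analyze(page: Page, browser: FakeBrowser,
            depth: int = 0, max_depth: int = 2) -> Optional[str]:
    """Return the URL of the first page with a credential form, or None."""
    if has_credential_form(page):
        return page.url
    if depth >= max_depth:
        return None
    for url in browser.links(page):
        linked = browser.click_and_capture(url)
        if linked is None:
            continue
        # Recursive step: the linked page goes through the same analysis.
        found = analyze(linked, browser, depth + 1, max_depth)
        if found is not None:
            return found
    return None


# Only the landing page is crawled in advance.
site = {
    "https://example.test/": ("<a href='/login'>Sign in</a>",
                              ["https://example.test/login"]),
    "https://example.test/login": ('<input type="password">', []),
}
browser = FakeBrowser(site)
landing = Page("https://example.test/", b"pre-crawled png",
               site["https://example.test/"][0])
result = analyze(landing, browser)
```

In this sketch, `browser.fetched` ends up containing only the linked login page, which shows that nothing beyond the landing page had to be supplied up front. The `max_depth` bound is an assumption added here to keep the recursion finite.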

imethanlee commented 2 years ago

Thanks for the answer.