Optimizations: general cleanup, plus a hash set of visited URLs that prevents the same URL from being crawled more than once. Repeat visits frequently led to infinite crawling.
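To illustrate the visited-set idea, here is a minimal sketch rather than the crawler's actual code. The `crawl_order` function and the `links` dictionary (a stand-in for fetching a page and extracting its links) are hypothetical names chosen for the example.

```python
from collections import deque


def crawl_order(start, links):
    """Return URLs in the order they are visited.

    `links` maps a URL to the URLs found on that page. It is a
    hypothetical stand-in for actually fetching and parsing pages.
    """
    visited = set()          # hash set: O(1) average membership checks
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        if url in visited:
            continue         # already crawled, skip to avoid looping forever
        visited.add(url)
        order.append(url)
        for found in links.get(url, []):
            if found not in visited:
                queue.append(found)
    return order


# Pages that link back to each other would loop forever without the set.
site = {"a": ["b", "a"], "b": ["a", "c"], "c": ["a"]}
print(crawl_order("a", site))
```

Even though every page here links back to `a`, each URL is visited exactly once and the loop terminates.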
Bug fixes: fixed potential issues with how the domain restrictions were being handled.
Out-of-scope paths and domains: users can now specify domains and paths that are outside the scope of the scan (useful for pentests).
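A scope check of this kind could look roughly like the sketch below. The function name `in_scope` and its parameters `excluded_domains` and `excluded_paths` are assumptions made for illustration, not the option names this pull request adds.

```python
from urllib.parse import urlparse


def in_scope(url, excluded_domains=(), excluded_paths=()):
    """Return False if the URL falls under an excluded domain or path.

    Hypothetical helper: the names and matching rules are assumptions.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    for domain in excluded_domains:
        domain = domain.lower()
        # Match the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return False
    path = parsed.path or "/"
    for prefix in excluded_paths:
        # Simple prefix match; "/admin" also excludes "/administrator".
        if path.startswith(prefix):
            return False
    return True
```

The crawler would call such a check before queueing each discovered link, so out-of-scope targets are never requested during a pentest.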
Headless browser support: requests can now be made through a headless browser rather than just requests.get(). This is more thorough, because the page is actually rendered and its dynamic content becomes accessible. It does lead to longer waits, but can be worth it depending on how the target site is put together. In the future, user-like interaction with the site could be implemented. This feature was implemented using Selenium.
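For readers unfamiliar with the approach, the following is a rough sketch of fetching rendered HTML through Selenium. It is not the code in this pull request: `make_headless_chrome` and `fetch_rendered` are hypothetical names, Chrome is an assumed browser choice, and the `--headless=new` flag assumes a recent Chrome (older versions use `--headless`).

```python
def make_headless_chrome():
    """Create a headless Chrome driver (assumes Selenium 4 and Chrome)."""
    # Imported lazily so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # assumption: recent Chrome
    return webdriver.Chrome(options=options)


def fetch_rendered(url, driver_factory=make_headless_chrome):
    """Load `url` in a browser and return the HTML after rendering."""
    driver = driver_factory()
    try:
        driver.get(url)              # blocks until the page load event fires
        return driver.page_source    # DOM serialized after scripts have run
    finally:
        driver.quit()                # always release the browser process
```

Passing the driver factory as a parameter keeps the browser choice swappable and lets the function be exercised with a stub driver. Starting a browser for every page is slow, so a real crawler would likely reuse one driver across requests.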
Checklist
[X] I wrote at least some documentation for this feature.
Checklist
[X] This Pull will not add the same thing as another currently-open request.
[X] Your Pull was made against the rivermont:dev branch and not rivermont:master.
[X] This Pull does not commit any keys, passwords, personal data, or other private information.
[X] I updated lines 20 and 21 in the README to reflect any changes I made.