hakluke / hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
https://hakluke.com
GNU General Public License v3.0

Bugfix: Support Port Specification By Updating colly package to v2.1.0 #117

Closed ErikOwen closed 2 years ago

ErikOwen commented 2 years ago

Fixes #113.

Hakrawler fails to find paths when the input URL specifies a port. For example, this command does not identify any paths:

echo "https://example.com:443" | hakrawler

But this command succeeds:

echo "https://example.com" | hakrawler

Updating colly to v2.1.0 fixes this bug, because a bugfix was submitted to the colly project so that allowed and disallowed domains are checked against the host only, not the host and the port. This PR ensures that hakrawler and colly use the same kind of value for the "AllowedDomains" colly configuration (the host alone). As the codebase currently stands, hakrawler passes just the host to "AllowedDomains", but the old colly version that hakrawler depends on checks the host and the port together, so a URL with an explicit port is never treated as allowed.

ErikOwen commented 2 years ago

Converted this to a draft. I am investigating updating the Colly package to v2.1.0, which will resolve this issue in a different manner.

hakluke commented 2 years ago

Great work - thanks!

0xcrypto commented 2 years ago

Awesome!