hakluke / hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
https://hakluke.com
GNU General Public License v3.0

Trim white space prefix in line of robots.txt #85

Closed: gigarashi closed this issue 3 years ago

gigarashi commented 3 years ago

Running the command `go run . -url google.com -robots -plain > results.txt` does not produce /m/finance in the results, even though that path is listed in https://www.google.com/robots.txt.

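This happens because some lines in robots.txt contain more than one whitespace character, so the extracted path keeps a whitespace prefix. The function recordIfInScope then fails to parse the URL and returns an error, so the path is missing from the results.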

Trimming the whitespace in the function parseRobots fixes this bug.

hakluke commented 3 years ago

Great find - thank you!