rivermont / spidy

The simple, easy to use command line web crawler.
GNU General Public License v3.0
334 stars, 69 forks

Updated robots.txt URL formatter #71

Closed dstjacques closed 3 years ago

dstjacques commented 5 years ago

A URL with a trailing slash could cause the formatter to fail (e.g. https://en.wikipedia.org/). URLs whose path is also a substring of the domain could fail too, because the string replace removes more occurrences than intended (e.g. http://example.com/example).
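Both failure modes can be reproduced with a short Python sketch (Python is the language spidy is written in). `robots_url_naive` is a hypothetical reconstruction of a `str.replace`-based formatter, not spidy's actual code. `robots_url` shows one way to avoid the problem: rebuild the URL from only the scheme and network location using the standard library's `urllib.parse`, so the path is never searched for inside the rest of the string.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_naive(url: str) -> str:
    """Hypothetical naive formatter (not spidy's code): strip the path with str.replace.

    str.replace removes *every* occurrence of the path, so it breaks when the
    path is "/" (all slashes vanish) or when the path also appears inside the
    scheme/domain part of the URL.
    """
    path = urlsplit(url).path
    base = url.replace(path, "") if path else url
    return base.rstrip("/") + "/robots.txt"


def robots_url(url: str) -> str:
    """Build the robots.txt URL from the scheme and netloc (host and port) only."""
    parts = urlsplit(url)
    # Path, query and fragment of the page URL are discarded entirely.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    for page in ("https://en.wikipedia.org/", "http://example.com/example"):
        print(robots_url_naive(page), "->", robots_url(page))
```

Running it prints `https:en.wikipedia.org/robots.txt -> https://en.wikipedia.org/robots.txt` and `http:/.com/robots.txt -> http://example.com/robots.txt`. The naive version loses the slashes in the first case, and in the second it also deletes the `/example` that follows `http:/`. Parsing the URL instead of doing text substitution sidesteps both cases and keeps any port in the host part.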


dstjacques commented 5 years ago

Fixes #70

rivermont commented 3 years ago

Thanks @dstjacques for the help, but this was also solved by #77, and I just didn't get around to this until now. Cheers!