hakluke / hakrawler

Simple, fast web crawler designed for easy, quick discovery of endpoints and assets within a web application
https://hakluke.com
GNU General Public License v3.0

Add www option for -scope, add tests and CI #71

Closed: epicfaace closed this 4 years ago

epicfaace commented 4 years ago

Added the "www" option for -scope, which returns all pages from the specified domain and its "www" subdomain.
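For readers who want a concrete picture of what a "www" scope means, here is a minimal sketch of such a check. It is not the code from this PR: the helper name inWWWScope and its signature are assumptions made for illustration, and the sample hosts are the ones mentioned in this thread.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inWWWScope reports whether rawURL's host is either the bare domain or its
// "www" subdomain. Hypothetical helper for illustration; not hakrawler's code.
func inWWWScope(rawURL, domain string) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	// Hostname() strips any port; compare case-insensitively.
	host := strings.ToLower(u.Hostname())
	domain = strings.ToLower(domain)
	return host == domain || host == "www."+domain
}

func main() {
	for _, link := range []string{
		"https://intigriti.com",
		"https://www.intigriti.com",
		"https://blog.intigriti.com/",
	} {
		fmt.Println(link, inWWWScope(link, "intigriti.com"))
	}
}
```

Under this reading, the first two links are in scope and the blog subdomain is not, so every "www" result would also be a valid "subs" result.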

Also added tests for the -scope change, refactored some of the I/O so that more tests can be added in the future, and added a GitHub Actions workflow that automatically runs the tests on pushes and PRs.

I also checked in go.sum, as it's supposed to be checked in to version control.

Sample successful workflow on this branch: https://github.com/epicfaace/hakrawler/runs/938451328?check_suite_focus=true

hakluke commented 4 years ago

This is awesome work! Thank you!

hakluke commented 4 years ago

Hey @epicfaace!

Do you have any idea why echo intigriti.com | hakrawler -scope www doesn't return anything, while echo intigriti.com | hakrawler -scope subs does return some results that should also have been in the -scope www output?

epicfaace commented 4 years ago

Hmm, when I run the subs command you sent, note that there are no results from https://intigriti.com or https://www.intigriti.com:

~ % echo intigriti.com | hakrawler -scope subs

██╗  ██╗ █████╗ ██╗  ██╗██████╗  █████╗ ██╗    ██╗██╗     ███████╗██████╗
██║  ██║██╔══██╗██║ ██╔╝██╔══██╗██╔══██╗██║    ██║██║     ██╔════╝██╔══██╗
███████║███████║█████╔╝ ██████╔╝███████║██║ █╗ ██║██║     █████╗  ██████╔╝
██╔══██║██╔══██║██╔═██╗ ██╔══██╗██╔══██║██║███╗██║██║     ██╔══╝  ██╔══██╗
██║  ██║██║  ██║██║  ██╗██║  ██║██║  ██║╚███╔███╔╝███████╗███████╗██║  ██║
╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝╚═╝  ╚═╝ ╚══╝╚══╝ ╚══════╝╚══════╝╚═╝  ╚═╝
                        Crafted with <3 by hakluke                        
[url] https://login.intigriti.com/account/register
[subdomain] login.intigriti.com
[url] https://go.intigriti.com/careers
[subdomain] go.intigriti.com
[url] https://blog.intigriti.com/
[subdomain] blog.intigriti.com
[url] https://go.intigriti.com/legalinformation
[url] https://go.intigriti.com/hackademy
[url] https://go.intigriti.com/faq
[url] https://go.intigriti.com/cookies
[url] https://go.intigriti.com/privacy
[url] https://go.intigriti.com/tac

So the problem seems to be that hakrawler is not crawling pages on https://intigriti.com or https://www.intigriti.com when run with -scope subs. If that issue is fixed, I'd expect those results to show up with -scope www as well.