epicfaace closed this 4 years ago
This is awesome work! Thank you!
Hey @epicfaace!
Do you have any idea why this command returns nothing:
echo intigriti.com | hakrawler -scope www
while this one returns some results that should also have appeared in the -scope www output?
echo intigriti.com | hakrawler -scope subs
Hmm, when I run the subs command you sent, there are no results from https://intigriti.com or https://www.intigriti.com:
~ % echo intigriti.com | hakrawler -scope subs
██╗ ██╗ █████╗ ██╗ ██╗██████╗ █████╗ ██╗ ██╗██╗ ███████╗██████╗
██║ ██║██╔══██╗██║ ██╔╝██╔══██╗██╔══██╗██║ ██║██║ ██╔════╝██╔══██╗
███████║███████║█████╔╝ ██████╔╝███████║██║ █╗ ██║██║ █████╗ ██████╔╝
██╔══██║██╔══██║██╔═██╗ ██╔══██╗██╔══██║██║███╗██║██║ ██╔══╝ ██╔══██╗
██║ ██║██║ ██║██║ ██╗██║ ██║██║ ██║╚███╔███╔╝███████╗███████╗██║ ██║
╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝ ╚══╝╚══╝ ╚══════╝╚══════╝╚═╝ ╚═╝
Crafted with <3 by hakluke
[url] https://login.intigriti.com/account/register
[subdomain] login.intigriti.com
[url] https://go.intigriti.com/careers
[subdomain] go.intigriti.com
[url] https://blog.intigriti.com/
[subdomain] blog.intigriti.com
[url] https://go.intigriti.com/legalinformation
[url] https://go.intigriti.com/hackademy
[url] https://go.intigriti.com/faq
[url] https://go.intigriti.com/cookies
[url] https://go.intigriti.com/privacy
[url] https://go.intigriti.com/tac
So the problem seems to be that hakrawler is not crawling any pages on https://intigriti.com or https://www.intigriti.com even with -scope subs. Every URL in that output is on the login, go, or blog subdomain, which -scope www correctly excludes. If that crawling issue is fixed, I'd expect those results to show up with -scope www as well.
Added a "www" option for -scope, which returns all pages from the specified domain and its "www" subdomain.
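To make the difference between the two scope modes concrete, here is a minimal sketch of a host check of the kind described above. It is not the code from this PR: the function name inScope and its signature are hypothetical, and only the "www" and "subs" modes mentioned in this thread are modelled.

```go
package main

import "strings"

// inScope is a hypothetical helper (not hakrawler's actual code).
// It reports whether host belongs to the crawl scope for the seed
// domain under the given scope mode.
func inScope(host, domain, scope string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	domain = strings.ToLower(domain)

	switch scope {
	case "www":
		// Only the bare domain and its "www" subdomain.
		return host == domain || host == "www."+domain
	case "subs":
		// The bare domain and any subdomain of it.
		return host == domain || strings.HasSuffix(host, "."+domain)
	default:
		// Other modes are out of scope for this sketch.
		return false
	}
}
```

Under this sketch, go.intigriti.com passes the "subs" check but fails the "www" check, which matches the behaviour discussed in the comments: a crawl that only ever finds login, go, and blog URLs has nothing to print under -scope www.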
Also added tests to test the -scope change, generally refactored some I/O to allow for adding additional tests in the future, and added a GitHub Actions workflow to automatically run tests on pushes / PRs.
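As an illustration of why routing output through an io.Writer makes this kind of testing easier, here is a small sketch. The helper names writeResult and renderResults are hypothetical and are not taken from this PR. The "[tag] value" line format mirrors the output shown earlier in this thread.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// writeResult is a hypothetical helper: it writes one result line in the
// "[tag] value" format to any io.Writer instead of directly to stdout.
func writeResult(w io.Writer, tag, value string) error {
	_, err := fmt.Fprintf(w, "[%s] %s\n", tag, value)
	return err
}

// renderResults writes each (tag, value) pair to an in-memory buffer and
// returns the combined text, which a test can compare against a string.
func renderResults(results [][2]string) string {
	var sb strings.Builder
	for _, r := range results {
		// strings.Builder never returns a write error, so it is ignored here.
		_ = writeResult(&sb, r[0], r[1])
	}
	return sb.String()
}
```

In the real program the same writeResult call could be handed os.Stdout, while tests pass an in-memory writer and assert on the captured text.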
I also checked in go.sum, as it's supposed to be checked in to version control.
Sample successful workflow on this branch: https://github.com/epicfaace/hakrawler/runs/938451328?check_suite_focus=true