projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License

Passive crawling from external sources support #139

Closed. longnguyenhuynh closed this issue 6 months ago

longnguyenhuynh commented 1 year ago

Please describe your feature request:

Add the ability to get passive URLs / endpoints from -

CLI Options -

   -ps, -passive                   enable passive sources to discover target endpoints
   -pss, -passive-source           passive source to use for url discovery (wayback,urlscan,commoncrawl,virustotal,alienvault)

JSON Output -

{
  "timestamp": "2022-11-05T22:33:27.745815+05:30",
  "endpoint": "https://mail.google.com/mail/u/0/?ik=a9f1fef565&view=pt&search=all&permthid=thread-f:1717382568649591026",
  "source": "https://otx.alienvault.com/api/v1/indicators/domain/google.com/url_list?limit=100&page=5",
  "mode": "passive"
}
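For anyone consuming this output downstream, here is a minimal Python sketch of reading records in the shape shown above. It assumes the records arrive one JSON object per line, which this issue does not specify, and `group_by_source_host` is a hypothetical helper name, not part of katana.

```python
import json
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_source_host(lines):
    """Group passive endpoints by the hostname of the source that reported them.

    Assumes one JSON record per line with the "endpoint", "source" and
    "mode" keys from the example above (the line-per-record framing is
    an assumption, not something the issue states).
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("mode") != "passive":
            continue  # keep only results that came from passive sources
        host = urlsplit(record["source"]).hostname
        grouped[host].append(record["endpoint"])
    return dict(grouped)


if __name__ == "__main__":
    sample = json.dumps({
        "timestamp": "2022-11-05T22:33:27.745815+05:30",
        "endpoint": "https://mail.google.com/mail/u/0/",
        "source": "https://otx.alienvault.com/api/v1/indicators/domain/google.com/url_list?limit=100&page=5",
        "mode": "passive",
    })
    print(group_by_source_host([sample]))
```

Grouping by source host is only one possible view; the same loop could just as easily count endpoints per source or deduplicate them.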

Example run:

katana -u hackerone.com -passive -silent

https://hackerone.com/redirect?signature=147f892574d9bece120cd41a5c4539e3fa8e8066&url=https://vimeo.com/137725491
https://hackerone.com/teams/new
https://hackerone.com/sandbox
https://support.hackerone.com/hc/en-us/articles/211538803-Step-by-Step-How-to-write-a-good-vulnerability-report
https://www.hackerone.com/blog/H1-415-Recap-Oath-Pays-Over-400000-Hackers-One-Da%20y
https://hackerone.com/txt3rob
https://hackerone.com/redirect?signature=4d7211d04ad487ae4b5053792b10fe43badb57fe&url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3D5iRylyJTzWc
http://hackerone.com/googleplay
https://hackerone.com/reports/104543
https://hackerone.com/reports/83803
https://hackerone.com/reports/253558
https://hackerone.com/users/confirmation?confirmation_token=z1E1oUpkrMnuBQpMHDd
https://www.hackerone.com/product/challenge
https://www.hackerone.com/blog/Sikurs-COO-Hacker-Diversity-Essential-Securing-SIKURPhone
https://hackerone.com/reports/199438
https://hackerone.com/workday
http://api.hackerone.com:8880/
https://hackerone.com/redirect?signature=c8ae58718e901ab4b54c7bcab54b924cb07a386b&url=http://blog.innerht.ml/overflow-trilogy/
https://hackerone.com/reports/269831
http://www.hackerone.com:8880/
http://hackerone.com:8880/
https://hackerone.com/redirect?signature=0486c622361ef174ec407e000fd0f4e54bdaec4a&url=https://www.owasp.org/index.php?title=Broken_Authentication_and_Session_Management&setlang=en
https://hackerone.com/users/sign_up
https://hackerone.com/reports/223609
https://www.hackerone.com/sites/default/files/2018-07/The%20Hacker-Powered%20Security%20Report%202018.pdf
http://hackerone.com/w2w
http://api.hackerone.com:2083/
http://api.hackerone.com:443/
http://api.hackerone.com:8443/
https://hackerone.com/notifications
https://www.hackerone.com/blog/How-to-Hack-Get-Started-Hacking-Mobile
https://hackerone.com/spotify
https://hackerone.com/users/confirmation?confirmation_token=AxJjSSHbxLuxE2wsKrLz
https://www.hackerone.com/privacy
https://hackerone.com/bugs?subject=user&report_id=
https://hackerone.com/hacktivity
https://hackerone.com/reports/180074
https://hackerone.com/glasswire
https://www.hackerone.com/sites/default/files/2018-06/HackerOne-BlackHat-Vegas-Week-Activities-2018_0.pdf
https://hackerone.com/augurproject
https://hackerone.com/leaderboard/all-time
https://hackerone.com/uber
https://hackerone.com/reports/244504
https://hackerone.com/egyptghost1&d=DwMFaQ&c=7DfhQjPWzR3PmWBQVpi%kw&r=nZr0nOaewW9j3jAt8xfGtw&m=R1VVkSZXns7lMVewqXGum%CDerCjoWKII9VPm54%kyk&s=unQflkqs62j/8P6jUmj6hUs5SNbLS8F53i0sZm4DZwE&e=
https://hackerone.com/hacktivity?sort_type=popular&filter=type:all&page=1&range=forever
https://hackerone.com/gcheng
https://hackerone.com/jrjn
https://hackerone.com/spyboy
https://hackerone.com/dchan

Note:

  1. In passive mode, all applicable options (scope, filters, etc.) will be supported, except active crawling.
  2. The -passive-source option accepts a single source or multiple comma-separated sources.
  3. By default, all supported passive sources are used in passive mode.
  4. Passive crawling mode is optional and is enabled with the -passive flag.
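Notes 2 and 3 can be sketched as follows. This is a Python illustration of the described behaviour, not katana's actual implementation; `resolve_passive_sources` is a hypothetical name, and rejecting unknown source names is an assumption this issue does not confirm.

```python
# Source names taken from the -passive-source help text in this issue.
SUPPORTED_SOURCES = ("wayback", "urlscan", "commoncrawl", "virustotal", "alienvault")


def resolve_passive_sources(value=None):
    """Turn a -passive-source style value into a list of source names.

    An empty or missing value falls back to all supported sources (note 3).
    A comma-separated value selects one or more sources (note 2).
    """
    if not value:
        return list(SUPPORTED_SOURCES)
    requested = [name.strip().lower() for name in value.split(",") if name.strip()]
    if not requested:
        return list(SUPPORTED_SOURCES)
    unknown = [name for name in requested if name not in SUPPORTED_SOURCES]
    if unknown:
        # Assumption: unknown names are treated as an error rather than ignored.
        raise ValueError("unsupported passive source(s): " + ", ".join(unknown))
    # Drop duplicates while keeping the order the user gave.
    return list(dict.fromkeys(requested))


if __name__ == "__main__":
    print(resolve_passive_sources())
    print(resolve_passive_sources("wayback,alienvault"))
```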

Describe the use case of this feature:

Katana misses some important URLs compared to other crawler tools.

### Tasks
- [ ] https://github.com/projectdiscovery/katana/issues/782
- [ ] https://github.com/projectdiscovery/katana/issues/783
- [ ] https://github.com/projectdiscovery/katana/issues/784

ehsandeep commented 1 year ago

@longnguyenhuynh Katana is primarily an active web crawler. That said, thanks to your feature suggestion, passive sources will be added soon.

Mzack9999 commented 1 year ago

@ehsandeep, could you detail how passive URLs should be handled? For example, are they only retrieved and listed?

fail-open commented 1 year ago

It'd be cool if you could use passive sources to seed the spider. If a passive source knows about a page that isn't currently linked from anywhere, the active spider wouldn't find it on its own; spidering from the passive record could then uncover new pages.
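The seeding idea can be sketched as a breadth-first crawl whose queue starts with both the target and the passively discovered URLs. This is a toy Python illustration, not katana code: `crawl` and `fetch_links` are hypothetical names, and the link graph is a hard-coded dictionary standing in for real HTTP requests.

```python
from collections import deque


def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl starting from every seed URL.

    fetch_links(url) must return the URLs linked from that page; here it
    is injected so the sketch runs without network access.
    """
    unique_seeds = list(dict.fromkeys(seeds))  # dedupe, keep order
    queue = deque(unique_seeds)
    seen = set(unique_seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    # Toy site: /old is not linked from the home page, but links to /hidden.
    site = {
        "https://example.com/": ["https://example.com/a"],
        "https://example.com/a": [],
        "https://example.com/old": ["https://example.com/hidden"],
        "https://example.com/hidden": [],
    }
    fetch = lambda url: site.get(url, [])
    active_only = crawl(["https://example.com/"], fetch)
    seeded = crawl(["https://example.com/", "https://example.com/old"], fetch)
    print(sorted(set(seeded) - set(active_only)))
```

In the toy graph, the unseeded crawl never reaches `/old` or `/hidden`; adding the passive record as a seed makes both reachable.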

ehsandeep commented 1 year ago

@Mzack9999 issue is now updated with details.

@fail-open good idea and something to consider/work on after passive support implementation.

brenocss commented 1 year ago

In naabu we use -passive and -verify. A -verify option would be great here too, with some match conditions such as status code, regex, etc.
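A rough Python sketch of what such a verification step could look like. Everything here is hypothetical: `verify`, `probe`, and the `status_codes` / `body_regex` parameters are illustrative names, not existing katana or naabu options.

```python
import re


def verify(urls, probe, status_codes=(200,), body_regex=None):
    """Keep only the URLs whose live response satisfies the match conditions.

    probe(url) must return a (status_code, body) tuple; it is injected so
    the sketch stays independent of any particular HTTP client.
    """
    pattern = re.compile(body_regex) if body_regex else None
    kept = []
    for url in urls:
        status, body = probe(url)
        if status not in status_codes:
            continue  # status-code condition failed
        if pattern is not None and not pattern.search(body):
            continue  # regex condition failed
        kept.append(url)
    return kept


if __name__ == "__main__":
    # Canned responses standing in for real requests.
    responses = {
        "https://example.com/live": (200, "<title>Dashboard</title>"),
        "https://example.com/gone": (404, "not found"),
        "https://example.com/other": (200, "<title>Login</title>"),
    }
    probe = lambda url: responses[url]
    print(verify(list(responses), probe, status_codes=(200,), body_regex=r"Dashboard"))
```

Passing the probe in as a function keeps the matching logic testable on its own; a real implementation would plug in an HTTP client there.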