projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License

Add support for not following HTTP redirects #561

Closed: ErikOwen closed this issue 12 months ago

ErikOwen commented 1 year ago

Please describe your feature request:

When crawling a target, do not follow HTTP redirects. Specifically, if the server returns a 301 or 302 redirect response, do not send an additional request to the URL in the Location header.

Describe the use case of this feature:

I want to use katana to crawl an HTTP target that redirects to HTTPS and see the 301 response, along with its headers, in the JSON output. I currently cannot do that, because katana appears to follow redirects automatically. Here is an example:

Example of a target that has an HTTP -> HTTPS redirect:

> curl -I "http://projectdiscovery.io"
HTTP/1.1 301 Moved Permanently
...

Example of katana following the redirect:

> echo "http://projectdiscovery.io" | katana -silent -d 1 -jsonl -ob -or | jq
{
  "timestamp": "2023-08-16T11:03:53.534286-07:00",
  "request": {
    "method": "GET",
    "endpoint": "http://projectdiscovery.io"
  },
  "response": {
    "status_code": 200,
    "headers": {
      "nel": "{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}",
      "x_region": "us-west-2",
      "server": "cloudflare",
      "cache_control": "public, max-age=0, must-revalidate",
      "last_modified": "Thu, 10 Aug 2023 19:30:18 GMT",
      "x_fallback": "No Fallback",
      "connection": "keep-alive",
      "date": "Wed, 16 Aug 2023 18:03:52 GMT",
      "strict_transport_security": "max-age=0; preload",
      "vary": "Accept-Encoding",
      "link": "<https://framerusercontent.com>; rel=\"preconnect\", <https://framerusercontent.com>; rel=\"preconnect\"; crossorigin=\"\"",
      "x_content_type_options": "nosniff",
      "cf_cache_status": "DYNAMIC",
      "report_to": "{\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=n2k9Dxr3i%2F4XJarE%2FZgCavnGvuIzI7rQ8wpGgKgXsTcwwIx6uCw0QuoFuSwVe%2BbOsYrGFvAMTCHSJifZZ%2B5RBU%2BDiOY6mPrW3tFyC3tWDsHnPmi2lnHVmg0bGPDIw69g49ALK1w%3D\"}],\"group\":\"cf-nel\",\"max_age\":604800}",
      "x_cache": "Cached",
      "cf_ray": "7f7b98b63ff17e70-LAX",
      "content_type": "text/html"
    },
    "technologies": [
      "HSTS",
      "Cloudflare"
    ]
  }
}
w1gs commented 12 months ago

Opened PR #588 for this. Hope this helps!

ErikOwen commented 11 months ago

Thanks for looking into this issue and pushing up a solution, @WigzyDev! I really appreciate you taking the time 🙏.

Using katana v1.0.4, I can see that the -dr flag correctly reports the redirect (301 status code) 🎉. However, if I set a larger maximum crawl depth (-d flag), katana still follows the redirect:

> echo "http://projectdiscovery.io" | katana -silent -d 3 -dr
http://projectdiscovery.io
https://projectdiscovery.io/
https://projectdiscovery.io/cdn-cgi/l/email-protection
https://projectdiscovery.io/cdn-cgi/scripts/5c5dd728/cloudflare-static/email-decode.min.js
https://chaos.projectdiscovery.io/
https://blog.projectdiscovery.io/announcing-nuclei-cloud/
https://blog.projectdiscovery.io/
https://blog.projectdiscovery.io/stop-pentesting-start-programming/
https://blog.projectdiscovery.io/the-best-defense-is-a-good-offensive-security-program/
https://blog.projectdiscovery.io/hunting-c2-servers/
https://projectdiscovery.io/requestdemo
https://projectdiscovery.io/nuclei
https://projectdiscovery.io/community
https://projectdiscovery.io/terms
https://projectdiscovery.io/privacy
https://projectdiscovery.io/cloudplatform
https://projectdiscovery.io/aboutus

I would expect katana not to crawl any HTTPS pages when the -dr flag is set and the target is HTTP (http://projectdiscovery.io).

I'll open up another issue to track this bug.