projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License
10.25k stars, 533 forks

`--headless` option and `--disable-redirects` option do not capture 301 response codes #734

Closed by ErikOwen 3 months ago

ErikOwen commented 5 months ago

katana version: v1.0.5

Current Behavior:

When I use the `--headless` and `--disable-redirects` options together to crawl a target site that redirects, the crawl fails with a `context deadline exceeded` error and no 301 response appears in the output:

> echo "http://chaos.projectdiscovery.io" | katana --silent -d 1 --jsonl --disable-redirects --silent --headless | jq
{
  "timestamp": "2024-01-18T16:54:59.747159-08:00",
  "request": {
    "method": "GET",
    "endpoint": "http://chaos.projectdiscovery.io",
    "raw": "GET / HTTP/1.1\r\nHost: chaos.projectdiscovery.io\r\nUser-Agent: Go-http-client/1.1\r\nAccept-Encoding: gzip\r\n\r\n"
  },
  "error": "[hybrid:RUNTIME] context deadline exceeded <- could not navigate target"
}

The same command without the `--headless` flag correctly returns the 301 redirect response in the output:

> echo "http://chaos.projectdiscovery.io" | katana --silent -d 1 --jsonl --disable-redirects --silent | jq
{
  "timestamp": "2024-01-18T16:55:44.169506-08:00",
  "request": {
    "method": "GET",
    "endpoint": "http://chaos.projectdiscovery.io",
    "raw": "GET / HTTP/1.1\r\nHost: chaos.projectdiscovery.io\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36\r\nAccept-Encoding: gzip\r\n\r\n"
  },
  "response": {
    "status_code": 301,
    "headers": {
      "cache_control": "max-age=3600",
      "date": "Fri, 19 Jan 2024 00:55:44 GMT",
      "location": "https://chaos.projectdiscovery.io/",
      "vary": "Accept-Encoding",
      "report_to": "{\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=IyLGkB%2FS7jd3zXpwfzhuHNFQd%2FZ1agBvtMoJx36egQwxaZlrIkKomG92ts2d%2Bw%2BI65323EEyXHYmrh7mezI5C8P4xzr341cTWNYhrVONqRm28bERFvL18p2UEWW1HUnzddSjhrCQI4GQMzA%3D\"}],\"group\":\"cf-nel\",\"max_age\":604800}",
      "nel": "{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}",
      "server": "cloudflare",
      "connection": "keep-alive",
      "cf_ray": "847b1d255d802b8d-LAX",
      "x_content_type_options": "nosniff",
      "expires": "Fri, 19 Jan 2024 01:55:44 GMT"
    },
    "technologies": [
      "Cloudflare"
    ],
    "raw": "HTTP/1.1 301 Moved Permanently\r\nTransfer-Encoding: chunked\r\nCache-Control: max-age=3600\r\nCf-Ray: 847b1d255d802b8d-LAX\r\nConnection: keep-alive\r\nDate: Fri, 19 Jan 2024 00:55:44 GMT\r\nExpires: Fri, 19 Jan 2024 01:55:44 GMT\r\nLocation: https://chaos.projectdiscovery.io/\r\nNel: {\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}\r\nReport-To: {\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=IyLGkB%2FS7jd3zXpwfzhuHNFQd%2FZ1agBvtMoJx36egQwxaZlrIkKomG92ts2d%2Bw%2BI65323EEyXHYmrh7mezI5C8P4xzr341cTWNYhrVONqRm28bERFvL18p2UEWW1HUnzddSjhrCQI4GQMzA%3D\"}],\"group\":\"cf-nel\",\"max_age\":604800}\r\nServer: cloudflare\r\nVary: Accept-Encoding\r\nX-Content-Type-Options: nosniff\r\n\r\n0\r\n\r\n"
  }
}

Expected Behavior:

Both commands above should show a 301 response. Currently, only the non-headless command returns one.
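The difference between the two outputs can also be checked mechanically. Below is a minimal Python sketch that assumes katana's `--jsonl` output has been captured one record per line (that is, without the `jq` pretty-printing step) and that the records use the `request.endpoint`, `response.status_code`, and `error` fields seen in the outputs above. The function names and the abbreviated sample records are hypothetical and only serve the illustration.

```python
import json


def summarize_records(jsonl_text):
    """Return (endpoint, status_code, error) for every JSONL record."""
    rows = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        endpoint = record.get("request", {}).get("endpoint")
        status = record.get("response", {}).get("status_code")  # None if no response
        rows.append((endpoint, status, record.get("error")))
    return rows


def missing_redirects(jsonl_text, expected_status=301):
    """List endpoints whose record does not carry the expected status code."""
    return [
        endpoint
        for endpoint, status, _ in summarize_records(jsonl_text)
        if status != expected_status
    ]


# Abbreviated stand-ins for the two outputs shown in this report.
HEADLESS_OUTPUT = json.dumps({
    "request": {"method": "GET", "endpoint": "http://chaos.projectdiscovery.io"},
    "error": "[hybrid:RUNTIME] context deadline exceeded <- could not navigate target",
})
STANDARD_OUTPUT = json.dumps({
    "request": {"method": "GET", "endpoint": "http://chaos.projectdiscovery.io"},
    "response": {"status_code": 301},
})

if __name__ == "__main__":
    print("headless run, missing 301:", missing_redirects(HEADLESS_OUTPUT))
    print("standard run, missing 301:", missing_redirects(STANDARD_OUTPUT))
```

With the expected behavior in place, `missing_redirects` would return an empty list for both runs.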

Steps To Reproduce:

  1. Run katana like so: echo "http://chaos.projectdiscovery.io" | katana --silent -d 1 --jsonl --disable-redirects --silent --headless | jq
  2. Notice how there is no 301 redirect response in the output
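To rule out the target itself while reproducing, the redirect can be observed independently of katana. The sketch below relies on the fact that Python's `http.client` never follows redirects, so the first response is returned as-is. It runs against a local stub server that always answers 301; the handler, the helper names, and the `example.invalid` location are assumptions made for the illustration, not part of katana.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Stub handler that answers every GET with a 301 redirect."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", "https://example.invalid/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_without_redirects(host, port, path="/"):
    """Issue one GET and return (status, Location header) without following it."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()  # drain the (empty) body
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def demo():
    """Start the stub server on a free local port and fetch it once."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_without_redirects("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Pointing the same helper at the real target, for example `fetch_without_redirects("chaos.projectdiscovery.io", 80)`, should report the 301 and its `Location` header if the site still redirects as it did in the output above.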

Anything else:

olearycrew commented 5 months ago

Thanks for opening this issue, @ErikOwen. Our team will take a look.

ErikOwen commented 4 months ago

@olearycrew, are there any updates on when a bugfix might be prioritized? Is katana still being actively developed?

ErikOwen commented 3 months ago

Confirming that #823 fixes this issue. Thank you, @dogancanbakir!