`-jsonl` option with `-headless` option results in an error

ErikOwen commented 1 year ago

katana version: v1.0.4

Current Behavior: using the `-jsonl` and `-headless` options for a katana crawl results in an error: `[hybrid:RUNTIME] context deadline exceeded <- could not get dom`.

Expected Behavior:

No error should occur, just as when the same command is run without the `-headless` option.

Steps To Reproduce:

Run a katana crawl against a target with the -jsonl and -headless options:

> echo "https://projectdiscovery.io" | katana -silent -d 1 -jsonl -ob -or -headless | jq
{
  "timestamp": "2023-10-01T10:47:56.072225-07:00",
  "request": {
    "method": "GET",
    "endpoint": "https://projectdiscovery.io"
  },
  "error": "[hybrid:RUNTIME] context deadline exceeded <- could not get dom"
}

Notice the error: [hybrid:RUNTIME] context deadline exceeded <- could not get dom

Run the same command without the -headless option:

> echo "https://projectdiscovery.io" | katana -silent -d 1 -jsonl -ob -or | jq
{
  "timestamp": "2023-10-01T10:50:08.902251-07:00",
  "request": {
    "method": "GET",
    "endpoint": "https://projectdiscovery.io"
  },
  "response": {
    "status_code": 200,
    "headers": {
      "cache_control": "public, max-age=0, must-revalidate",
      "report_to": "{\"endpoints\":[{\"url\":\"https:\\/\\/a.nel.cloudflare.com\\/report\\/v3?s=0Kep5aODKo2qRkrQjl%2FTGOcspCMTVBMeLdtA6Gc4y9E5UkFkUW9QYffz4bUgV6TgXpxudfSsHDctxNKUH3%2B979xOe6kOqAHsxy3n4HDyPkXBK2zjDQAguH0ajFdYfUtZ9I74BUw%3D\"}],\"group\":\"cf-nel\",\"max_age\":604800}",
      "nel": "{\"success_fraction\":0,\"report_to\":\"cf-nel\",\"max_age\":604800}",
      "content_type": "text/html",
      "last_modified": "Sat, 30 Sep 2023 00:06:34 GMT",
      "server_timing": "region;desc=\"us-west-2\", cache;desc=\"cached\", fallback;desc=\"no-fallback\"",
      "vary": "Accept-Encoding",
      "cf_ray": "80f68bd47e26521a-LAX",
      "link": "<https://framerusercontent.com>; rel=\"preconnect\", <https://framerusercontent.com>; rel=\"preconnect\"; crossorigin=\"\"",
      "cf_cache_status": "DYNAMIC",
      "date": "Sun, 01 Oct 2023 17:50:08 GMT",
      "server": "cloudflare",
      "x_content_type_options": "nosniff",
      "strict_transport_security": "max-age=0; preload",
      "connection": "keep-alive"
    },
    "technologies": [
      "Cloudflare",
      "HSTS"
    ]
  }
}

Note that there are no errors when the -headless option is omitted.

Anything else:

This issue only occurs when crawling specific websites. I can consistently reproduce it when crawling https://projectdiscovery.io and https://www.discover.com, but I cannot reproduce it when crawling other sites such as https://www.google.com.

ocervell commented 12 months ago

I ran into the same issue today.

RamanaReddy0M commented 6 months ago

@ErikOwen can you try the latest version (v1.0.5)? It seems to be working with the latest release.

ErikOwen commented 6 months ago

Hi @RamanaReddy0M, thank you for following up! I ran the same command to reproduce this error using the latest code in the dev branch. Now some paths show a successful response, but others still fail with the [hybrid:RUNTIME] context deadline exceeded <- could not get dom error. So it seems like progress is being made on this issue, but the issue still persists.

Here is the output from when I ran the command: katana_logs.txt
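To quantify which paths succeed and which still hit the `could not get dom` error, a small script along these lines could tally the records in the log. This is a minimal sketch, assuming `katana_logs.txt` holds the raw `-jsonl` output (one JSON object per line, not the pretty-printed `jq` form shown above) and that each record carries either an `error` string or a `response` object, as in the examples above. The helper name `summarize_katana_jsonl` is hypothetical, not part of katana.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_katana_jsonl(lines: Iterable[str]) -> Tuple[Counter, List[str]]:
    """Hypothetical helper: tally katana -jsonl records by outcome.

    Assumes one JSON object per non-empty line, each with a "request" and
    either a non-empty "error" string or a "response" object.
    """
    outcomes: Counter = Counter()
    failed: List[str] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        endpoint = record.get("request", {}).get("endpoint", "?")
        error = record.get("error")
        if error:
            # Count each distinct error message separately, e.g.
            # "[hybrid:RUNTIME] context deadline exceeded <- could not get dom"
            outcomes[error] += 1
            failed.append(endpoint)
        elif "response" in record:
            outcomes["ok"] += 1
        else:
            outcomes["no response, no error"] += 1
    return outcomes, failed
```

Passing an open file handle for `katana_logs.txt` would return a count per distinct error message alongside an `"ok"` count for successful records, plus the list of endpoints that failed, which makes it easy to check whether the same paths fail on every run.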
