projectdiscovery / katana

A next-generation crawling and spidering framework.
MIT License

Cookies in CustomHeaders not correctly used & building altered headers (Hybrid) #930

Open alban-stourbe-wmx opened 2 weeks ago

alban-stourbe-wmx commented 2 weeks ago

katana version:

Katana version: v1.1.0

Current Behavior:

Cookies passed through customHeaders are not loaded into the browser itself. They are sent with the first request, but because the browser never receives them as cookies, they can be lost for the rest of the crawl. This is a serious limitation for authenticated crawling when the authentication vector is a cookie.

What's more, the request written to the output is reconstructed solely from customHeaders, not from the headers of the request the browser actually sent. As a result, the output request does not match the request sent by the browser.

Detailed explanation of bug source

To add custom cookies for an authenticated crawl, we have to use the -H option (CustomHeaders). This data is stored in the Headers field of the Shared struct.

[Screenshot: 2024-06-17 at 09:35:49]

Custom headers are then used when crawling a web page. They are added to the headers of the page in question by the Shared addHeadersToPage function. This function calls page.SetExtraHeaders, which can lead to a bug.

During crawling, a response may contain a Set-Cookie header, which initializes a cookie in the browser. From then on, the custom cookies given in the option are no longer added to the page headers, because page.SetExtraHeaders only adds a value if it doesn't already exist. In a crawl authenticated through a specific cookie, that value may therefore be lost mid-crawl.

For example, during the first crawl a foo=bar cookie is present, but on the next crawl it has disappeared because the browser's cookies have been initialized: SetExtraHeaders does not add the Cookie header because a value is already set.

[Screenshot: 2024-06-17 at 09:57:20] [Screenshot: 2024-06-17 at 09:58:09]

In addition, the headers used to recreate the output request do not match the real request sent by the browser. During crawling, the browser can add headers and cookies dynamically, but the reconstruction relies solely on the custom headers given as input.

Genuine request:

[Screenshot: 2024-06-17 at 10:10:47]

Output request:

[Screenshot: 2024-06-17 at 10:10:06]

Expected Behavior:

Create an option to load cookies when the browser is initialized. In hybrid mode, cookies can't simply be added to the headers; they have to be inserted into the browser to emulate real browser behavior. Cookies and headers must be handled separately in this context.
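
To make the separation of cookies and headers concrete, here is a minimal sketch using only the Go standard library. `splitCookies` is a hypothetical helper, not an existing katana function: it pulls the Cookie entry out of a custom-headers map and parses it into individual cookies, which is roughly the shape of data a browser cookie store would need. How those cookies would then be loaded into the headless browser is left out on purpose.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// splitCookies is a hypothetical helper (not part of katana). It separates
// the Cookie entry from the other custom headers and parses it into
// individual cookies that could be loaded into a browser's cookie store.
func splitCookies(custom map[string]string) (map[string]string, []*http.Cookie) {
	headers := make(map[string]string, len(custom))
	var cookies []*http.Cookie
	for name, value := range custom {
		if strings.EqualFold(name, "Cookie") {
			// Reuse net/http's Cookie parser by wrapping the value in a request.
			req := &http.Request{Header: http.Header{"Cookie": {value}}}
			cookies = append(cookies, req.Cookies()...)
			continue
		}
		headers[name] = value
	}
	return headers, cookies
}

func main() {
	// Example input; the header values are made up for illustration.
	headers, cookies := splitCookies(map[string]string{
		"Cookie":     "foo=bar; session=abc123",
		"User-Agent": "katana-test",
	})
	fmt.Println("remaining headers:", len(headers))
	for _, c := range cookies {
		fmt.Printf("cookie %s=%s\n", c.Name, c.Value)
	}
}
```

With this split, the remaining headers could still go through the extra-headers path, while the cookies would be set on the browser before the first navigation.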

To rebuild the request headers, simply use the headers attached to the hijacked request (e proto....). These contain all the information, including customHeaders.
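
As a rough illustration of that idea, the sketch below renders the output request from the headers the browser actually sent instead of from customHeaders. `rawRequest` and the plain `map[string]string` input are assumptions made for this example; the real hijacked request type in katana is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// rawRequest is a hypothetical helper: it renders a request line followed by
// the headers observed on the browser's request, in sorted order so the
// output is deterministic.
func rawRequest(method, path string, sent map[string]string) string {
	names := make([]string, 0, len(sent))
	for name := range sent {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	fmt.Fprintf(&b, "%s %s HTTP/1.1\r\n", method, path)
	for _, name := range names {
		fmt.Fprintf(&b, "%s: %s\r\n", name, sent[name])
	}
	b.WriteString("\r\n")
	return b.String()
}

func main() {
	// Headers as the browser sent them: the custom cookie plus a cookie the
	// browser added on its own (values made up for illustration).
	sent := map[string]string{
		"Cookie":     "foo=bar; sid=42",
		"User-Agent": "test",
	}
	fmt.Print(rawRequest("GET", "/account", sent))
}
```

Because the input is the browser's own header set, cookies and headers added dynamically during the crawl show up in the output without any merging against customHeaders.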

Steps To Reproduce:

Example steps to reproduce the behavior:

  1. Launch katana with a custom cookie in the Custom Headers option
  2. Notice that the cookie value disappears during crawling.

alban-stourbe-wmx commented 2 weeks ago

I've made the changes to fix this issue. I plan to do the PR later today. ;)

GeorginaReeder commented 2 weeks ago

Great, thank you for this @alban-stourbe-wmx - we'll look out for the PR! :)

alban-stourbe-wmx commented 2 weeks ago

Hey everyone, I'm coming back to you because I may have gotten carried away with loading cookies in the browser. 🥹

I wanted to create this cookie loading option to have an authenticated browser. However, I found that this is already possible by creating a debug browser and passing it to katana (cwu option). So adding such an option doesn't really seem coherent for the project.

Nevertheless, a bug persists in the hybrid request headers. When using a custom/authenticated browser, none of the headers sent by the browser are written to the output, only the customHeaders entered as input.