projectdiscovery / katana

A next-generation crawling and spidering framework.

Katana does not exclude .js and .css links with query parameters from crawling #987

Open CatDrinkCoffee opened 3 months ago

CatDrinkCoffee commented 3 months ago

Normally the crawler does not request .js and .css pages, but when I used the -sb flag to observe the browser-based crawling process, I found that Katana actually has the following problem.

For example, .js and .css URLs are not visited during the crawling process, but if the URL carries parameters, such as .js?ver=1.1, the crawler will visit the page, which can generate a huge number of crawler requests. Nowadays many pages append a parameter value to their js links. I think this is a defect and hope it can be fixed. Thank you.
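(Illustration only, not Katana's actual code.) A minimal Go sketch of one way such a gap can arise: checking the raw URL string misses the extension once a query string follows it, while parsing the URL and testing only the path component catches both forms.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// naiveIsJS checks the raw URL string, so "app.js?ver=1.1" slips through.
func naiveIsJS(raw string) bool {
	return strings.HasSuffix(raw, ".js")
}

// pathIsJS parses the URL and inspects only the path component,
// so a trailing query string no longer hides the extension.
func pathIsJS(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	return path.Ext(u.Path) == ".js"
}

func main() {
	for _, raw := range []string{
		"https://example.com/app.js",
		"https://example.com/app.js?ver=1.1",
	} {
		fmt.Printf("%s  naive=%v  path-based=%v\n", raw, naiveIsJS(raw), pathIsJS(raw))
	}
}
```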

The screenshot below was captured when the crawler chose to visit this js file (the js link carries parameters).

[screenshot: browser crawl requesting the parameterized .js URL]

zrquan commented 3 months ago

Have you tried the -igq flag?

CatDrinkCoffee commented 2 months ago

> Have you tried the -igq flag?

There is no -igq parameter in the documentation. And this is a defect, isn't it? I shouldn't need other parameters to work around it. Judging from the execution flow, the program is designed not to crawl these specific links, but in some special cases this design fails.

zrquan commented 2 months ago

> There is no -igq parameter in the documentation

Sorry, it should be the -iqp parameter.
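For anyone who finds this thread: -iqp is the short form of Katana's -ignore-query-params flag, which (per the flag description) skips re-crawling the same path when only the query-parameter values differ. A minimal invocation, with the target URL as a placeholder:

```
katana -u https://example.com -iqp
```

Note this works around the volume of duplicate requests; the extension-filtering gap described above is a separate question.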