[Open] CatDrinkCoffee opened this issue 3 months ago
Have you tried the `-igq` flag?
> Have you tried the `-igq` flag?
There is no `-igq` parameter in the documentation. And this is a defect, isn't it? I shouldn't have to use other parameters to work around it. Judging from the execution flow, the program is designed not to crawl these specific links, but in some special cases that design fails.
> There is no `-igq` parameter in the documentation
Sorry, it should be the `-iqp` parameter.
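For context, `-iqp` (ignore query params) is meant to make the crawler treat URLs that differ only in their query string as the same page. Below is a minimal sketch of that kind of deduplication using only the Go standard library; the URLs and the `dedupKey` helper are made up for illustration, and this is not Katana's actual implementation.

```go
// A minimal sketch of query-param-insensitive deduplication, the behavior
// a flag like -iqp implies. NOT Katana's actual code, just an illustration.
package main

import (
	"fmt"
	"net/url"
)

// dedupKey strips the query string and fragment so that /app.js and
// /app.js?ver=1.1 normalize to the same key.
func dedupKey(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return raw // fall back to the raw URL on parse errors
	}
	u.RawQuery = ""
	u.Fragment = ""
	return u.String()
}

func main() {
	seen := map[string]bool{}
	for _, link := range []string{
		"https://example.com/app.js",
		"https://example.com/app.js?ver=1.1",
		"https://example.com/app.js?ver=1.2",
	} {
		key := dedupKey(link)
		if seen[key] {
			fmt.Println("skip (duplicate):", link)
			continue
		}
		seen[key] = true
		fmt.Println("crawl:", link)
	}
}
```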
Normally, the crawler does not request `.js` and `.css` pages, but when I used the `-sb` parameter to observe the browser during crawling, I found that Katana actually has the following problem.
For example: `.js` and `.css` files are never visited during crawling, but if the link carries query parameters, such as `.js?ver=1.1`, the crawler will visit that page, which produces a huge number of crawler requests. Many pages today append a version parameter to their JS links. I think this is a defect and hope it can be fixed. Thank you.
The screenshot below shows what I captured when the crawler chose to visit such a `.js` link (one that carries query parameters).
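If I had to guess at the root cause (I haven't read Katana's source, so this is only an assumption), this looks like the kind of bug you get from checking the file extension on the raw URL string instead of on the parsed path. A short Go sketch of that failure mode, with made-up URLs:

```go
// A sketch of the suspected failure mode: a naive suffix check misses
// static assets whose URLs carry query parameters, while checking the
// extension of the parsed path catches both. This is an assumption about
// the root cause, not code taken from Katana.
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

var skippedExts = map[string]bool{".js": true, ".css": true}

// naiveIsAsset fails for "app.js?ver=1.1" because the full URL string
// no longer ends in ".js".
func naiveIsAsset(raw string) bool {
	return strings.HasSuffix(raw, ".js") || strings.HasSuffix(raw, ".css")
}

// pathIsAsset parses the URL first, so the query string is excluded
// before the extension is inspected.
func pathIsAsset(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	return skippedExts[path.Ext(u.Path)]
}

func main() {
	for _, link := range []string{
		"https://example.com/app.js",
		"https://example.com/app.js?ver=1.1",
	} {
		fmt.Printf("%-40s naive=%v path=%v\n", link, naiveIsAsset(link), pathIsAsset(link))
	}
}
```

With the naive check, `app.js?ver=1.1` is classified as a regular page and gets crawled; checking the parsed path skips it like any other `.js` asset, which is the behavior the report asks for.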