xinmans closed this issue 1 year ago.
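The user agent is already spoofed. You can switch to a different proxy IP: I route the crawler through a Cloudflare Warp proxy and basically never get a 403.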
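Yours only spoofs a single user agent, which also gets blocked easily. How much does it cost to crawl all of v2ex through the Cloudflare Warp proxy?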
A random user agent is worth considering, and it is fairly simple to implement.
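A minimal sketch of the random User-Agent idea, assuming a Scrapy project like the linked one: a downloader middleware that picks a User-Agent for every outgoing request. The class name `RandomUserAgentMiddleware`, its `USER_AGENTS` list, and the UA strings are hypothetical examples, not what the repo actually ships.

```python
import random


class RandomUserAgentMiddleware:
    """Hypothetical Scrapy downloader middleware that rotates the User-Agent.

    The UA strings below are illustrative examples, not a curated list.
    """

    USER_AGENTS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/16.5 Safari/605.1.15",
        "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
    ]

    def __init__(self, user_agents=None, rng=None):
        # Scrapy calls the constructor with no arguments when there is no
        # from_crawler/from_settings, so both parameters need defaults.
        self.user_agents = list(user_agents or self.USER_AGENTS)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Overwrite the User-Agent header with a randomly chosen one.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```

To try it, disable Scrapy's built-in `scrapy.downloadermiddlewares.useragent.UserAgentMiddleware` by mapping it to `None` in `DOWNLOADER_MIDDLEWARES` and register this class (for example under a hypothetical path such as `v2ex_scrapy.middlewares.RandomUserAgentMiddleware`) with a priority like 400. A fixed list of a few UAs is still easy to fingerprint, so a larger, regularly updated list would likely hold up better.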
Cloudflare Warp is free, though it can't connect directly. Cost: zero.
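One way to send a Scrapy crawl through a local proxy such as Warp is to set `request.meta["proxy"]`, which Scrapy's built-in HttpProxyMiddleware honors. This sketch assumes an HTTP proxy listening at `http://127.0.0.1:8080` (a placeholder address) and a custom `PROXY_URL` setting; both the class `FixedProxyMiddleware` and that setting are hypothetical. As far as I know, Scrapy's built-in proxy support covers HTTP(S) proxy URLs, so if Warp is exposed as a SOCKS5 endpoint you may need an HTTP-to-SOCKS bridge in front of it.

```python
DEFAULT_PROXY_URL = "http://127.0.0.1:8080"  # placeholder, adjust to your setup


class FixedProxyMiddleware:
    """Hypothetical middleware that routes every request through one proxy."""

    def __init__(self, proxy_url=DEFAULT_PROXY_URL):
        self.proxy_url = proxy_url

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_URL is a hypothetical custom setting, not a Scrapy built-in.
        return cls(crawler.settings.get("PROXY_URL", DEFAULT_PROXY_URL))

    def process_request(self, request, spider):
        # HttpProxyMiddleware reads meta["proxy"]; setdefault keeps any
        # proxy that a spider already set explicitly on the request.
        request.meta.setdefault("proxy", self.proxy_url)
        return None
```

Register it in `DOWNLOADER_MIDDLEWARES` with a priority below 750 (HttpProxyMiddleware's default slot), e.g. 350, so `meta["proxy"]` is already set when the built-in middleware runs.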
Done: https://github.com/oldshensheep/v2ex_scrapy/commit/7f1a6f1820fa199f3cc0c694e1eb3c4959c9d862
2023-07-06 14:50:53 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.v2ex.com/t/326> (referer: None)
2023-07-06 14:50:53 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.v2ex.com/t/326>: HTTP status code is not handled or not allowed
2023-07-06 14:50:53 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.v2ex.com/t/327> (referer: None)
2023-07-06 14:50:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.v2ex.com/t/327>: HTTP status code is not handled or not allowed
2023-07-06 14:50:54 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.v2ex.com/t/328> (referer: None)
2023-07-06 14:50:54 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.v2ex.com/t/328>: HTTP status code is not handled or not allowed
2023-07-06 14:50:56 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.v2ex.com/t/329> (referer: None)
2023-07-06 14:50:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://www.v2ex.com/t/329>: HTTP status code is not handled or not allowed
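In a log like this, HttpErrorMiddleware drops each 403 response, so those topics are simply skipped. A hedged `settings.py` sketch that slows the crawl and retries 403s, using only Scrapy's built-in settings; the values are guesses, not tuned for v2ex. Retrying from the same IP and User-Agent may just earn another 403, so this mostly helps alongside rotating User-Agents or proxies.

```python
# settings.py sketch: slow down and retry instead of silently skipping 403s.
# Values are illustrative guesses, not tuned for v2ex.

# Base delay between requests to the same site. With RANDOMIZE_DOWNLOAD_DELAY
# on, Scrapy waits a random 0.5x to 1.5x of this value.
DOWNLOAD_DELAY = 1.0
RANDOMIZE_DOWNLOAD_DELAY = True
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Let AutoThrottle adjust the delay based on observed server latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 30.0

# Retry 403 in addition to Scrapy's default retry codes.
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429, 403]
```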
I suggest adding some anti-blocking logic, such as spoofing the user agent.