jhcoco / bosszp

Boss Zhipin job-posting data crawler, analysis, and visualization
223 stars 30 forks

Bro, the crawler never manages to scrape any data. How can I fix this? #9

Open fffmmc opened 3 months ago

fffmmc commented 3 months ago

Please choose a number from the city list above to start crawling: 1
2024-06-13 12:57:04 [root] INFO: <<<<<<<<<<<<<Crawling job data on page _1_>>>>>>>>>>>>>
2024-06-13 12:57:51 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2024-06-13 12:58:29 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET https://www.zhipin.com/web/common/security-check.html?seed=sLswXh9BvM6FLmpwIzWsWfpcID5hvF%2Bj7OUnLvfGiBpHpCfbIqDqHgRru0Nnpwfd&name=a8d239c0&ts=1718254709446&callbackUrl=%2Fjob_detail%2F%3Fquery%3D%26city%3D100010000%26industry%3D%26position%3D&srcReferer=https%3A%2F%2Fwww.zhipin.com%2Fc101020100%2F%3Fka%3Dsel-city-101020100> from <GET https://www.zhipin.com/job_detail/?query=&city=100010000&industry=&position=>
2024-06-13 12:58:51 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-06-13 12:59:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.zhipin.com/web/common/security-check.html?seed=sLswXh9BvM6FLmpwIzWsWfpcID5hvF%2Bj7OUnLvfGiBpHpCfbIqDqHgRru0Nnpwfd&name=a8d239c0&ts=1718254709446&callbackUrl=%2Fjob_detail%2F%3Fquery%3D%26city%3D100010000%26industry%3D%26position%3D&srcReferer=https%3A%2F%2Fwww.zhipin.com%2Fc101020100%2F%3Fka%3Dsel-city-101020100> (referer: https://www.zhipin.com/c101020100/?ka=sel-city-101020100)
2024-06-13 12:59:26 [root] INFO: <<<<<<<<<<<<<Crawling job data on page _2_>>>>>>>>>>>>>
2024-06-13 12:59:29 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://www.zhipin.com/web/common/security-check.html?seed=sLswXh9BvM6FLmpwIzWsWfpcID5hvF%2Bj7OUnLvfGiBpHpCfbIqDqHgRru0Nnpwfd&name=a8d239c0&ts=1718254709446&callbackUrl=%2Fjob_detail%2F%3Fquery%3D%26city%3D100010000%26industry%3D%26position%3D&srcReferer=https%3A%2F%2Fwww.zhipin.com%2Fc101020100%2F%3Fka%3Dsel-city-101020100> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2024-06-13 12:59:29 [scrapy.core.engine] INFO: Closing spider (finished)
2024-06-13 12:59:29 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 2218,
 'downloader/request_count': 3,
 'downloader/request_method_count/GET': 3,
 'downloader/response_bytes': 10946,
 'downloader/response_count': 3,
 'downloader/response_status_count/200': 2,
 'downloader/response_status_count/302': 1,
 'dupefilter/filtered': 1,
 'elapsed_time_seconds': 157.977382,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 6, 13, 4, 59, 29, 305362),
 'httpcompression/response_bytes': 105815,
 'httpcompression/response_count': 2,
 'log_count/DEBUG': 5,
 'log_count/INFO': 14,
2024-06-13 12:59:29 [scrapy.core.engine] INFO: Spider closed (finished)
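
Reading the log: the request for the job list is answered with a 302 to `security-check.html`, so the spider only ever downloads the check page and scrapes 0 items. Below is a minimal sketch, not code from this repository, of how such a redirect could be recognized from the response URL. The helper names `is_security_check` and `original_target` are hypothetical; the path and the `callbackUrl` parameter are taken from the URLs in the log, and the sketch assumes the site keeps using them.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Path of the check page as it appears in the log (assumed to stay stable).
SECURITY_CHECK_PATH = "/web/common/security-check.html"


def is_security_check(url: str) -> bool:
    """Return True if the URL points at the security-check page."""
    return urlsplit(url).path == SECURITY_CHECK_PATH


def original_target(url: str) -> Optional[str]:
    """Return the decoded callbackUrl query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("callbackUrl")
    return values[0] if values else None


# The redirect target copied from the log.
redirected = (
    "https://www.zhipin.com/web/common/security-check.html"
    "?seed=sLswXh9BvM6FLmpwIzWsWfpcID5hvF%2Bj7OUnLvfGiBpHpCfbIqDqHgRru0Nnpwfd"
    "&name=a8d239c0&ts=1718254709446"
    "&callbackUrl=%2Fjob_detail%2F%3Fquery%3D%26city%3D100010000"
    "%26industry%3D%26position%3D"
    "&srcReferer=https%3A%2F%2Fwww.zhipin.com%2Fc101020100%2F%3Fka%3Dsel-city-101020100"
)

if is_security_check(redirected):
    print("blocked; originally requested:", original_target(redirected))
```

In a Scrapy callback, the same check could be run on `response.url` before parsing, so the run logs the block explicitly instead of finishing with 0 items. This only detects the block; it does not get past it.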

jhcoco commented 3 months ago

Please see the new project: https://github.com/jhcoco/bosszp-selenium