Open bobkingdom opened 2 months ago
As the title says: after switching to the Mafengwo spider, it still doesn't seem to crawl anything. How is this supposed to be used?
2024-09-01 23:47:53,010 - INFO - HTTP Request: GET https://www.mafengwo.cn/mdd "HTTP/1.1 301 Moved Permanently"
[ERROR][2024-09-01 23:47:53][main.py:439] - Error occurred while crawling: '__jsluid_s'
INFO: 127.0.0.1:53710 - "POST /fetch_mfw HTTP/1.1" 200 OK
@app.post("/fetch_mfw")
async def crawl_mafengwo_mdd():
    url = "https://www.mafengwo.cn/mdd"
    # proxy_gene_func = MyProxy()
    # config = SpiderConfig(proxy_gene_func=proxy_gene_func)
    config = SpiderConfig()
    # Use MaFengWoSpider
    spider = MaFengWoSpider(config)
    try:
        # Fetch the page content asynchronously
        doc = await spider.a_crawl(url)
        logger.info(f"Successfully crawled content: {doc.page_content}")
        return doc.page_content
    except Exception as e:
        logger.error(f"Error occurred while crawling: {str(e)}")
        return {"error": str(e)}
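A note on the error text: a bare quoted name such as `'__jsluid_s'` is what Python produces when a `KeyError` is converted with `str(e)`. That would suggest some lookup for a key (possibly a cookie) named `__jsluid_s` failed after the 301 response, but this is an assumption, since the thread shows no traceback. The sketch below uses a hypothetical `read_cookie` helper, not the spider's real code, to show how the handler's logging hides the exception type and how `logger.exception` plus the type name would reveal it.

```python
import logging

logger = logging.getLogger("mfw_demo")


def read_cookie(cookies: dict, name: str) -> str:
    # Hypothetical stand-in for whatever lookup fails inside the spider.
    # A plain dict lookup raises KeyError when the name is absent.
    return cookies[name]


def describe_failure(cookies: dict, name: str) -> dict:
    try:
        return {"value": read_cookie(cookies, name)}
    except Exception as e:
        # logger.exception logs at ERROR level and appends the traceback,
        # which str(e) alone does not include.
        logger.exception("Error occurred while crawling")
        # Returning the exception type makes a KeyError recognizable.
        return {"error": str(e), "type": type(e).__name__}


# An empty cookie mapping reproduces the message seen in the log.
result = describe_failure({}, "__jsluid_s")
print(result)
```

With the type name and traceback in the log, it should be easier to tell whether the missing key comes from the redirect response or from somewhere else in the spider.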
I've added a Xiaohongshu demo; you can take a look at it in the tests directory.