Without further information, we cannot confirm the slow startup issue. Can you provide a standalone example?
If it is called like this:

```python
crawler = WebCrawler(verbose=True, crawler_strategy=crawler_comm_strategy)
crawler.warmup()
```

it should print:

```
[LOG] 🌤️ Warming up the WebCrawler
result xxx
[LOG] 🌞 WebCrawler is ready to crawl
```
If warmup() is not called manually, those lines will not be printed. But since self.ready already defaults to True, skipping warmup() is effectively treated as the crawler having already started.
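To illustrate that interplay, here is a minimal sketch (not crawl4ai's actual implementation; the class name and attributes are assumptions for illustration only) of how a `ready` flag that defaults to True makes the explicit warmup() call optional:

```python
class SketchCrawler:
    """Minimal sketch, NOT the real crawl4ai code."""

    def __init__(self, verbose: bool = True):
        self.verbose = verbose
        self.ready = True  # assumed default: skipping warmup() still counts as started

    def warmup(self):
        if self.verbose:
            print("[LOG] 🌤️ Warming up the WebCrawler")
        # ... load models / start the browser here ...
        self.ready = True
        if self.verbose:
            print("[LOG] 🌞 WebCrawler is ready to crawl")

    def run(self, url: str):
        if not self.ready:
            self.warmup()  # only triggers when ready was not already True
        # ... actual crawling would happen here ...
        return f"crawled {url}"
```

With `ready` defaulting to True, `run()` never calls `warmup()` on its own, which is why the warmup log lines only appear when you call it yourself.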
If I call run(bypass_cache=True), the server initializes very quickly, so that case is fine.
If you pass bypass_cache=True, the cached data is returned directly, so startup will naturally be fast. You should also check your machine's performance and network speed.
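If you want to see where the time actually goes, a simple timing check like the one below can help. This is a rough sketch that assumes the older synchronous API with a `WebCrawler.run(url=..., bypass_cache=...)` signature, which may differ in your installed version:

```python
import time

from crawl4ai import WebCrawler  # assumed import path for the older synchronous API

crawler = WebCrawler(verbose=True)

t0 = time.time()
crawler.warmup()  # one-time startup cost (browser / model loading)
print(f"warmup took {time.time() - t0:.1f}s")

t0 = time.time()
result = crawler.run(url="https://www.nbcnews.com/business", bypass_cache=True)
print(f"run took {time.time() - t0:.1f}s")  # fresh fetch: dominated by network speed
```

If warmup dominates, the bottleneck is local (machine performance); if the run dominates, it is more likely the network path to the target site or to a remote service.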
Is the URL crawling service hosted overseas? I'm deploying and using it on a machine in mainland China.
If you are using CloudCrawlerStrategy, its service is hosted overseas. I recommend using LocalSeleniumCrawlerStrategy instead; relying entirely on a local browser to fetch the data is much faster.
On my side, calls return results quickly because they never hit their remote server.
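To make that concrete, switching to the local strategy might look like the sketch below; the `crawl4ai.crawler_strategy` import path and the constructor arguments are assumptions based on older crawl4ai releases and may need adjusting for your version:

```python
from crawl4ai import WebCrawler
from crawl4ai.crawler_strategy import LocalSeleniumCrawlerStrategy  # assumed import path

# Fetch pages with a local Selenium-driven browser instead of the overseas
# cloud service, so crawling does not depend on cross-border latency.
local_strategy = LocalSeleniumCrawlerStrategy(verbose=True)
crawler = WebCrawler(verbose=True, crawler_strategy=local_strategy)
crawler.warmup()

result = crawler.run(url="https://www.nbcnews.com/business")
print(result.markdown[:500])
```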
Another cause of slowness is LLMExtractionStrategy. This extractor calls an external service, so parsing can also be slow; if you don't use it and stick with NoExtractionStrategy, there should be no impact. If you do need LLMExtractionStrategy, you can point it at the Zhipu API (openai/glm-4-flash, https://open.bigmodel.cn/api/paas/v4), which is free and fast.
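As a rough sketch of that suggestion, pointing LLMExtractionStrategy at the Zhipu endpoint might look like the following. The `provider`, `api_token`, and `base_url` parameter names, the import path, and the environment variable are assumptions and may differ in your crawl4ai version:

```python
import os

from crawl4ai.extraction_strategy import LLMExtractionStrategy, NoExtractionStrategy  # assumed import path

# Fast path: skip LLM-based parsing entirely.
no_llm = NoExtractionStrategy()

# If LLM extraction is needed, use the free glm-4-flash model via the Zhipu
# OpenAI-compatible endpoint instead of an overseas provider.
zhipu_llm = LLMExtractionStrategy(
    provider="openai/glm-4-flash",
    api_token=os.getenv("ZHIPU_API_KEY"),             # hypothetical env var name
    base_url="https://open.bigmodel.cn/api/paas/v4",   # assumed parameter for a custom endpoint
    instruction="Extract the main article content as structured data.",
)
```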
OK, I'll give it a try.
Hi everyone, @lyh08250 I apologize for missing this issue. Since the 9th of September we have made a lot of changes, one of which is moving the entire library to an asynchronous version – it is much faster and performs significantly better. I hope you can test it again and see the difference. Here is a code sample to help you get a quick start.
```python
import asyncio
from crawl4ai import AsyncWebCrawler

async def simple_crawl():
    print("\n--- Basic Usage ---")
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(url="https://www.nbcnews.com/business")
        print(result.markdown[:500])  # Print first 500 characters

asyncio.run(simple_crawl())
```
When I init the crawl4ai server by
it takes so much time to initialize the server; I want to know why?