unclecode / crawl4ai

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper
Apache License 2.0

'CustomHTML2Text' is not defined #243

Closed: yassello55 closed this issue 2 weeks ago

yassello55 commented 2 weeks ago

Hi, I was testing a basic script and got this error message:

[ERROR] 🚫 Failed to crawl https://openai.com/api/pricing/, error: name 'CustomHTML2Text' is not defined
None

The script:

from crawl4ai import WebCrawler

# Create an instance of WebCrawler
crawler = WebCrawler()

# Warm up the crawler (load necessary models)
crawler.warmup()

# Run the crawler on a URL
result = crawler.run(url="https://openai.com/api/pricing/")

# Print the extracted content
print(result.markdown)

unclecode commented 2 weeks ago

@yassello55 You're using the synchronous version, which is no longer maintained. I suggest switching to the asynchronous version to avoid similar errors. Sorry for the inconvenience; here is the main async example:

import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    # Create an instance of AsyncWebCrawler
    async with AsyncWebCrawler(verbose=True) as crawler:
        # Run the crawler on a URL
        result = await crawler.arun(url="https://www.nbcnews.com/business")

        # Print the extracted content
        print(result.markdown)

# Run the async main function
asyncio.run(main())
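
If the rest of a script is synchronous, one possible way to keep a blocking call site while using the async crawler is a thin wrapper around `asyncio.run`. This is only a minimal sketch: `fetch_markdown`, `fetch_markdown_sync`, and the `crawler_factory` parameter are hypothetical names introduced here for illustration and are not part of crawl4ai. It assumes only what the example above shows, namely `AsyncWebCrawler` used as an async context manager with `arun(url=...)` returning an object that has a `markdown` attribute.

```python
import asyncio


async def fetch_markdown(url, crawler_factory=None):
    """Crawl one URL and return its markdown (hypothetical helper)."""
    if crawler_factory is None:
        # Imported lazily so the helper can be exercised with a stand-in crawler.
        from crawl4ai import AsyncWebCrawler

        def crawler_factory():
            return AsyncWebCrawler(verbose=True)

    # Same usage pattern as the async example above.
    async with crawler_factory() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown


def fetch_markdown_sync(url, crawler_factory=None):
    """Blocking wrapper: runs the coroutine to completion on a fresh event loop."""
    return asyncio.run(fetch_markdown(url, crawler_factory))
```

Calling `fetch_markdown_sync("https://www.nbcnews.com/business")` from ordinary synchronous code would then return the markdown string. Note that `asyncio.run` raises a `RuntimeError` if an event loop is already running in the same thread (for example inside a Jupyter notebook); in that case, `await fetch_markdown(...)` directly instead.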