unclecode / crawl4ai

🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper
Apache License 2.0
12.57k stars 891 forks

Language Support #118

Open oaishi opened 1 week ago

oaishi commented 1 week ago

Hi,

Thanks for the great repository. I am new to it and was curious to know whether there is any support for changing the language before I crawl a certain page.

unclecode commented 1 week ago

Thank you for your interest in language support! While browsers don't directly support changing the language of web content, our library does support setting the Accept-Language header, which many websites use to serve content in different languages.

You can set the language preference in a few ways:

  1. When creating the crawler:

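    from crawl4ai import AsyncWebCrawler
    from crawl4ai.async_crawler_strategy import AsyncPlaywrightCrawlerStrategy

    # Every request made by this crawler sends the French-first preference.
    crawler = AsyncWebCrawler(
        crawler_strategy=AsyncPlaywrightCrawlerStrategy(
            headers={"Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"}
        )
    )

  2. Before crawling: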

    crawler.crawler_strategy.headers["Accept-Language"] = "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"

  3. When calling the arun method:

    # Run inside an async function; url is the page to crawl.
    result = await crawler.arun(
        url,
        headers={"Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"}
    )
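
The header value in all three options follows the same pattern: language tags in order of preference, each after the first carrying a decreasing `q` weight. If you need other languages, a small helper can build the string for you. This is only a sketch; `build_accept_language` is a hypothetical name and not part of the library.

```python
def build_accept_language(languages):
    """Build an Accept-Language value from tags listed in order of preference.

    Hypothetical helper, not part of crawl4ai. The first tag gets the
    implicit weight 1.0; each later tag drops by 0.1, never below 0.1.
    """
    parts = []
    for index, tag in enumerate(languages):
        if index == 0:
            parts.append(tag)  # no q parameter means q=1.0
        else:
            weight = max(1.0 - 0.1 * index, 0.1)
            parts.append(f"{tag};q={weight:.1f}")
    return ",".join(parts)


# Reproduces the French-first value used in the snippets above.
header_value = build_accept_language(["fr-FR", "fr", "en-US", "en"])
```

The resulting string can be passed as the `Accept-Language` value in any of the three options.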

Please note that the effectiveness of this method depends on the website you're crawling and whether it supports serving content in different languages based on the Accept-Language header.
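
One rough way to see whether a site honored the header is to look at the `lang` attribute on the returned page's `<html>` tag. The sketch below uses only the Python standard library and assumes you already have the page HTML as a string; `declared_language` and `matches_language` are hypothetical names. Many sites omit or mis-set `lang`, so treat the result as a hint rather than proof.

```python
from html.parser import HTMLParser


class _LangFinder(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(html):
    """Return the page's declared language tag, or None if absent."""
    finder = _LangFinder()
    finder.feed(html)
    return finder.lang


def matches_language(html, wanted):
    """Compare only the primary subtag, so "fr" matches "fr-FR"."""
    declared = declared_language(html)
    if not declared:
        return False
    return declared.lower().split("-")[0] == wanted.lower().split("-")[0]
```

For example, `matches_language(page_html, "fr")` returns `True` for a page that starts with `<html lang="fr-FR">`.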

We're also considering adding more language-related features in future updates. Could you provide more details about your specific use case? This would help us prioritize the most useful approaches for our users.

oaishi commented 4 days ago

Thanks so much @unclecode for the suggestion. I will check this out and let you know in case I have any follow-up questions.