unclecode / crawl4ai

🔥🕷️ Crawl4AI: Crawl Smarter, Faster, Freely. For AI.
https://crawl4ai.com
Apache License 2.0
17k stars 1.26k forks source link

Howto set header for LLMExtractionStrategy #152

Closed drdsgvo closed 1 month ago

drdsgvo commented 1 month ago

Needed to pass an Authorization header field to the LLM service, that is run on a own server with proxy/authentication in place. How is that possible?

unclecode commented 1 month ago

Hello @drdsgvo,

Thank you for raising this issue. I understand you need to pass an Authorization header to your LLM service. The LLMExtractionStrategy in Crawl4AI supports this functionality through the extra_args parameter. Here's how you can set custom headers:

  1. When initializing the LLMExtractionStrategy, you can pass the extra_args parameter with your custom headers.
  2. These extra_args are then passed to the underlying litellm wrapper to call the desired LLM. So we basically support whatever extra parameters can be passed to litellm.
  3. One of these extra parameters which you need is the extra_headers parameter, which allows you to set custom headers for the API request.

Here's an example of how to use this feature:

from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import LLMExtractionStrategy

async def custom_header_example():
    # Define your custom headers
    custom_headers = {
        "Authorization": "Bearer your-auth-token-here"
    }

    # Initialize the LLMExtractionStrategy with custom headers
    extraction_strategy = LLMExtractionStrategy(
        provider="your-llm-provider",
        api_token="your-api-token",
        extra_args={"extra_headers": custom_headers}
    )

    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(
            url="https://example.com",
            extraction_strategy=extraction_strategy,
            bypass_cache=True
        )

        print(result.extracted_content)

# Run the example
import asyncio
asyncio.run(custom_header_example())

In this example:

  1. We define custom_headers with your Authorization header.
  2. We pass these headers to the LLMExtractionStrategy using the extra_args parameter.
  3. The extra_args are passed through the extraction process and eventually to the LLM API call.

This approach allows you to set any custom headers required by your LLM service, including Authorization headers for authentication.

Note that the exact header name and format may vary depending on your specific LLM service requirements. Adjust the custom_headers dictionary accordingly.

Also, I suggest you check the litellm library to familiarize yourself with other parameters you may want to send. https://docs.litellm.ai/docs/completion/input

By this week, I will update the library to version 0.3.6, and you can try to pass this extra arguments. If you need any further assistance or have any questions, please don't hesitate to ask! Happy crawling.