mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
17.44k stars, 1.27k forks

Add Proxy Support for Self-Hosted Firecrawl #698

Closed: techUdayMungalpara closed this issue 3 weeks ago

techUdayMungalpara commented 3 weeks ago

Add Proxy Support for Self-Hosted Firecrawl

Problem

Proposed Solution

Implementation

  1. Add ProxyConfig class
  2. Update config file:
    proxy:
      enabled: true
      provider: "zenrow"
      api_key: "your_api_key_here"
  3. Create ProxyClient class for API authentication
  4. Modify HTTP client to use ProxyClient
  5. Add error handling and logging
  6. Write unit tests
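
For discussion, steps 1 to 5 above could be sketched roughly as follows. This is only an illustration and not Firecrawl's actual code: the class shapes, the `PROVIDER_ENDPOINTS` map, the `proxy.example.com:8001` host, and the "API key as proxy username" scheme are all assumptions, and the provider key `zenrow` is copied verbatim from the sample config. Real endpoints and authentication details would have to come from the provider's documentation.

```python
import logging
import urllib.request
from dataclasses import dataclass
from typing import Any, Mapping, Optional
from urllib.parse import quote

logger = logging.getLogger("proxy")

# Hypothetical provider -> proxy endpoint map (placeholder host, not a real one).
PROVIDER_ENDPOINTS = {"zenrow": "proxy.example.com:8001"}


@dataclass(frozen=True)
class ProxyConfig:
    """Mirrors the `proxy:` section of the sample config."""
    enabled: bool = False
    provider: str = ""
    api_key: str = ""

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "ProxyConfig":
        section = raw.get("proxy") or {}
        cfg = cls(
            enabled=bool(section.get("enabled", False)),
            provider=str(section.get("provider", "")).lower(),
            api_key=str(section.get("api_key", "")),
        )
        # Only validate the remaining fields when the proxy is switched on.
        if cfg.enabled:
            if cfg.provider not in PROVIDER_ENDPOINTS:
                raise ValueError(f"unsupported proxy provider: {cfg.provider!r}")
            if not cfg.api_key:
                raise ValueError("proxy.api_key is required when proxy.enabled is true")
        return cfg


class ProxyClient:
    """Turns a ProxyConfig into an authenticated proxy URL and an HTTP opener."""

    def __init__(self, config: ProxyConfig) -> None:
        self.config = config

    def proxy_url(self) -> Optional[str]:
        if not self.config.enabled:
            return None
        host = PROVIDER_ENDPOINTS[self.config.provider]
        # Assumption: the API key is sent as the proxy username with an
        # empty password. Percent-encode it so it is safe inside a URL.
        return f"http://{quote(self.config.api_key, safe='')}:@{host}"

    def build_opener(self) -> urllib.request.OpenerDirector:
        url = self.proxy_url()
        if url is None:
            logger.info("proxy disabled, using direct connections")
        # An empty mapping tells ProxyHandler to use no proxy at all.
        proxies = {"http": url, "https": url} if url else {}
        return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

With something like this, the HTTP layer would call `ProxyClient(ProxyConfig.from_mapping(config)).build_opener()` once at startup and send requests through the returned opener. A misconfigured provider or missing key fails early with a `ValueError` instead of at request time.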

Benefits

Future Enhancements

mschfh commented 2 weeks ago

ZenRows/Bright Data offer scraping browsers that handle not just proxy rotation but also Cloudflare/CAPTCHA bypass, etc.

I opened a feature request to implement support for remote Playwright instances so those can be used.