mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
https://firecrawl.dev
GNU Affero General Public License v3.0
14.29k stars 1.04k forks

Firecrawl is blocked by Cloudflare #495

Open VakhoGeoLab opened 1 month ago

VakhoGeoLab commented 1 month ago

Problem Description: Firecrawl is a nice tool, but it is blocked by Cloudflare's free bot protection service.

Proposed Feature: If Firecrawl could bypass Cloudflare, it could be an unstoppable service.

JakobStadlhuber commented 1 month ago

@VakhoGeoLab do you use the self hosted or the cloud version?

VakhoGeoLab commented 1 month ago

I've tried both. I self-hosted it on my computer and also used the playground for testing, and Cloudflare's bot protection blocks it in both cases.


nickscamara commented 1 month ago

ccing @tomkosm here

nickscamara commented 1 month ago

@VakhoGeoLab which URLs are getting blocked?

VakhoGeoLab commented 1 month ago

I'm testing https://coinmania.ge, a crypto exchange. I'm building a bot to check crypto prices.

iuliaturc commented 1 month ago

It also gets blocked by Cloudflare when trying to scrape any subpages of https://jax.readthedocs.io/en/latest/

nickscamara commented 1 month ago

@tomkosm can you look into those as soon as you get a chance? tks

tomkosm commented 1 month ago

> It also gets blocked by Cloudflare when trying to scrape any subpages of https://jax.readthedocs.io/en/latest/

Hey, I am not able to replicate this; it seems to work for me now. Cloudflare has different triggers for its high-protection modes. Try again and let me know if you encounter the issue again. Please also provide the exact link and the error you got.
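
For anyone gathering those details, here is a minimal sketch of how a blocked response might be recognized and summarized for a report. It is not part of Firecrawl: the function names `looks_like_cloudflare_block` and `format_report` are hypothetical, and the check rests on two assumptions, namely that Cloudflare sets a `cf-mitigated: challenge` header on challenge pages and that a 403, 429, or 503 served with `server: cloudflare` is probably a block. The second is only a heuristic.

```python
from typing import Mapping


def looks_like_cloudflare_block(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: True if a response looks like a Cloudflare challenge or block."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    # Assumption: Cloudflare marks challenge pages with `cf-mitigated: challenge`.
    if lowered.get("cf-mitigated") == "challenge":
        return True
    # Fallback heuristic (assumption): an error status served by Cloudflare's edge.
    served_by_cloudflare = "cloudflare" in lowered.get("server", "")
    return served_by_cloudflare and status in (403, 429, 503)


def format_report(url: str, status: int, headers: Mapping[str, str]) -> str:
    """Build a one-line summary with the exact URL and status for a bug report."""
    if looks_like_cloudflare_block(status, headers):
        verdict = "likely Cloudflare block"
    else:
        verdict = "no Cloudflare block detected"
    return f"{url} -> HTTP {status} ({verdict})"
```

Passing the status code and response headers from whatever HTTP client you use into `format_report` gives a line you can paste into the issue along with the exact URL.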

tomkosm commented 1 month ago

> I'm testing https://coinmania.ge, a crypto exchange. I'm building a bot to check crypto prices.

This website requires solving a Cloudflare CAPTCHA to access. We're working on it, but can't guarantee that we will be able to support it.