zeroday504 / ChonkBreach

Scrape and parse through breachforums.is database leak threads for keywords
GNU General Public License v3.0

Cloudflare anti-crawler #1

Open muddlelife opened 1 year ago

muddlelife commented 1 year ago

Hi, as far as I know, breachforums has a Cloudflare anti-crawler mechanism, so your script can't actually retrieve any data.
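
For what it's worth, here is a minimal sketch of how a script could at least detect that it received a challenge page instead of forum content, so it fails loudly rather than parsing nothing. The function name `looks_like_cloudflare_challenge` is hypothetical and not part of ChonkBreach. The `cf-mitigated: challenge` check follows Cloudflare's documented header for challenge responses; the 403/503 plus `Server: cloudflare` fallback is only a heuristic assumption and may misfire.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: True if a response appears to be a Cloudflare challenge page."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v.lower() for k, v in headers.items()}

    # Cloudflare sets this header on challenge page responses.
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Fallback (assumption): a 403/503 served by Cloudflare is probably a block
    # or challenge, but other causes are possible.
    return status in (403, 503) and "cloudflare" in lowered.get("server", "")
```

With a `requests` response this could be called as `looks_like_cloudflare_challenge(resp.status_code, resp.headers)` before any parsing is attempted.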

zeroday504 commented 1 year ago

Hey there, thanks for filing the issue and offering that feedback. I noticed this when testing the script recently; at the time of testing it ran successfully, but it looks like some anti-scraping mechanisms have since been implemented. I'm taking note to work on this.

referefref commented 8 months ago

Might I point you to this: https://github.com/ultrafunkamsterdam/undetected-chromedriver. Cloudflare has recently put in detections for puppeteer-extra-stealth, so this approach or CDP (Chrome DevTools Protocol) is what you'll need to bypass the Cloudflare protection. That, and the TLD has changed from .is to .cx.