Closed by Jaberwwocky 1 year ago
I think you scraped too much and your IP address got blacklisted. It works on my UK server. Without knowing what kind of protection they have implemented, it's hard to work around.
OK, so the IP might be the problem, I will try to get a new one. I have been scraping IMDb for years, but I got a new IP (new server) last year, so I'm surprised.
I've been able to capture what happens when a script tries to scrape and I'm sending you links to screenshots. Also, I see that there are two javascripts probably responsible for this...
Screenshots: https://baze.si/temp/
Javascripts from imdb page:
`<script src="https://fb423e1ef94f.6277d64d.us-east-1.captcha.awswaf.com/fb423e1ef94f/c3382d439950/916d943f6a58/captcha.js"></script>`
Thank you!
Just tried running our unit tests and they seem to have changed some "Person" pages, but most of the titles seem fine, so I guess you have hit a robot check. How many titles did you try to download? I don't think I will be able to hack that away.
We have the ability to change `default_agent` and `ip_address` from the config file, it's worth a try.
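For illustration, a minimal sketch of what such an override might look like, assuming imdbphp picks up ini overrides from a file in its `conf` folder (the file name below is hypothetical). The two key names come from the comment above; both values are placeholders, not recommendations, and there is no guarantee this gets past the captcha.

```ini
; conf/900_localconf.ini  (hypothetical file name; location is an assumption)
; Placeholder user agent string, replace with the one you want to send
default_agent = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
; Placeholder address from the documentation range, replace with your own
ip_address = "203.0.113.10"
```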
.........................F.FFEEFFFEF....................F...... 63 / 246 ( 25%)
................................................F.............. 126 / 246 ( 51%)
............................................................... 189 / 246 ( 76%)
..............F.......................................... 246 / 246 (100%)
Thank you, I will try that. I was downloading 5-6 titles per day regularly...
Hello
Since yesterday, IMDb has implemented some kind of captcha puzzle, "Let's confirm you are human", and scraping doesn't work anymore.
Is it possible to work around or avoid that captcha? Or is imdbphp now obsolete? Thank you