tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb

IMDb now verifying humans #287

Closed Jaberwwocky closed 1 year ago

Jaberwwocky commented 1 year ago

Hello

Since yesterday, IMDb has been showing some kind of captcha puzzle ("Let's confirm you are human"), and scraping no longer works.

Is it possible to work around or avoid that captcha? Or is imdbphp now obsolete? Thank you.

jreklund commented 1 year ago

I think you scraped too much and your IP address got blacklisted. It works on my UK server. Without knowing what kind of protection they have implemented, it's hard to work around.

Jaberwwocky commented 1 year ago

OK, so the IP might be the problem; I will try to get a new one. I have been scraping IMDb for years, but I only got this IP (new server) last year, so I'm surprised.

I've been able to capture what happens when a script tries to scrape, and I'm sending you links to screenshots. Also, I see two JavaScript files that are probably responsible for this...

Screenshots: https://baze.si/temp/

JavaScript from the IMDb page:

`<script src="https://fb423e1ef94f.6277d64d.us-east-1.captcha.awswaf.com/fb423e1ef94f/c3382d439950/916d943f6a58/captcha.js"></script>`

Thank you!

jreklund commented 1 year ago

I just tried running our unit tests. They seem to have changed some "Person" pages, but most of the titles seem fine, so I guess you have hit a robot check. How many titles did you try to download? I don't think I will be able to hack that away.

You can change `default_agent` and `ip_address` in the config file; it's worth a try.

.........................F.FFEEFFFEF....................F......  63 / 246 ( 25%)
................................................F.............. 126 / 246 ( 51%)
............................................................... 189 / 246 ( 76%)
..............F..........................................       246 / 246 (100%)
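
The config change suggested above might look roughly like the following. This is a minimal sketch, not a confirmed fix: it assumes imdbphp is installed through Composer and that `default_agent` and `ip_address` can be set as public properties on `\Imdb\Config`. The user agent string, the IP address, and the title ID are placeholders chosen for illustration.

```php
<?php
// Sketch only: assumes a Composer install of imdbphp and that
// default_agent and ip_address are public properties of \Imdb\Config.
require __DIR__ . '/vendor/autoload.php';

$config = new \Imdb\Config();

// Placeholder browser-like user agent; substitute your own.
$config->default_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0';

// Placeholder address from the RFC 5737 documentation range.
$config->ip_address = '203.0.113.10';

// Arbitrary example IMDb ID, used only to trigger one request.
$title = new \Imdb\Title('0335266', $config);
echo $title->title(), PHP_EOL;
```

If the robot check is keyed to the server's real IP address, these settings may make no difference, so treat this as something to try rather than a workaround.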

Jaberwwocky commented 1 year ago

Thank you, I will try that. I was downloading 5-6 titles per day regularly...