Open GeorgeFive opened 1 year ago
Most likely IMDb has started updating other parts/pages to the new UI style. The last time they updated a few pages, the behavior you describe lasted months (at least for me it did).
Or it is happening because the search function does not always return results. I noticed that if I search for the same title repeatedly in quick succession, sometimes the first and sometimes the second part of the function is used. Apparently IMDb blocks repeated use of the same function within a short time.
They're definitely updating the bio pages. I had a little more time to look into the problem, and I've noticed that there are at least two versions of it...
Version 1 - the class properly grabs everything (old IMDb page)
Version 2 - the class grabs name and image properly, but chokes on everything else (that I'm trying to grab)
Version 3 (???) - I haven't seen this one live yet, but it's the only thing I can think of to explain the test cases where some data works (i.e., birthname) but other data doesn't (birthday) in the same instance.
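One way to tell these versions apart is to look for markers that each layout contains. The following is a minimal sketch, assuming the old layout closes its labels with a table cell (e.g. `Born</td>`) and the new layout carries them in JSON-like text (e.g. `"htmlContent":`) or a span, as the patterns posted later in this thread suggest. The function name `detectBioLayout()` and the sample strings are made up for illustration and are not part of the library:

```php
<?php
// Hypothetical helper (not part of the library): guess which bio layout a page uses.
function detectBioLayout(string $html): string
{
    // Old layout: labels close a table cell, e.g. "Born</td>".
    $old = preg_match('!(Birth Name|Born|Died|Nicknames?)</td>!', $html) === 1;
    // New layout: labels sit in JSON-like text or close a span, e.g. "Born</span>".
    $new = preg_match('!"(htmlContent|listContent)":|(Born|Died)</span>!', $html) === 1;

    if ($old && $new) {
        return 'mixed'; // would correspond to the suspected "Version 3"
    }
    if ($old) {
        return 'old';
    }
    return $new ? 'new' : 'unknown';
}

// Made-up fragments for illustration only.
echo detectBioLayout('<td>Birth Name</td><td>Jane Doe</td>'), PHP_EOL;
echo detectBioLayout('"Birth name","htmlContent":"Jane Doe"}'), PHP_EOL;
echo detectBioLayout('<html><body>nothing useful</body></html>'), PHP_EOL;
```

Logging the detected layout next to each failed field might help confirm whether a "Version 3" page really exists.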
My regex skills are severely lacking, so I guess I'll leave my observations here and hope someone picks this up?
The bio page is returning less and less data. Soon it will return nothing at all, I think. This is a serious issue!
It was like this before,
Now there is no issue. It is OK! Before, it was the old type of page. Once it is switched to the new type of IMDb page, it works well enough.
It's random which page you'll get. Next time you scan the page, it may not work again.
Yep, same as the last time they changed the IMDb website.
I seem to remember there being some code in place last time to force the old version until the main code was updated. Anyone remember how to do that?
I did a bit of a crash course in regex to get this going. I've fixed the following functions... they search for either the old bio page or the new one and will work with either. Works at the moment, subject to break whenever...
This can likely be done smarter, but hey, it works....
public function birthname()
{
    if (empty($this->birth_name)) {
        $this->getPage("Bio");
        if (preg_match("!Birth Name</td><td>(.*?)</td>\n!m", $this->page["Bio"], $match)) {
            $this->birth_name = trim($match[1]);
        } elseif (preg_match('|Birth name","htmlContent":"(.*?)"}|ims', $this->page["Bio"], $match)) {
            $this->birth_name = trim($match[1]);
        }
    }
    return $this->birth_name;
}
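To see the two birth-name patterns in isolation, here is a minimal standalone sketch that applies them to hand-written fragments. The function `extractBirthName()`, the `rowTitle` key and both sample strings are invented for illustration; real IMDb markup may differ.

```php
<?php
// Hypothetical standalone version of the two-pattern lookup used in birthname().
function extractBirthName(string $page): ?string
{
    // Old layout: the value sits in the table cell after the "Birth Name" label.
    if (preg_match("!Birth Name</td><td>(.*?)</td>\n!m", $page, $match)) {
        return trim($match[1]);
    }
    // New layout: the value sits in a JSON-like "htmlContent" field.
    if (preg_match('|Birth name","htmlContent":"(.*?)"}|ims', $page, $match)) {
        return trim($match[1]);
    }
    return null; // neither layout matched
}

// Invented sample fragments.
$oldPage = "<tr><td>Birth Name</td><td>Jane Q. Doe</td>\n</tr>";
$newPage = '{"rowTitle":"Birth name","htmlContent":"Jane Q. Doe"}';

var_dump(extractBirthName($oldPage));
var_dump(extractBirthName($newPage));
var_dump(extractBirthName('<html></html>'));
```

Returning `null` when neither pattern matches makes it easy to tell "page layout not recognized" apart from an empty value.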
public function nickname()
{
    if (empty($this->nick_name)) {
        $this->getPage("Bio");
        if (preg_match("!Nicknames</td>\s*<td>\s*(.*?)</td>\s*</tr>!ms", $this->page["Bio"], $match)) {
            $nicks = explode("<br>", $match[1]);
            foreach ($nicks as $nick) {
                $nick = trim($nick);
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('!Nickname</td><td>\s*([^<]+)\s*</td>!', $this->page["Bio"], $match)) {
            $this->nick_name[] = trim($match[1]);
        } elseif (preg_match('/Nicknames","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            $nicks = explode(",", $match[1]);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $match)) {
                    $nick = trim($match[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        }
    }
    return $this->nick_name;
}
public function born()
{
    if (empty($this->birthday)) {
        if (preg_match('|Born</td>(.*)</td|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymon);
            preg_match('|/search/name\?birth_year=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name\?birth_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            $this->birthday = array(
                "day" => @$daymon[2],
                "month" => @$daymon[3],
                "mon" => @$daymon[1],
                "year" => @$dyear[1],
                "place" => @$dloc[1]
            );
        } elseif (preg_match('|Born</span>(.*)</div></div></div></li>|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name/\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymon);
            preg_match('|/search/name/\?birth_year=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name/\?birth_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            $this->birthday = array(
                "day" => @$daymon[2],
                "month" => @$daymon[3],
                "mon" => @$daymon[1],
                "year" => @$dyear[1],
                "place" => @$dloc[1]
            );
        }
    }
    return $this->birthday;
}
public function died()
{
    if (empty($this->deathday)) {
        $page = $this->getPage("Bio");
        if (preg_match('|Died</td>(.*?)</td|ims', $page, $match)) {
            preg_match('|/search/name\?death_date=(\d+)-(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymonyear);
            preg_match('|/search/name\?death_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            preg_match('/\(([^\)]+)\)/ims', $match[1], $dcause);
            $this->deathday = array(
                "day" => @$daymonyear[3],
                "month" => @$daymonyear[4],
                "mon" => @$daymonyear[2],
                "year" => @$daymonyear[1],
                "place" => @trim(strip_tags($dloc[1])),
                "cause" => @$dcause[1]
            );
        } elseif (preg_match('|Died</span>(.*)</div></div></div></li>|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name/\?death_date=(\d+)-(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymonyear);
            preg_match('|/search/name/\?death_date=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name/\?death_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            preg_match('/\(([^\)]+)\)/ims', $match[1], $dcause);
            $this->deathday = array(
                "day" => @$daymonyear[3],
                "month" => @$daymonyear[4],
                "mon" => @$daymonyear[2],
                "year" => @$daymonyear[1],
                "place" => @trim(strip_tags($dloc[1])),
                "cause" => @$dcause[1]
            );
        }
    }
    return $this->deathday;
}
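The old and new branches of born() and died() differ mainly in whether the search link reads `/search/name?` or `/search/name/?`. As a rough sketch of how the duplication might be reduced, the pattern below makes that slash optional so one set of expressions covers both forms. The function `parseBorn()` and the sample markup are invented for illustration, and the optional slash is an assumption that has not been tested against live IMDb pages:

```php
<?php
// Hypothetical standalone parser; the optional "/" before "?" is an assumption
// that lets one pattern cover both URL forms used in the functions above.
function parseBorn(string $fragment): array
{
    preg_match('|/search/name/?\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $fragment, $daymon);
    preg_match('|/search/name/?\?birth_year=(\d{4})|ims', $fragment, $dyear);
    preg_match('|/search/name/?\?birth_place=.*?"\s*>(.*?)<|ims', $fragment, $dloc);

    // "?? null" avoids undefined-index notices without the @ operator.
    return [
        'day'   => $daymon[2] ?? null,
        'month' => $daymon[3] ?? null,
        'mon'   => $daymon[1] ?? null,
        'year'  => $dyear[1] ?? null,
        'place' => $dloc[1] ?? null,
    ];
}

// Invented sample markup; real IMDb markup may differ.
$old = '<a href="/search/name?birth_monthday=10-21">October 21</a>, '
     . '<a href="/search/name?birth_year=1956">1956</a> in '
     . '<a href="/search/name?birth_place=Springfield">Springfield, USA</a>';
$new = str_replace('/search/name?', '/search/name/?', $old);

print_r(parseBorn($old));
print_r(parseBorn($new));
```

Both calls should yield the same array, which is the point of the optional slash. The same idea would apply to the `death_date` and `death_place` patterns.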
It's possible that a nickname may have quotes or a comma in it on IMDb. These would break the function. So....
public function nickname()
{
    if (empty($this->nick_name)) {
        $this->getPage("Bio");
        if (preg_match("!Nicknames</td>\s*<td>\s*(.*?)</td>\s*</tr>!ms", $this->page["Bio"], $match)) {
            $nicks = explode("<br>", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                $nick = trim($nick);
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('!Nickname</td><td>\s*([^<]+)\s*</td>!', $this->page["Bio"], $match)) {
            $match[1] = str_replace('\\"', "", $match[1]);
            $this->nick_name[] = trim($match[1]);
        } elseif (preg_match('/Nicknames","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            $nicks = explode("},{", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $match)) {
                    $nick = trim($match[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('/Nickname","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            $nicks = explode("},{", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $match)) {
                    $nick = trim($match[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        }
    }
    return $this->nick_name;
}
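The difference between splitting on `,` and splitting on `},{` can be shown with a small standalone sketch. The helper `splitNicknames()`, the `html` key and the sample data are invented for illustration. Unlike the method above, this sketch drops pieces that do not match instead of keeping them raw:

```php
<?php
// Hypothetical helper: split a JSON-like list of nickname objects on a separator.
function splitNicknames(string $listContent, string $separator): array
{
    // Remove escaped quotes (\") first, as the revised nickname() does.
    $listContent = str_replace('\\"', '', $listContent);

    $nicks = [];
    foreach (explode($separator, $listContent) as $item) {
        // Take the first quoted value that follows a colon.
        if (preg_match('|:"(.*?)"|ims', $item, $m)) {
            $nick = trim($m[1]);
            if ($nick !== '') {
                $nicks[] = $nick;
            }
        }
    }
    return $nicks;
}

// Invented sample: one nickname has escaped quotes, one contains a comma.
$sample = '{"html":"The \\"Ace\\""},{"html":"Jimmy, the Kid"}';

print_r(splitNicknames($sample, ','));   // the comma inside the nickname splits it apart
print_r(splitNicknames($sample, '},{')); // splitting between objects keeps it whole
```

If the surrounding JSON can be isolated as a valid document, decoding it with `json_decode()` would likely be more robust than either split, though that has not been tried here.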
It does not work anymore. It does not get the person's data, just the name.
I haven't had any problems. Try copying from here again? I may have edited the first post a day or so after I originally posted it.
Thanks a lot!
It does get the age, birth name, date and place, but it does not get height, spouses or biography.
Hello, first of all, thanks for the age and birthday info. But it does not get height, spouses or biography. Can you help with that?
@tboothman what about the Person class?
I guess the Person class is not urgent for you :(
Hi @GeorgeFive! Any chance you also worked on updating the bio() method? Your regex skills may not be great, but they seem to be better than mine!
Hi @GeorgeFive, I also need your assistance with the bio, spouse and height methods. Please, if you have time...
@Thomasdouscha, @jcvignoli Take a look at my repo; I added back the person and personSearch classes: https://github.com/duck7000/imdbphp6 I discussed it with GeorgeFive and we agreed to add them back.
Wonderful, I am going to check it tonight!
@duck7000 Unfortunately it does not work, because the arrays differ. This is what I have:
$this->spouses[] = array(
    'imdb' => $mid,
    'name' => $name,
    'from' => $from,
    'to' => $to,
    'comment' => $comment,
    'children' => (int)$children
);
And this is yours:
$this->spouses[] = array(
    'imdb' => $imdbId,
    'name' => $name,
    'from' => $fromDate,
    'to' => $toDate,
    'comment' => $comments
);
The issue is with children: one field is missing.
I have combined comment and children because I think it is all a comment. If you want them separated, you can do that in your program, or use the comment field and remove the children field from your program. This is a small issue that you can easily fix yourself.
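Separating the two on the caller's side might look like the sketch below. It assumes the combined comment contains the child count as text such as `(2 children)` or `(1 child)`; that format is a guess and has not been checked against the repo's actual output. The helper name `splitSpouseComment()` is hypothetical:

```php
<?php
// Hypothetical helper: pull a child count out of a combined comment string.
// Assumes the count appears as "(N children)" or "(N child)"; this is a guess.
function splitSpouseComment(string $comment): array
{
    $children = 0;
    if (preg_match('/\(?\s*(\d+)\s+child(?:ren)?\s*\)?/i', $comment, $m)) {
        $children = (int) $m[1];
        // Drop the matched part so it is not reported twice.
        $comment = trim(str_replace($m[0], '', $comment));
    }
    return ['comment' => $comment, 'children' => $children];
}

print_r(splitSpouseComment('(divorced) (2 children)'));
print_r(splitSpouseComment('(his death)'));
```

The returned `comment` and `children` keys match the two fields of the older array shape, so the result can be merged into an existing spouse entry with `array_merge()`.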
You have to remember that my version is different/stripped down, and I only added most (not all) methods from the person class on request, as I don't use it myself.
And please don't comment here on methods used in my version; open a new issue on my repo instead. Otherwise comments get mixed up and become confusing to others.
I already tried to fix it as you said, but I ran into other new issues and gave up. Yes, you are right. I will comment on your page next time. And I know you coded your version for yourself. You have supported us many times with issues. I appreciate it. Thanks a lot!
I've noticed that this has been going on for a few days now... maybe even a few weeks. Pieces of person data are just randomly not returned. Sometimes I get them, sometimes I don't. Refreshing will possibly give some pieces, possibly not. I don't see any rhyme or reason to it...?
I do not try to grab all data, but of the data I do grab....
Always works: IMDb number, name, image
Sometimes works: birthday / born location, died date / location / cause, birth name, nicknames
Test case: nm0001032
Name: Success!
Birth Name: Success!
Born: Failed!
Born Location: Failed!
Date Of Death: Success!
Location Of Death: Success!
Reload....
Name: Success!
Birth Name: Failed!
Born: Failed!
Born Location: Failed!
Date Of Death: Failed!
Location Of Death: Failed!
Reload....
Name: Success!
Birth Name: Success!
Born: Failed!
Born Location: Failed!
Date Of Death: Success!
Location Of Death: Success!
Reload....
Name: Success!
Birth Name: Success!
Born: Success!
Born Location: Success!
Date Of Death: Success!
Location Of Death: Success!
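A report in this Success!/Failed! style can be produced with a small helper, which makes it easier to compare reloads. This is only a sketch: `reportFields()` is a hypothetical name, and the values passed in are placeholders that a caller would replace with whatever the person class returned.

```php
<?php
// Hypothetical helper: turn a label => value map into "Label: Success!/Failed!" lines.
function reportFields(array $fields): array
{
    $lines = [];
    foreach ($fields as $label => $value) {
        // Treat null, empty string and empty array as "nothing was scraped".
        $ok = !($value === null || $value === '' || $value === []);
        $lines[] = $label . ': ' . ($ok ? 'Success!' : 'Failed!');
    }
    return $lines;
}

// Placeholder values standing in for scraped data.
$result = reportFields([
    'Name'       => 'Jane Doe',
    'Birth Name' => 'Jane Q. Doe',
    'Born'       => null,
    'Nicknames'  => [],
]);

echo implode(PHP_EOL, $result), PHP_EOL;
```

Running this after each reload and keeping the output would show which fields fail together, which could help tell the page versions apart.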