tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb

New bio page #303

Open GeorgeFive opened 1 year ago

GeorgeFive commented 1 year ago

I've noticed that this has been going on for a few days now... maybe even a few weeks. Pieces of person data are randomly not being returned. Sometimes I get them, sometimes I don't. Refreshing may return some pieces, or may not. I don't see any rhyme or reason to it...?

I do not try to grab all data, but of the data I do grab....

Always works: IMDb number, name, image

Sometimes works: birthday / born location, died date / location / cause, birth name, nicknames

Test case: nm0001032
Name: Success!
Birth Name: Success!
Born: Failed!
Born Location: Failed!
Date Of Death: Success!
Location Of Death: Success!

Reload....

Name: Success!
Birth Name: Failed!
Born: Failed!
Born Location: Failed!
Date Of Death: Failed!
Location Of Death: Failed!

Reload....

Name: Success!
Birth Name: Success!
Born: Failed!
Born Location: Failed!
Date Of Death: Success!
Location Of Death: Success!

Reload....

Name: Success!
Birth Name: Success!
Born: Success!
Born Location: Success!
Date Of Death: Success!
Location Of Death: Success!

duck7000 commented 1 year ago

Most likely IMDb has started updating other parts/pages to the new UI style. The last time they updated a few pages, the behavior you describe lasted for months (at least for me it did).

Or it is happening because the search function does not always return results. I noticed that if I search for the same title repeatedly in quick succession, either the first or the second part of the function is used. Apparently IMDb blocks repeated use of the same function within a short time.

GeorgeFive commented 1 year ago

They're definitely updating the bio pages. I had a little more time to look into the problem, and I've noticed that there are at least two versions of it...

Version 1 - the class properly grabs everything (old IMDb page)
Version 2 - the class grabs name and image properly, but chokes on everything else (that I'm trying to grab)
Version 3 (???) - I haven't seen this one live yet, but it's the only thing I can think of to explain the test cases where some data works (i.e., birth name) but other data doesn't (birthday) in the same instance.
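
To see which version a given request received, a small check along these lines could help. This is only a sketch: `detectBioLayout()` is a hypothetical helper, not part of imdbphp, and the markers it looks for are assumptions taken from the regexes posted elsewhere in this thread (table cells such as `Born</td>` for the old page, `"htmlContent":` / `"listContent":` / `Born</span>` for the new one). IMDb may change them at any time.

```php
<?php
// Hypothetical helper, not part of imdbphp: guesses which bio page layout
// was served. The markers are assumptions based on the patterns discussed
// in this thread and may stop working whenever IMDb changes its markup.
function detectBioLayout(string $html): string
{
    // Old layout: labels sit in plain HTML table cells.
    if (preg_match('!(?:Birth Name|Born|Died|Nicknames?)</td>!', $html)) {
        return 'old';
    }
    // New layout: data is embedded as JSON-like fragments or <span> labels.
    $newMarkers = ['"htmlContent":', '"listContent":', 'Born</span>', 'Died</span>'];
    foreach ($newMarkers as $marker) {
        if (strpos($html, $marker) !== false) {
            return 'new';
        }
    }
    return 'unknown';
}
```

Logging the result next to each failed field would show whether the failures line up with one layout, or whether a mixed "Version 3" page really exists.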

My regex skills are severely lacking, so I guess I'll leave my observations here and hope someone picks this up?

Thomasdouscha commented 1 year ago

The bio page is returning less and less data. Soon it will return nothing at all, I think. Serious issue!

image

It was like this before,

image

Thomasdouscha commented 1 year ago

There is no issue now. It is OK! That was the old type of page. Now that it has been switched to the new type of IMDb page, it works well enough.

GeorgeFive commented 1 year ago

It's random which page you'll get. Next time you scan the page, it may not work again.

duck7000 commented 1 year ago

Yep, same as the last time they changed the IMDb website.

GeorgeFive commented 1 year ago

I seem to remember there being some code in place last time to force the old version until the main code was updated. Anyone remember how to do that?

GeorgeFive commented 1 year ago

I did a bit of a crash course in regex to get this going. I've fixed the following functions... they search for either the old bio page or the new one and will work with either. Works at the moment, subject to break whenever...

This can likely be done smarter, but hey, it works....

public function birthname()
{
    if (empty($this->birth_name)) {
        $this->getPage("Bio");
        if (preg_match("!Birth Name</td><td>(.*?)</td>\n!m", $this->page["Bio"], $match)) {
            $this->birth_name = trim($match[1]);
        } elseif (preg_match('|Birth name","htmlContent":"(.*?)"}|ims', $this->page["Bio"], $match)) {
            $this->birth_name = trim($match[1]);
        }
    }
    return $this->birth_name;
}

public function nickname()
{
    if (empty($this->nick_name)) {
        $this->getPage("Bio");
        if (preg_match("!Nicknames</td>\s*<td>\s*(.*?)</td>\s*</tr>!ms", $this->page["Bio"], $match)) {
            $nicks = explode("<br>", $match[1]);
            foreach ($nicks as $nick) {
                $nick = trim($nick);
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('!Nickname</td><td>\s*([^<]+)\s*</td>!', $this->page["Bio"], $match)) {
            $this->nick_name[] = trim($match[1]);
        } elseif (preg_match('/Nicknames","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            $nicks = explode(",", $match[1]);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $match)) {
                    $nick = trim($match[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        }
    }
    return $this->nick_name;
}

public function born()
{
    if (empty($this->birthday)) {
        if (preg_match('|Born</td>(.*)</td|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymon);
            preg_match('|/search/name\?birth_year=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name\?birth_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            $this->birthday = array(
              "day" => @$daymon[2],
              "month" => @$daymon[3],
              "mon" => @$daymon[1],
              "year" => @$dyear[1],
              "place" => @$dloc[1]
            );
        } elseif (preg_match('|Born</span>(.*)</div></div></div></li>|iUms', $this->getPage("Bio"), $match)) {
            preg_match('|/search/name/\?birth_monthday=(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymon);
            preg_match('|/search/name/\?birth_year=(\d{4})|ims', $match[1], $dyear);
            preg_match('|/search/name/\?birth_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            $this->birthday = array(
              "day" => @$daymon[2],
              "month" => @$daymon[3],
              "mon" => @$daymon[1],
              "year" => @$dyear[1],
              "place" => @$dloc[1]
            );
        }

    }
    return $this->birthday;
}

public function died()
{
    if (empty($this->deathday)) {
        $page = $this->getPage("Bio");
        if (preg_match('|Died</td>(.*?)</td|ims', $page, $match)) {
            preg_match('|/search/name\?death_date=(\d+)-(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymonyear);
            preg_match('|/search/name\?death_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            preg_match('/\(([^\)]+)\)/ims', $match[1], $dcause);
            $this->deathday = array(
              "day" => @$daymonyear[3],
              "month" => @$daymonyear[4],
              "mon" => @$daymonyear[2],
              "year" => @$daymonyear[1],
              "place" => @trim(strip_tags($dloc[1])),
              "cause" => @$dcause[1]
            );
        } elseif (preg_match('|Died</span>(.*)</div></div></div></li>|iUms', $page, $match)) {
            preg_match('|/search/name/\?death_date=(\d+)-(\d+)-(\d+).*?\n?>(.*?) \d+<|', $match[1], $daymonyear);
            preg_match('|/search/name/\?death_place=.*?"\s*>(.*?)<|ims', $match[1], $dloc);
            preg_match('/\(([^\)]+)\)/ims', $match[1], $dcause);
            $this->deathday = array(
              "day" => @$daymonyear[3],
              "month" => @$daymonyear[4],
              "mon" => @$daymonyear[2],
              "year" => @$daymonyear[1],
              "place" => @trim(strip_tags($dloc[1])),
              "cause" => @$dcause[1]
            );
        }
    }
    return $this->deathday;
}
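
The "old pattern first, new pattern second" idea used in these functions can also be pulled out into one place. Below is a minimal sketch, assuming a hypothetical helper `firstMatch()` that is not part of imdbphp. The two patterns are copied from `birthname()` above, and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper (not part of imdbphp): tries each pattern in order and
// returns the trimmed first capture group of the first one that matches.
// Every pattern is expected to contain at least one capture group.
function firstMatch(array $patterns, string $subject): ?string
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $subject, $match)) {
            return trim($match[1] ?? '');
        }
    }
    return null; // none of the known layouts matched
}

// Patterns copied from birthname(): old table layout first, then the new page.
$birthNamePatterns = [
    "!Birth Name</td><td>(.*?)</td>\n!m",
    '|Birth name","htmlContent":"(.*?)"}|ims',
];

// Made-up sample fragments, one per layout.
$oldPage = "<td class=\"label\">Birth Name</td><td>Jane Doe</td>\n";
$newPage = '{"rowTitle":"Birth name","htmlContent":"Jane Doe"}';

echo firstMatch($birthNamePatterns, $oldPage), "\n";
echo firstMatch($birthNamePatterns, $newPage), "\n";
```

With a helper like this, supporting another page variant would only mean appending one more pattern to the list instead of adding another `elseif` branch.
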

GeorgeFive commented 1 year ago

It's possible that a nickname may have quotes or a comma in it on IMDb. These would break the function. So....

public function nickname()
{
    if (empty($this->nick_name)) {
        $this->getPage("Bio");
        if (preg_match("!Nicknames</td>\s*<td>\s*(.*?)</td>\s*</tr>!ms", $this->page["Bio"], $match)) {
            $nicks = explode("<br>", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                $nick = trim($nick);
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        } elseif (preg_match('!Nickname</td><td>\s*([^<]+)\s*</td>!', $this->page["Bio"], $match)) {
            $match[1] = str_replace('\\"', "", $match[1]);
            $this->nick_name[] = trim($match[1]);
        } elseif (preg_match('/Nicknames?","listContent":\\[[^\\]](.*?)\\]\\}/i', $this->page["Bio"], $match)) {
            // New bio page: covers both the "Nicknames" and "Nickname" row titles.
            $nicks = explode("},{", $match[1]);
            $nicks = str_replace('\\"', "", $nicks);
            foreach ($nicks as $nick) {
                if (preg_match('|:"(.*?)"|ims', $nick, $match)) {
                    $nick = trim($match[1]);
                }
                if (!empty($nick)) {
                    $this->nick_name[] = $nick;
                }
            }
        }
    }
    return $this->nick_name;
}
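
An alternative to splitting on `},{` and stripping escaped quotes is to let `json_decode()` handle the escaping. The sketch below is an assumption-heavy illustration, not the library's method: `parseNicknames()` is a hypothetical helper, and it assumes the new page embeds the nicknames as a valid JSON array of single-value objects right after `"listContent":` (the `html` key in the sample is a guess).

```php
<?php
// Hypothetical helper (not part of imdbphp). Assumes the new bio page embeds
// nicknames as a JSON array of objects after "listContent":, for example
// "Nicknames","listContent":[{"html":"..."},{"html":"..."}]}
// The "html" key name is a guess; the code only reads the first value.
function parseNicknames(string $page): array
{
    if (!preg_match('/Nicknames?","listContent":(\[.*?\])\}/is', $page, $match)) {
        return [];
    }
    $items = json_decode($match[1], true);
    if (!is_array($items)) {
        return []; // fragment was not valid JSON after all
    }
    $nicks = [];
    foreach ($items as $item) {
        // Take the first value of each object, whatever its key is called.
        $value = is_array($item) ? reset($item) : $item;
        if (is_string($value) && trim($value) !== '') {
            $nicks[] = trim($value);
        }
    }
    return $nicks;
}

// Made-up sample: the second nickname contains escaped quotes and a comma.
$page = '{"rowTitle":"Nicknames","listContent":[{"html":"The Duke"},{"html":"Big \"J\", Jr."}]}';
print_r(parseNicknames($page));
```

Unlike the function above, this keeps the quotes inside a nickname instead of removing them, so the stored values may differ slightly.
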

Thomasdouscha commented 1 year ago

It does not work anymore. It does not get the person's data, just the name.

GeorgeFive commented 1 year ago

I haven't had any problems. Try copying from here again? I may have edited the first post a day or so after I originally posted it.

Thomasdouscha commented 1 year ago

Thanks a lot!

It does get the age, birth name, date and place. But it does not get height, spouses and biography.

Thomasdouscha commented 1 year ago

> I haven't had any problems. Try copying from here again? I may have edited the first post a day or so after I originally posted it.

Hello, first of all thanks for the age and birthday info. But it does not get height, spouses and biography. Can you help with that?

Thomasdouscha commented 1 year ago

@tboothman what about the Person class?

Thomasdouscha commented 1 year ago

I guess the Person class is not urgent for you :(

jcvignoli commented 1 year ago

Hi @GeorgeFive! Any chance you also worked on updating the bio() method? Your regex skills may not be great, but they are better than mine, it seems!

Thomasdouscha commented 1 year ago

Hi @GeorgeFive, I also need your assistance with the bio method, spouses and height. Please, if you have time...

Thomasdouscha commented 1 year ago

> @Thomasdouscha, @jcvignoli Take a look at my repo, I added back the person and personSearch classes: https://github.com/duck7000/imdbphp6 I discussed it with GeorgeFive and agreed to add it back.

Wonderful, I am going to check it tonight!

Thomasdouscha commented 1 year ago

@duck7000 Unfortunately it does not work, because the arrays differ. This is what I have:

$this->spouses[] = array(
    'imdb' => $mid,
    'name' => $name,
    'from' => $from,
    'to' => $to,
    'comment' => $comment,
    'children' => (int)$children
);

And yours:

$this->spouses[] = array(
    'imdb' => $imdbId,
    'name' => $name,
    'from' => $fromDate,
    'to' => $toDate,
    'comment' => $comments
);

The issue is the children field: one parameter is missing.

duck7000 commented 1 year ago

I have combined comment and children because I think it is all a comment. If you want them separated, you can do that in your program, or use the comment field and remove the children field from your program. This is a small issue that you can easily fix yourself.
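
For anyone who needs the old array shape, a small adapter in the calling program is one way to do it. This is only a sketch under loud assumptions: `addChildrenField()` is a hypothetical function that belongs to neither library, and it assumes the combined comment contains wording such as "(2 children)" or "(1 child)", which should be checked against real output first.

```php
<?php
// Hypothetical adapter, not part of imdbphp or imdbphp6. Assumes the combined
// 'comment' string contains text like "(2 children)" or "(1 child)"; that
// wording is a guess and must be verified against real data.
function addChildrenField(array $spouse): array
{
    $children = 0;
    $comment = (string) ($spouse['comment'] ?? '');
    if (preg_match('/(\d+)\s+child(?:ren)?/i', $comment, $match)) {
        $children = (int) $match[1];
    }
    // The comment itself is left untouched; only the extra key is added.
    $spouse['children'] = $children;
    return $spouse;
}

// Made-up sample entry in the shape of the newer array.
$spouse = addChildrenField([
    'imdb' => '0000001',
    'name' => 'Jane Doe',
    'from' => '2001',
    'to' => '2005',
    'comment' => '(divorced) (2 children)',
]);
echo $spouse['children'], "\n";
```

Running each spouse entry through a function like this restores the 'children' key without changing either library.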

You have to remember that my version is different/stripped down, and I only added most (not all) methods from the person class on request, as I don't use it myself.

And please don't comment here on methods used in my version; start a new issue on my version's repo instead. Otherwise the comments get mixed up and confuse others.

Thomasdouscha commented 1 year ago

I already tried to fix it as you said, but I ran into other new issues and gave up. Yes, you are right. I will comment on your page next time. And I know you coded your version for yourself. You have supported us many times with issues. I appreciate it, thanks a lot!