tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb

Miscellaneous Sites not parsing / different Markup #316

Open fuxifex opened 1 year ago

fuxifex commented 1 year ago

Description

miscsites() returns an empty array

Movies / TV-Shows / Person

tt0479884 Crank

Type

Bug: the page section "Miscellaneous Sites" is not being parsed correctly, so miscsites() returns an empty array. The pattern in Title.php (line 2325) is: "!<h4 class=\"li_group\">$title\s\s(.+?)<(h4|div)!ims" This regex needs to be updated because the markup in the document is now different: <h3 class="ipc-title__text"><span id="misc">Miscellaneous Sites</span> ...
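To illustrate the mismatch, here is a minimal sketch that runs the old pattern against the new heading fragment quoted in this report. `$newPattern` is a hypothetical replacement that only detects the new heading; since the rest of the markup is elided here, it does not attempt to capture the list of sites.

```php
<?php
// Old pattern from Title.php, with $title set as it would be for this section.
$title = 'Miscellaneous Sites';
$oldPattern = "!<h4 class=\"li_group\">$title\s\s(.+?)<(h4|div)!ims";

// Fragment of the new markup as quoted in this report (the remainder is elided).
$newMarkup = '<h3 class="ipc-title__text"><span id="misc">Miscellaneous Sites</span>';

// Hypothetical pattern (an assumption, not the library's fix): matches only
// the new heading so the section can at least be located.
$newPattern = '!<h3 class="ipc-title__text"><span id="misc">'
    . preg_quote($title, '!')
    . '</span>!i';

// preg_match() returns 1 on a match and 0 on no match.
$oldMatches = preg_match($oldPattern, $newMarkup);
$newMatches = preg_match($newPattern, $newMarkup);

echo "old pattern matches: $oldMatches\n";
echo "new pattern matches: $newMatches\n";
```

The old pattern looks for an `<h4 class="li_group">` heading, which no longer appears in the quoted fragment, so it cannot match regardless of what follows the heading.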

Please note: the markup also contains JSON that seems to provide all of this information as well, so you don't have to parse the HTML: {"id":"misc","name":"Miscellaneous Sites","section":{"items":[{"id":"http://www.abandomoviez.net/db/pelicula.php?film=13399","rowTitle":"Abandomoviez.net","rowLink":"http://www.abandomoviez.net/db/pelicula.php?film=13399","listContent":[{"text":"Spanish"}],"rowLinkType":"external","refTagSuffix":"msc_os_0"},{"id":"http://www.aceshowbiz.com/movie/crank/","rowTitle":"AceShowbiz.com","rowLink":"http://www.aceshowbiz.com/movie/crank/","rowLinkType":"external","refTagSuffix":"msc_os_1"},{"id":"https://www.aveleyman.com/FilmCredit.aspx? ....
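A rough sketch of how that JSON could be read instead of the HTML, assuming the "misc" object has already been isolated from the page as a string. The function name `miscSitesFromJson` and the output keys `name`, `url` and `desc` are made up for illustration and are not taken from the library. The sample is shortened to the first two items quoted above.

```php
<?php
// Shortened sample: the first two items from the JSON quoted above.
$json = '{"id":"misc","name":"Miscellaneous Sites","section":{"items":[{"id":"http://www.abandomoviez.net/db/pelicula.php?film=13399","rowTitle":"Abandomoviez.net","rowLink":"http://www.abandomoviez.net/db/pelicula.php?film=13399","listContent":[{"text":"Spanish"}],"rowLinkType":"external","refTagSuffix":"msc_os_0"},{"id":"http://www.aceshowbiz.com/movie/crank/","rowTitle":"AceShowbiz.com","rowLink":"http://www.aceshowbiz.com/movie/crank/","rowLinkType":"external","refTagSuffix":"msc_os_1"}]}}';

/**
 * Hypothetical helper: turn the "misc" JSON object into a list of sites.
 * Returns an empty array if the JSON is invalid or has no items.
 */
function miscSitesFromJson(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)
        || !isset($data['section']['items'])
        || !is_array($data['section']['items'])) {
        return [];
    }

    $sites = [];
    foreach ($data['section']['items'] as $item) {
        // Skip entries that lack a title or a link.
        if (!isset($item['rowTitle'], $item['rowLink'])) {
            continue;
        }
        $sites[] = [
            'name' => $item['rowTitle'],
            'url'  => $item['rowLink'],
            // "listContent" is optional in the sample, so default to ''.
            'desc' => $item['listContent'][0]['text'] ?? '',
        ];
    }
    return $sites;
}

$sites = miscSitesFromJson($json);
print_r($sites);
```

Locating the JSON inside the page is left out on purpose, since the surrounding markup is not shown in this report.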

fuxifex commented 3 months ago

I can confirm this problem. It even seems to load the content via AJAX(?). For example, with "Terminator": https://www.imdb.com/title/tt0088247/externalsites/?ref_=tt_ql_dts_5#misc I want to grab the Wikipedia URL, but it doesn't even show up in the browser's page source. Thanks.