tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb

Vote count from recommendations is not valid #190

Closed Machou closed 3 years ago

Machou commented 4 years ago

When I try to get the number of votes from movie_recommendations() (https://github.com/tboothman/imdbphp/blob/master/src/Imdb/Title.php#L559), the value is wrong.

var_dump for https://www.imdb.com/title/tt4607112/:

[0] => Array
    (
        [title] => Tin Star
        [imdbid] => 4607112
        [year] => 2017
        [endyear] => 
        [rating] => 7.3
        [votes] => 9
    )

votes is 9, but it should be 9 947.

jreklund commented 4 years ago

I'm afraid I can't replicate this problem on that movie. Which movie or TV series did you call movie_recommendations() on? A title never references itself in the recommendations part of its page.

Which country is your server located in? IMDb renders numbers etc. differently depending on which country you are in.

  0 => 
    array (size=6)
      'title' => string 'Fortitude' (length=9)
      'imdbid' => string '3498622' (length=7)
      'year' => string '2015' (length=4)
      'endyear' => string '2018' (length=4)
      'rating' => string '7.4' (length=3)
      'votes' => string '18237' (length=5)
tboothman commented 4 years ago

Pretty sure it's the locale. What do numbers look like where you're from?

It's looking at "Users rated this 8.1/10 (5,504 votes)" and using /([0-9.,]{1,3})\/10\s*\(([0-9\s,]+)/i, so it's not going to accept dots, I guess. So if it said 5.504 it wouldn't work.
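To make the point above concrete, here is a minimal sketch that runs the quoted pattern against the quoted title and against a hypothetical dot-separated variant. The step that strips separators from the captured group is an assumption for illustration; it is not taken from the library.

```php
<?php
// The pattern exactly as quoted in the comment above.
$pattern = '/([0-9.,]{1,3})\/10\s*\(([0-9\s,]+)/i';

$titles = [
    'comma' => 'Users rated this 8.1/10 (5,504 votes)', // string quoted above
    'dot'   => 'Users rated this 8.1/10 (5.504 votes)', // hypothetical variant
];

$votes = [];
foreach ($titles as $label => $title) {
    if (preg_match($pattern, $title, $m)) {
        // Assumed clean-up step: drop whitespace and commas from the capture.
        $votes[$label] = preg_replace('/[\s,]/', '', $m[2]);
    }
}

var_dump($votes);
```

The comma-separated count is captured whole, while the dot-separated one is cut off at the first dot, because "." is not in the second character class.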

Machou commented 4 years ago

Yes, it is set to fr in $config!

I will test tomorrow, thanks.

Machou commented 4 years ago

It doesn't work with the fix.

My $config is set to fr_FR.

The title attribute I get is: title="Users rated this 8,8/10 (1 908 206 votes) - click stars to rate"

It works with:

$title = $cell->parentNode->getElementsByTagName('div')->item(3)->getAttribute('title');
if (preg_match('/([0-9.,]{1,3})\/10(.*)/is', $title, $rating)) {
    // Normalise a decimal comma to a dot
    $movie['rating'] = str_replace(',', '.', $rating[1]);
    // Keep only the digits of whatever follows "/10"
    $movie['votes'] = preg_replace('/[^0-9]/', '', $rating[2]);
}
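As a sketch of why this workaround is locale-tolerant, the same idea can be wrapped in a standalone function and tried on the two title strings quoted earlier in the thread. The function name parse_rating_title is hypothetical and not part of imdbphp, the votes capture is narrowed to the parentheses as a small variation, and the French separators are assumed to be UTF-8 no-break spaces.

```php
<?php
/**
 * Hypothetical helper (not part of imdbphp): extract rating and vote
 * count from a title attribute such as
 * "Users rated this 8.1/10 (5,504 votes)".
 */
function parse_rating_title($title)
{
    // Rating: 1-3 digits, dots or commas before "/10".
    // Votes: whatever sits inside the parentheses that follow.
    if (!preg_match('/([0-9.,]{1,3})\/10\s*\(([^)]*)\)/', $title, $m)) {
        return null;
    }

    return [
        // Normalise a decimal comma to a dot.
        'rating' => str_replace(',', '.', $m[1]),
        // Drop every non-digit byte, whatever separator was used.
        'votes'  => preg_replace('/[^0-9]/', '', $m[2]),
    ];
}

// Assumption: the separators in the French string are U+00A0 (bytes C2 A0).
$french  = parse_rating_title("Users rated this 8,8/10 (1\xC2\xA0908\xC2\xA0206 votes) - click stars to rate");
$english = parse_rating_title('Users rated this 8.1/10 (5,504 votes)');

var_dump($french, $english);
```

Stripping everything that is not a digit avoids having to enumerate the thousands separators of each locale, at the cost of accepting any text inside the parentheses.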
jreklund commented 4 years ago

Can you add var_dump($cell->parentNode->getElementsByTagName('div')->item(3)->getAttribute('title')); exit; before the if statement that does the preg_match, and post the result?

Because spaces are already accepted by the pattern.

With fr_FR I get a comma as the separator: Users rated this 8.7/10 (1,397,779 votes) - click stars to rate

Machou commented 4 years ago
string(59) "Users rated this 5/10 (16 209 votes) - click stars to rate"

string(60) "Users rated this 4,7/10 (8 824 votes) - click stars to rate"

string(61) "Users rated this 5,6/10 (62 255 votes) - click stars to rate"

string(61) "Users rated this 5,7/10 (95 633 votes) - click stars to rate"

string(62) "Users rated this 6,7/10 (111 556 votes) - click stars to rate"

string(62) "Users rated this 6,1/10 (111 934 votes) - click stars to rate"

string(60) "Users rated this 5,1/10 (1 825 votes) - click stars to rate"

string(61) "Users rated this 4,9/10 (24 351 votes) - click stars to rate"

string(61) "Users rated this 5,3/10 (80 280 votes) - click stars to rate"

string(61) "Users rated this 5,3/10 (59 955 votes) - click stars to rate"

string(62) "Users rated this 6,3/10 (134 042 votes) - click stars to rate"

string(61) "Users rated this 5,6/10 (52 895 votes) - click stars to rate"
jreklund commented 4 years ago

Apparently it's a non-breaking space, and those can only be matched if the string is interpreted as Unicode.

fr_FR is incorrect, which is why I got a different string yesterday. The correct value is fr-FR, so change your configuration.
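The effect of the non-breaking space can be reproduced with a short sketch. The sample title is constructed from the values reported at the top of the thread (9 947 votes), with the separator written as the UTF-8 bytes of U+00A0. The second pattern is only an illustration that adds the /u modifier and an explicit \p{Zs} class; it is not claimed to be the change made in the library.

```php
<?php
// Constructed sample: "9 947" with a UTF-8 no-break space (U+00A0 = C2 A0).
$title = "Users rated this 7,3/10 (9\xC2\xA0947 votes) - click stars to rate";

// Byte mode: the character class stops at the first byte of the
// no-break space, so only "9" is captured.
preg_match('/([0-9.,]{1,3})\/10\s*\(([0-9\s,]+)/i', $title, $byteMode);

// Unicode mode (/u) with Unicode space separators allowed explicitly.
preg_match('/([0-9.,]{1,3})\/10\s*\(([0-9\s\p{Zs},]+)/iu', $title, $unicodeMode);

// Remove everything that is not a digit from each capture.
$votesByteMode    = preg_replace('/\D/', '', $byteMode[2]);
$votesUnicodeMode = preg_replace('/\D/', '', $unicodeMode[2]);

var_dump($votesByteMode, $votesUnicodeMode);
```

The byte-mode result matches the truncated value from the original report, which supports the non-breaking space explanation.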

jreklund commented 4 years ago

@Machou Have you tested it? Then we can close this and make an actual release.

Machou commented 4 years ago

Yes, and it doesn't work ^^

I use my method:

$title = $cell->parentNode->getElementsByTagName('div')->item(3)->getAttribute('title');
if (preg_match('/([0-9.,]{1,3})\/10(.*)/is', $title, $rating)) {
    // Normalise a decimal comma to a dot
    $movie['rating'] = str_replace(',', '.', $rating[1]);
    // Keep only the digits of whatever follows "/10"
    $movie['votes'] = preg_replace('/[^0-9]/', '', $rating[2]);
}
jreklund commented 4 years ago

So this doesn't work? https://github.com/tboothman/imdbphp/commit/c616705f3ca55aca9373455b7be8c7fe0f032cf2 It's slightly modified from the one @tboothman posted earlier.

Machou commented 4 years ago

No, not for me (with fr_FR in config).