hmerritt / php-imdb-api

PHP IMDB-API that can fetch film data and search results
Apache License 2.0
59 stars 22 forks

Malformed UTF-8 characters #4

Closed dhofverberg closed 3 years ago

dhofverberg commented 3 years ago

This seems promising, but I can't seem to get it to work. I keep getting the following when I use your class (installed via Composer):

Fatal error: Uncaught Filebase\Format\EncodingException: json_encode: 'Malformed UTF-8 characters, possibly incorrectly encoded' in /var/www/html/vendor/tmarois/filebase/src/Format/Json.php:29
Stack trace:
#0 /var/www/html/vendor/tmarois/filebase/src/Database.php(237): Filebase\Format\Json::encode()
#1 /var/www/html/vendor/tmarois/filebase/src/Document.php(58): Filebase\Database->save()
#2 /var/www/html/vendor/hmerritt/imdb-api/src/Cache.php(42): Filebase\Document->save()
#3 /var/www/html/vendor/hmerritt/imdb-api/src/Imdb.php(120): hmerritt\Cache->add()
#4 /var/www/html/demo.php(20): hmerritt\Imdb->film()
#5 {main}

Next Filebase\Filesystem\SavingException: Can not encode document. in /var/www/html/vendor/tmarois/filebase/src/Database.php:240
Stack trace:
#0 /var/www/html/vendor/tmarois/filebase/src/Document.php(58): Filebase\Database->save()
#1 /var/www/html/vendor/hmerritt/imdb-api/src/Cache.php(42): Filebase\Document->save()
#2 /var/www/html/vendor/hmerritt/imdb-api/src/Imdb.php(120): hmerritt\Cache->add()
#3 /var/www/ in /var/www/html/vendor/tmarois/filebase/src/Database.php on line 240

Any idea what this means, or how to resolve it?

hmerritt commented 3 years ago

The problem is with the cache.

You should be able to circumvent the problem by disabling the cache when using the film or search method.

// The class lives in the hmerritt namespace (see the stack trace above).
use hmerritt\Imdb;

$imdb = new Imdb();

$imdb->film("tt0816692", [
    'cache' => false
]);

Are you searching for a specific film with unusual characters in its title? If so, could you send me the film so I can investigate further? I couldn't reproduce the error you are getting.
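For reference, the error message itself can be reproduced outside the library: json_encode rejects any string containing bytes that are not valid UTF-8. The following is a minimal sketch using only PHP built-ins (mb_convert_encoding needs the mbstring extension). The sample string is made up for illustration and is not data returned by IMDb, so this shows how the error arises in general, not what triggered it in this report.

```php
<?php
// "é" in ISO-8859-1 is the single byte 0xE9, which is not valid UTF-8.
$latin1 = "Caf\xE9";

// Without JSON_THROW_ON_ERROR, json_encode returns false on failure.
$result = json_encode($latin1);
var_dump($result);                // bool(false)
echo json_last_error_msg(), "\n"; // Malformed UTF-8 characters, possibly incorrectly encoded

// Converting the string to UTF-8 first makes it encodable.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo json_encode($utf8, JSON_UNESCAPED_UNICODE), "\n";
```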

dhofverberg commented 3 years ago

Thank you. It works better with cache disabled.

I used the example from the README as is: $imdb->film("tt0816692");

There shouldn't be any particularly unusual characters in that, and it works as it should with the cache disabled.

However, many search strings that should work return no results whatsoever, at least for some films, with or without special characters in the title. For example, running this:

$imdb->search("Jag kommer hem igen till jul", [
    'cache' => false,
    'category' => 'tt',
    'curlHeaders' => ['Accept-Language: sv,en;q=0.9'],
]);

Just returns an empty array: array(3) { ["titles"]=> array(0) { } ["names"]=> array(0) { } ["companies"]=> array(0) { } }

Any idea why? I would expect that search string to at least find the following film, where the title matches the search string exactly: https://www.imdb.com/title/tt10504966/

Using $imdb->film instead of $imdb->search also results in an empty array.

Is this a general problem, or is it only me with this problem?

Some searches work fine, and I haven't been able to find a pattern in which ones return empty arrays. For example, searching for "Interstellar" or "Gravity" works, while searching for "Star Wars" or "Star Trek" returns empty arrays.

hmerritt commented 3 years ago

Thank you! You have stumbled upon a bug that I had completely missed!

Problem

The search string does not get URL encoded.

This means any name containing spaces (e.g. "Star Wars" or "The Life and Death of Colonel Blimp") will fail.

Solution

Encode the search string using PHP's built-in function urlencode.
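As a rough sketch of the idea: urlencode replaces spaces with "+" and percent-encodes other reserved bytes, so the query survives being placed in a URL. The helper name and the example.com URL below are placeholders for illustration only; they are not the library's actual code or the endpoint it requests.

```php
<?php
// Hypothetical helper, not part of php-imdb-api: builds a search URL.
function buildSearchUrl(string $query): string
{
    // urlencode() turns spaces into "+" and percent-encodes reserved bytes.
    return 'https://example.com/find?q=' . urlencode($query);
}

echo buildSearchUrl('Star Wars'), "\n";
// https://example.com/find?q=Star+Wars
echo buildSearchUrl('Jag kommer hem igen till jul'), "\n";
// https://example.com/find?q=Jag+kommer+hem+igen+till+jul
```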

I am pushing a new release immediately to address this problem.

hmerritt commented 3 years ago

The new version should fix it without requiring any changes to your code (it is v1.0.3, not v1.0.2; that was a typo).

$ composer update

dhofverberg commented 3 years ago

Thank you, the new version works fine.

A small caveat, though: the class expects the search string to be UTF-8 encoded, so it doesn't work well if the calling script uses another encoding (ISO-8859-1 in my case) and the search string contains anything outside plain ASCII. Not a big deal, as I can manually call utf8_encode() on the search string, but then I also need to call utf8_decode() on the results for international characters to display correctly.
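A minimal sketch of that workaround, for anyone in the same situation. It uses mb_convert_encoding (mbstring extension) instead of utf8_encode()/utf8_decode(), which are deprecated as of PHP 8.2. The helper names toUtf8 and fromUtf8 are made up here and are not part of the library, and the result array is a stand-in with the same general shape as the search output shown earlier.

```php
<?php
// Hypothetical helpers, not part of php-imdb-api.

// Convert an ISO-8859-1 string from the calling script into UTF-8.
function toUtf8(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

// Recursively convert every string value in a result back to ISO-8859-1.
// Array keys are left untouched; non-string values pass through unchanged.
function fromUtf8($value)
{
    if (is_array($value)) {
        return array_map('fromUtf8', $value);
    }
    return is_string($value)
        ? mb_convert_encoding($value, 'ISO-8859-1', 'UTF-8')
        : $value;
}

// "Små citroner gula" as ISO-8859-1 bytes, converted for the search call.
$query = toUtf8("Sm\xE5 citroner gula");
// $results = fromUtf8($imdb->search($query, ['cache' => false]));

// Stand-in result (assumed shape) to show the round trip without a network call.
$fake = ['titles' => [['title' => "Sm\u{E5} citroner gula"]]];
$back = fromUtf8($fake);
```

Characters that have no ISO-8859-1 equivalent cannot survive the conversion back, so titles in other scripts will lose information with this approach.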

But perhaps something to consider for a future version, or at least mention it in the README. I'm sure I'm not the only one with a site not in UTF-8 (yet)...