tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb
251 stars 84 forks source link

Fixed Recommendations method #267

Closed duck7000 closed 2 years ago

duck7000 commented 2 years ago

Recommendations was not working, this will fix that Added utf8_decode to the title, is this still necessary? Checked that, still needed for some titles

tboothman commented 2 years ago

I can't take that utf8_decode change without an explanation. I get messed up characters with utf8_decode there [title] => WALL�E

262 should have fixed that problem.

duck7000 commented 2 years ago

Mm that is indeed a problem. The recommendations of this movie https://www.imdb.com/title/tt2282757/ does need utf8_decode for this title: Jochem Myjer: Adéhadé Without utf8_decode this title doesn't get the right character encoding.

262 does not solve the problem in this case.

But for other movies it is not needed and will cause problems like your example

So what is the solution?

tboothman commented 2 years ago

I don't know. It could be something to do with the version of php/libxml/dom. There are tests explicitely for this that I added in response to #262 so I would assume the tests will fail on your setup. The tests in master all pass now so it should be easy to see it fail.

The tests pass on whatever versions github actions use for php 7.4 and 7.3 (https://github.com/tboothman/imdbphp/runs/5345331196?check_suite_focus=true) and on my machine.

I have: PHP 7.4.3 (cli) (built: Oct 25 2021 18:20:54) ( NTS ) DOM/XML => enabled DOM/XML API Version => 20031129 libxml Version => 2.9.10

We only really support php 7.3+, but in theory it works on lower versions. We'd stopped testing 5.6 by the time the test from #262 was added so I don't know if it works on that version or not. I think it's reasonable to support whatever php itself are supporting, which currently is 7.4+ https://www.php.net/supported-versions.php

duck7000 commented 2 years ago

I'm using php 7.4.3 and libxml 2.9.10 on standard Ubuntu 20.04 server So this should work if i understand you right? The fix from #262 fixed all other methods fine, except this one and more specific this title i guess

Well we call this one a "incident" then, it does not happen very often i'll admit so yes i'm fine with your choice to leave utf8_decode out.

tboothman commented 2 years ago

You've got the same version of everything as me (although mine's running on WSL for windows). I suspect you're doing something wrong elsewhere but I don't have any information to say either way. If you run the tests on your computer and they pass you're doing something wrong in your application.

Probably shoud've added a thing about contributing somewhere .. but ..

install composer if you've not got it. From the root of this repo:

composer install
composer test
duck7000 commented 2 years ago

Mm you are right! If i call movie_recommendations() outside my program the titles are all as they should be, so there is something inside my program to blame. I'll need to investigate this further on my side

duck7000 commented 2 years ago

we can close this one as well

tboothman commented 2 years ago

Character encodings are an absolute pain in the ass, particularly in PHP as it mostly pretends it has no idea what character encodings are. Usually in PHP it's best to just ignore them but make sure you tell anything external to PHP what you want or are going to send. For example if you're putting this data in a database make sure it knows it's supposed to be storing UTF-8, and if you're outputting to a browser make sure you tell the browser it's UTF-8 (although I think nowerdays it'll assume UTF-8 if you don't say anything)

duck7000 commented 2 years ago

In my program i use smarty framework v2 I think that's the problem. If i use utf8_decode directly on the value inside the smarty tag the problem is gone. Only smarty 3.1 or higher has, according to the smarty website, full encoding support.

So for now it will do the job