tboothman / imdbphp

PHP library for retrieving film and tv information from IMDb

Decoding of html entities in title does not work properly #264

Closed: paxter closed this issue 2 years ago

paxter commented 2 years ago

Using: https://www.imdb.com/title/tt13950332/

Test code

require 'vendor/autoload.php'; // assumes imdbphp was installed with Composer

$imdb = new \Imdb\Title(13950332);
echo $imdb->title();

Getting

While the Rest of Us Die: Secrets of America&#x27;s Shadow Government

Expecting

While the Rest of Us Die: Secrets of America's Shadow Government

Looking into the code, I traced the problem to the following line in the title_year() function: https://github.com/tboothman/imdbphp/blob/0074d4cda918e910fd9a43a37b335a6fd9fa294f/src/Imdb/Title.php#L236

To get it working properly I replaced the line with:

$this->main_title = html_entity_decode($match['title'], ENT_QUOTES, 'UTF-8');

I'm not sure whether this is the proper or best solution, but it works in my case. I have also replaced the other htmlspecialchars_decode() calls in that function.
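The change above swaps htmlspecialchars_decode() for html_entity_decode(). A minimal sketch of how the two differ, using only built-in PHP functions: the first sample mirrors IMDb's markup for this title, while the second (`Nausica&auml;`) is a hypothetical input, since IMDb appears to send such letters as raw UTF-8 rather than as named entities.

```php
<?php
// Compare the two decoding approaches from this issue.
// $escapedOnly mirrors IMDb's markup; $namedEntity is a hypothetical input.
$escapedOnly = 'Secrets of America&#x27;s Shadow Government';
$namedEntity = 'Nausica&auml; of the Valley of the Wind';

// htmlspecialchars_decode() only decodes &amp; &quot; &lt; &gt; and
// numeric single-quote entities (the latter only when ENT_QUOTES is set).
$a1 = htmlspecialchars_decode($escapedOnly, ENT_QUOTES);
$a2 = htmlspecialchars_decode($namedEntity, ENT_QUOTES);

// html_entity_decode() decodes every entity it knows, including &auml;.
$b1 = html_entity_decode($escapedOnly, ENT_QUOTES, 'UTF-8');
$b2 = html_entity_decode($namedEntity, ENT_QUOTES, 'UTF-8');

echo $a1, "\n"; // Secrets of America's Shadow Government
echo $a2, "\n"; // Nausica&auml; of the Valley of the Wind
echo $b1, "\n"; // Secrets of America's Shadow Government
echo $b2, "\n"; // Nausicaä of the Valley of the Wind
```

With ENT_QUOTES both functions decode IMDb's `&#x27;`; they only diverge on named entities such as `&auml;`, which htmlspecialchars_decode() deliberately leaves untouched.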

tboothman commented 2 years ago

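IMDb seems to escape only the characters that are significant in HTML, so I think htmlspecialchars_decode() is the appropriate function to use here; non-ASCII characters are sent as plain UTF-8 rather than as entities. The catch is that, by default, these functions do not decode single-quote entities. That default looks like a mistake made in PHP a long time ago, probably with good intentions, and it was changed very recently: since PHP 8.1 the default flags are ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 instead of ENT_COMPAT | ENT_HTML401 (https://php.watch/versions/8.1/html-entity-default-value-changes). Passing ENT_QUOTES explicitly gives the expected result on older versions as well: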

php > $a = '&amp;&quot;&#x27;&#039;';
php > echo htmlspecialchars_decode($a);
&"&#x27;&#039;
php > echo html_entity_decode($a);
&"&#x27;&#039;
php > echo html_entity_decode($a, ENT_QUOTES, 'UTF-8');
&"''
php > echo htmlspecialchars_decode($a, ENT_QUOTES);
&"''
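To make the comparison independent of the running PHP version's defaults, a sketch can pass the flags explicitly. The input string is an assumption: it uses the numeric single-quote entities `&#x27;` (as in IMDb's markup) and `&#039;`.

```php
<?php
// Assumed input: &amp; &quot; plus two numeric single-quote entities.
$a = '&amp;&quot;&#x27;&#039;';

// The pre-8.1 default flags: ENT_COMPAT | ENT_HTML401 (double quotes only),
// so the single-quote entities are left as they are.
$oldDefault = htmlspecialchars_decode($a, ENT_COMPAT | ENT_HTML401);

// ENT_QUOTES also decodes single-quote entities.
$withQuotes = htmlspecialchars_decode($a, ENT_QUOTES | ENT_HTML401);

echo $oldDefault, "\n"; // &"&#x27;&#039;
echo $withQuotes, "\n"; // &"''
```

Because the flags are spelled out, both calls behave the same on PHP versions before and after 8.1.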

Some examples of elements as they appear in IMDb's HTML:

Nausicaä of the Valley of the Wind
Forhøret
&quot;Firefly&quot; The Train Job (TV Episode 2002)
While the Rest of Us Die: Secrets of America&#x27;s Shadow Government

paxter commented 2 years ago

Thanks for the quick reply. Your suggested solution works for me too. 👍