kiwilan / php-opds

PHP package to create OPDS feeds (Open Publication Distribution System) for eBooks.
https://packagist.org/packages/kiwilan/php-opds
MIT License

multi-byte safe substr() for OPDS summary #48

Closed: mikespub closed this 4 months ago

mikespub commented 4 months ago

When a book description contains multibyte UTF-8 characters (e.g. a Chinese description of Sun Tzu's The Art of War), truncating it for the summary with substr() may cut it off in the middle of a character, because substr() counts bytes rather than characters. The resulting invalid UTF-8 leads to description = false in the OPDS feed when the entry goes through the json_encode(...), json_decode(...) sequence in OpdsJsonEngine::addBookEntry().
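The byte-versus-character difference can be reproduced in isolation. The snippet below is a minimal sketch, not code taken from php-opds: it assumes the mbstring extension is installed and uses a short sample string instead of the full description. Each CJK character in the sample occupies 3 bytes in UTF-8, so a 4-byte cut lands inside the second character.

```php
<?php
// Minimal demonstration (not php-opds code): byte-based vs character-based truncation.
// Requires the mbstring extension.

$text = '美国西点军校'; // each of these CJK characters is 3 bytes in UTF-8

// substr() counts bytes: 4 bytes = one full character plus the first byte of the next.
$bytes = substr($text, 0, 4);
var_dump(mb_check_encoding($bytes, 'UTF-8')); // bool(false): invalid UTF-8
var_dump(json_encode($bytes));                 // bool(false): json_encode rejects malformed UTF-8

// mb_substr() counts characters, so it never splits a code point.
$chars = mb_substr($text, 0, 4, 'UTF-8');
var_dump(mb_check_encoding($chars, 'UTF-8')); // bool(true)
echo json_encode($chars, JSON_UNESCAPED_UNICODE), PHP_EOL; // "美国西点"
```

By default json_encode() returns false on malformed UTF-8 instead of producing partial output, which is why a single split character is enough to lose the whole value.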

Example Content:

<div>
<p>“前孙子者,孙子不遗;后孙子者,不遗孙子”。《孙子兵法》又称《孙武兵法》、《孙子兵书》等,是中国古典军事文化遗产中的璀璨瑰宝,是世界三大兵书之一。全书共十三篇,虽然只有五千余言,但内容包罗万象、博大精深,涉及到战争规律、哲理、谋略、政治、经济、外交、天文、地理等各方面内容,堪称古代兵学理论的宝库和集大成者,在世界广为传播,美国西点军校和哈佛商学院高级管理将其作为人才培训的必读教材。</p></div>

Summary with substr():

“前孙子者,孙子不遗;后孙子者,不遗孙子”。《孙子兵法》又称《孙武兵法》、《孙子兵书》等,是中国古典军事文化遗产中的璀璨瑰宝,是世界三大兵书之一。全书共十三篇,虽然只有五千余言,但内容包罗万象、博大精深,涉及到战争规律、哲理、谋略、政治、经济、外交、天文、地理等各方面内容,堪称古代兵学理论的宝库和集大成者,在世界广为传播,美�...

This is invalid UTF-8, which results in $summary = false in OpdsJsonEngine::addBookEntry().

Summary with mb_substr():

“前孙子者,孙子不遗;后孙子者,不遗孙子”。《孙子兵法》又称《孙武兵法》、《孙子兵书》等,是中国古典军事文化遗产中的璀璨瑰宝,是世界三大兵书之一。全书共十三篇,虽然只有五千余言,但内容包罗万象、博大精深,涉及到战争规律、哲理、谋略、政治、经济、外交、天文、地理等各方面内容,堪称古代兵学理论的宝库和集大成者,在世界广为传播,美国西点军校和哈佛商学院高级管理将其作为人才培训的必读教材...

This is valid UTF-8, which works correctly in OpdsJsonEngine::addBookEntry().
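A summary builder based on mb_substr() could look like the sketch below. The function name truncateSummary(), the 200-character default limit, the "..." suffix and the use of strip_tags() are all assumptions made for illustration; they are not php-opds' actual API or the exact change in this PR.

```php
<?php
// Hypothetical helper (not php-opds' actual API): build a plain-text summary
// from an HTML description without splitting multibyte characters.
// Requires the mbstring extension.
function truncateSummary(string $html, int $limit = 200, string $suffix = '...'): string
{
    // Drop markup such as <div>/<p> and surrounding whitespace.
    $text = trim(strip_tags($html));

    // mb_strlen()/mb_substr() count characters, not bytes.
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }

    return mb_substr($text, 0, $limit, 'UTF-8') . $suffix;
}

$summary = truncateSummary('<div><p>美国西点军校和哈佛商学院</p></div>', 6);
echo $summary, PHP_EOL; // 美国西点军校...
var_dump(json_encode($summary) !== false); // bool(true)
```

Counting characters means a 200-character limit can yield up to 600 bytes for CJK text. If a byte budget matters instead, mb_strcut() truncates by bytes while still respecting character boundaries.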

codecov[bot] commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.13%. Comparing base (179670f) to head (d783c16). Report is 3 commits behind head on main.

Additional details and impacted files

```diff
@@           Coverage Diff            @@
##              main      #48   +/-   ##
=========================================
  Coverage    99.13%   99.13%
  Complexity     313      313
=========================================
  Files           12       12
  Lines         1042     1042
=========================================
  Hits          1033     1033
  Misses           9        9
```


ewilan-riviere commented 4 months ago

Thanks for this PR!