putdotio / putio-kodi

Official Put.io Kodi addon
GNU General Public License v3.0
11 stars 2 forks source link

Use Unicode for subtitle filenames #17

Closed joelpurra closed 1 month ago

joelpurra commented 1 year ago

Fixes a crash for subtitle names with Unicode characters.

- Subtitles for Put.io videos are, if available, downloaded via the Put.io API. - The API subtitle names may contain characters from the Unicode character set. - The subtitle name from the API is used as the local filename, which Kodi then uses when displaying a list of available subtitles. - Put.io's subtitle paths are handled as `special://temp` Kodi paths. - Special paths are recommended in Kodi, as they should help with cross-platform path-handling. - The special path handling in Kodi's `translatePath` function seems to return an ASCII-encoded string, also for Unicode input strings. - Characters from the extended (superset) Unicode charset are out of range for (7-bit) ASCII. - The Put.io addon crashes for non-ASCII characters in the subtitle name. - Python's standard functions use "dynamic" path encoding handling, based on the input path charset. - Using a Unicode-encoded input path allows Unicode characters. - This fix (re)encodes the ASCII path returned from Kodi `translatePath` function as Unicode. - This fix was developed on, and tested only on, a Debian machine running Kodi v20.1. See - https://en.wikipedia.org/wiki/ASCII - https://en.wikipedia.org/wiki/Unicode - https://docs.python.org/3/library/stdtypes.html - https://kodi.wiki/view/Special_protocol - https://github.com/xbmc/xbmc/blob/20.1-Nexus/xbmc/interfaces/legacy/ModuleXbmcvfs.h#L170-L197 - https://github.com/xbmc/xbmc/blob/20.1-Nexus/xbmc/interfaces/legacy/ModuleXbmcvfs.cpp#L61-L67 - https://github.com/xbmc/xbmc/blob/20.1-Nexus/xbmc/filesystem/SpecialProtocol.h - https://github.com/xbmc/xbmc/blob/20.1-Nexus/xbmc/filesystem/SpecialProtocol.cpp#L108 - https://github.com/xbmc/xbmc/blob/20.1-Nexus/xbmc/filesystem/SpecialProtocol.cpp#L119 --- Related, since it includes special paths and encoding, albeit outdated. In Kodi 19 ~~`xbmc.translatePath()`~~ was replaced by `xbmcvfs.translatePath()`. > [**Requirements for all python addons**](https://kodi.wiki/view/Add-on_rules#Requirements_for_all_python_addons) > > - all `special://` paths should be translated using `xbmc.translatePath("special://foo/bar")` > - `xbmc.translatePath(xbmcaddon.Addon().getAddonInfo('path')).decode('utf-8')` should be used for files included with the add-on > - `xbmc.translatePath(xbmcaddon.Addon().getAddonInfo('profile')).decode('utf-8')` should be used for storing add-on related files --- Example error, fixed in this commit. ```text EXCEPTION Thrown (PythonToCppException) : -->Python callback/script returned the following error<-- - NOTE: IGNORING THIS CAN LEAD TO MEMORY LEAKS! Error Type: Error Contents: 'ascii' codec can't encode character '\xe5' in position 77: ordinal not in range(128) Traceback (most recent call last): File "/home/user/.kodi/addons/plugin.video.putio/main.py", line 217, in main() File "/home/user/.kodi/addons/plugin.video.putio/main.py", line 211, in main play(item=item) File "/home/user/.kodi/addons/plugin.video.putio/main.py", line 157, in play li.setSubtitles(item.subtitles()) ^^^^^^^^^^^^^^^^ File "/home/user/.kodi/addons/plugin.video.putio/resources/lib/putio.py", line 209, in subtitles self._download_subtitle(subtitle_url, subtitle_path) File "/home/user/.kodi/addons/plugin.video.putio/resources/lib/putio.py", line 216, in _download_subtitle with open(special_path_translated, 'wb') as f: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ UnicodeEncodeError: 'ascii' codec can't encode character '\xe5' in position 77: ordinal not in range(128) -->End of Python script error report<-- Playlist Player: skipping unplayable item: 0, path [plugin://plugin.video.putio/?action=play&item=1234567890] ```
berkanteber commented 1 month ago

Hello, I can't reproduce this on my machine, but I think you have ASCII as filesystem encoding, right? Can you check with sys.getfilesystemencoding()? Also there should be trying to set locale to and global locale set to logs on startup, what are these?

joelpurra commented 1 month ago

@berkanteber: requested output:

sys.getfilesystemencoding() ```shell python3 -c 'import sys; print(sys.getfilesystemencoding());' ``` ```text utf-8 ```
~/.kodi/temp/kodi.log ```shell grep --ignore-case 'locale' ~/.kodi/temp/kodi.log ``` ```text ... info : global locale set to en_US.UTF-8 ```

In a quick test against a single video file hosted by putio, all I can confirm is that the set of subtitles returned from the api has changed.

  1. At least one subtitle filename has obviously broken encoding: U+00E4 0xE4 ä ä %C3 %A4.
    • It is unclear where this character encoding error stems from; could be from bad filename "input" or bad api "output".
    • By the time kodi parses the string, it may no longer be "fixable".
  2. There are no longer any subtitle filenames with "proper" unicode characters in them.
    • Subtitle filenames may be transliterated (cleaned) by the api; see iconv example.
    • Subtitle filenames may be filtered, such that no subtitles with unicode-characters in the filename are included int the api output.
iconv example ```shell echo 'åäöüÅÄÖÜ' | iconv --from-code 'utf-8' --to-code 'ascii' ``` ``` iconv: illegal input sequence at position 0 ``` ```shell echo 'åäöüÅÄÖÜ' | iconv --from-code 'utf-8' --to-code 'ascii//TRANSLIT' ``` ``` aaouAAOU ```

At this time I cannot know if I have a good or "bad" test video file. If you provide a set of test files with known-good unicode subtitle filenames I can check on my side, but I suspect this is unnecessary.

If the issue (the putio addon crashing) isn't reproducible, perhaps it isn't an issue any longer? Might have been that a recent kodi version or a server-side API change (either by putio or upstream) fixed it.

tommie commented 1 month ago

I was hitting this the other day, also with UTF-8. But I'm on Kodi 19.5, so perhaps it's been fixed upstream (Kodi or LibreELEC).

berkanteber commented 1 month ago

I tried with renaming a video and a subtitle (.srt) file in same folder to some unicode name. \xe5 from the original report is å, so å.mp4 and å.srt. Also tried with some Turkish characters, which were fine as well. Checked with ä and it is also fine for me. These all also returns fine from API.

Adding some print lines, special_path_translated is str. Not ASCII-encoded, and keeps Unicode characters. As far as I can tell, if str is provided to open(), filesystem encoding is used to encode the filename. And from the exception, it looks like it tries to encode as ASCII, so I was expecting your answers to be ASCII as well.

Here in docs, it says:

Today Python is converging on using UTF-8: Python on MacOS has used UTF-8 for several versions, and Python 3.6 switched to using UTF-8 on Windows as well. On Unix systems, there will only be a filesystem encoding. if you’ve set the LANG or LC_CTYPE environment variables; if you haven’t, the default encoding is again UTF-8.


If the issue (the putio addon crashing) isn't reproducible, perhaps it isn't an issue any longer? Might have been that a recent kodi version or a server-side API change (either by putio or upstream) fixed it.

I can't say that because I've never reproduced it. If you still experience it (preferably in the latest Kodi) and the PR fixes it, we can merge it.

berkanteber commented 1 month ago

There is this issue I found where it looks like they played with locale. But I don't know which versions it affects.

Can you both try with the latest Kodi?

If you don't have a file to reproduce:

berkanteber commented 1 month ago

And you can test this PR with:

tommie commented 1 month ago

Upgrading to Kodi 21.1 made my filename work. That also triggered an upgrade from addon 3.0.0 to 3.0.2.

Thanks for looking into it, and sorry I didn't think of trying an upgrade first.

joelpurra commented 1 month ago

Quoting @berkanteber:

Can you both try with the latest Kodi?

Am using Kodi v20.5.0, which seems to be the most recent for Debian 12 on Raspberry Pi.

If you don't have a file to reproduce:

  • Take any video file and rename it to åä.mkv or åä.mp4.
  • Take any subtitle and rename it to åä.srt.
  • Put them in the same folder.

Unicode characters in the "local" putio subtitle filename work correctly both with and without .encode(). No issues, and in particular no crashes.

Edit: there might be some naming issue also in this case. The subtitle file has the correct name on disk, but shows up as "Unknown (External)" in kodi's video player/subtitle selector.


Note that I cannot currently test the upstream case, since the putio api doesn't seem to return unicode subtitle filenames for this test file anymore. (The "crashing" unicode subtitle filenames are all gone, which seems like a hack on some api level, so cannot reproduce.) Would still need known-good test files, with known-good "non-local"/"upstream" unicode subtitle filenames for this, which I do not have.

The addon crashing issue has gone away, but so have perhaps some previously available "international" subtitles ;)

berkanteber commented 1 month ago

There are 3 subtitle sources: OpenSubtitles, folder (test case above), file (subtitle in video file). If local works fine, that means our API handles names well. There may be some changes on OpenSubtitles side as you say, but there aren't any recent changes from our side. (We've migrated to their new API but it has been over a year.)

I'll try to find some of the other test cases. Unknown is probably the language, it's unknown for all folder subtitles. I'll check this too.

berkanteber commented 1 month ago
berkanteber commented 1 month ago

I think we can close this issue since upgrading solved the original crash.

berkanteber commented 1 month ago

Sorry for the late return btw, haven't noticed it.

joelpurra commented 1 month ago

@berkanteber: yes, unicode-related crashes are no longer reproducible in upgraded Kodi. Closing this pull request.

It would be nice to separately confirm and fix potential subtitle issues mentioned here; happy to see that you are already looking into it. Thanks!

berkanteber commented 4 weeks ago

It would be nice to separately confirm and fix potential subtitle issues mentioned here; happy to see that you are already looking into it. Thanks!

3.1.0 has just been approved (for Nexus and above). This should improve subtitle language identification.

joelpurra commented 1 week ago

@berkanteber: found a subtitle (seemingly from opensubtitles) with correct unicode handling. While it's only a single test case, it seems to indicate that unicode support is working correctly also in kodi v20.5.