gazpachoking opened this issue 10 years ago
Did a bit of testing; it looks like TMDb expects UTF-8 encoding. I did a bit of a hack to get things working again:
# Before: broken
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "build\bdist.win32\egg\tmdb3\tmdb_api.py", line 128, in searchMovie
    return MovieSearchResult(Request('search/movie', **kwargs), locale=locale)
  File "build\bdist.win32\egg\tmdb3\request.py", line 70, in __init__
    kwargs[k] = locale.encode(v)
  File "build\bdist.win32\egg\tmdb3\locales.py", line 110, in encode
    return dat.encode(self.encoding)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u041f' in position 11: character maps to <undefined>
# Hack to fix encoding
>>> tmdb3.locales.set_locale("en", "us", True)
>>> tmdb3.locales.syslocale.encoding = 'utf-8'
# After. Working.
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
<Movie 'Generation P' (2011)>
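The failure above can be reproduced without tmdb3 at all. A minimal sketch in Python 3 (the thread itself uses Python 2): the title encodes fine as UTF-8, but cp1252, the Windows code page the locale happened to supply in this traceback, has no mapping for U+041F, which sits at index 11 just as the error message says.

```python
# Reproduce the UnicodeEncodeError from the traceback, independent of tmdb3.
title = "Generation \u041f"  # 'Generation П'

# UTF-8 can represent every Unicode code point.
utf8_bytes = title.encode("utf-8")
print(utf8_bytes)  # b'Generation \xd0\x9f'

# cp1252 (Western European code page) has no CYRILLIC CAPITAL LETTER PE.
try:
    title.encode("cp1252")
    bad_position = None
except UnicodeEncodeError as exc:
    bad_position = exc.start  # index of the first unencodable character
print(bad_position)  # 11
```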
If the user is going to access Unicode content, such as movies with the character "П" in the title, the library expects the user to have configured their system to handle Unicode. Specifically, that means configuring a UTF-8 locale in their environment.
# unconfigured default
> locale
LANG=
LC_CTYPE="C"
LC_COLLATE="C"
LC_TIME="C"
LC_NUMERIC="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_ALL=
# Bourne users
> export LANG="en_US.UTF-8"
# C-shell users
> setenv LANG en_US.UTF-8
# confirmation
> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_ALL=
The tmdb3 library will then pull that encoding from the environment using Python's locale module.
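Roughly what "pull that encoding from the environment" means in practice: a sketch in Python 3, assuming the encoding comes from the locale module as in the session quoted in this thread. It uses locale.getpreferredencoding(False) rather than locale.getdefaultlocale() (deprecated since Python 3.11), and encode_with_system_locale is a hypothetical helper, not part of tmdb3. Its result depends on LANG/LC_* in the process environment.

```python
import locale

def encode_with_system_locale(text):
    """Hypothetical helper: encode text with whatever encoding the current
    locale reports, mimicking locale-dependent request encoding."""
    encoding = locale.getpreferredencoding(False) or "ascii"
    return text.encode(encoding)

# e.g. 'UTF-8' under en_US.UTF-8, 'cp1252' on a Western Windows setup
print(locale.getpreferredencoding(False))

# ASCII text survives under any ASCII-compatible locale encoding...
print(encode_with_system_locale("Generation P"))  # b'Generation P'
# ...but "Generation \u041f" would raise UnicodeEncodeError under cp1252 or C.
```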
> projects/pytmdb3/scripts/pytmdb3.py
PyTMDB3 Interactive Shell.
TAB completion available.
>>> import locale
>>> locale.getdefaultlocale()
('en_US', 'UTF-8')
>>> get_locale().encoding
'UTF-8'
The problem is, we can't just pick an arbitrary encoding when sending requests to TMDb; they expect UTF-8.
The encoding the API expects has nothing to do with the platform we are running on.
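A minimal sketch of the fixed-encoding approach, assuming the search text ends up percent-encoded in a URL query string. build_search_query is a hypothetical helper and "query" is an assumed parameter name; the point is only that the encoding is a constant chosen to match the API, not something read from the local locale.

```python
from urllib.parse import urlencode

def build_search_query(title):
    """Hypothetical helper: build the query string for a movie search.

    Text is always percent-encoded as UTF-8, whatever the local system's
    locale is, because the remote API (not the client) decides the encoding.
    """
    return urlencode({"query": title}, encoding="utf-8")

print(build_search_query("Generation \u041f"))  # query=Generation+%D0%9F
```

The same bytes are produced on Windows with cp1252, on Linux with a C locale, or anywhere else.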
Here is some more evidence that picking any codec that can represent all Unicode code points still isn't correct. The parameters have to be in the encoding TMDb expects so that it can decode them again:
>>> tmdb3.locales.syslocale.encoding = 'utf-8'
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
<Movie 'Generation P' (2011)>
>>> tmdb3.locales.syslocale.encoding = 'utf-16'
>>> tmdb3.tmdb_api.searchMovie(u'Generation П')[0]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\tmdb_api.py", line 128, in searchMovie
    return MovieSearchResult(Request('search/movie', **kwargs), locale=locale)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\tmdb_api.py", line 157, in __init__
    lambda x: Movie(raw=x, locale=locale))
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\pager.py", line 106, in __init__
    super(PagedRequest, self).__init__(self._getpage(1), 20)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\pager.py", line 59, in __init__
    self._data = list(iterable)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\pager.py", line 110, in _getpage
    res = req.readJSON()
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\cache.py", line 118, in __call__
    data = self.func(*args, **kwargs)
  File "C:\Users\chase.sterling\PycharmProjects\Flexget\lib\site-packages\tmdb3\request.py", line 125, in readJSON
    raise e
TMDBHTTPError: HTTP Error 500: Internal Server Error
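A local illustration of why UTF-16 fails even though it can represent every code point. server_decode is a hypothetical stand-in for a receiver that assumes UTF-8; TMDb's actual server-side handling is not documented in this thread, so this only shows a plausible mechanism. UTF-16 output begins with a byte-order mark (0xFF 0xFE or 0xFE 0xFF), and neither byte is a valid UTF-8 start byte.

```python
title = "Generation \u041f"  # 'Generation П'

utf16_bytes = title.encode("utf-16")  # begins with a byte-order mark
utf8_bytes = title.encode("utf-8")

def server_decode(raw):
    """Hypothetical stand-in for a receiver that assumes UTF-8 input."""
    return raw.decode("utf-8")

# UTF-8 round-trips cleanly.
print(server_decode(utf8_bytes) == title)  # True

# UTF-16 bytes cannot be read back by a UTF-8 decoder.
try:
    server_decode(utf16_bytes)
    utf16_ok = True
except UnicodeDecodeError as exc:
    utf16_ok = False
    print("cannot decode UTF-16 bytes as UTF-8:", exc.reason)
```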
The environment does need to be configured for Unicode in order to receive Unicode responses from TMDb, due to the behavior of Python 2 itself. However, I'll need to look at this again to figure out how to handle non-bytecode encodings.
This should be entirely independent of the environment. Unicode is Unicode no matter what locale a user has set. TMDb declares which encoding it accepts and sends for byte strings, and the Python library should only expose and accept strings as unicode objects to the user. The only time an error should be raised is when the user queries the library with a bytestring (str in Python 2) containing non-ASCII characters.
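The boundary policy described above, sketched in Python 3 terms (str is Unicode text; bytes corresponds to Python 2's str). to_request_param and parse_response are hypothetical helpers, not tmdb3 functions, and UTF-8 is assumed as the API's declared encoding per this thread.

```python
import json

def to_request_param(value):
    """Hypothetical boundary helper following the policy above.

    Text (str) is always sent as UTF-8. Byte strings are accepted only if
    they are pure ASCII; anything else is ambiguous and raises an error.
    """
    if isinstance(value, bytes):
        value = value.decode("ascii")  # UnicodeDecodeError if non-ASCII
    return value.encode("utf-8")

def parse_response(body):
    """Decode a raw JSON response body as UTF-8, independent of locale."""
    return json.loads(body.decode("utf-8"))
```

With this split, the user's locale never enters the request or response path; only ambiguous non-ASCII byte input is rejected.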
>>> tmdb3.locales.syslocale.encoding = 'utf-8'
This also fixed "TMDbError Internal error - Something went wrong. Contact TMDb." for me when calling tmdb3.MovieSearch('some string with äüö').
Thanks, @gazpachoking!
Perhaps I'm misunderstanding how this is supposed to work, but it looks like all request parameters are encoded using the system locale's encoding (https://github.com/wagnerrp/pytmdb3/blob/master/tmdb3/request.py#L70). This causes problems when the system locale cannot encode all the characters in the parameters. Plus, I have no idea how TMDb is expected to know which encoding you used for the parameters; I suspect it should be using a constant encoding defined by the TMDb API. Portion of a relevant traceback:
Downstream ticket: http://flexget.com/ticket/2392