winterbird-code / adbb

Object Oriented UDP Client Library for AniDB
GNU General Public License v3.0

Any success with utf-8 path or filenames? #8

Closed: bigretromike closed this issue 7 years ago

bigretromike commented 7 years ago

Having UTF-8 characters in names causes all kinds of trouble, no matter whether it's python2.7 or 3.6. Has anyone had any success with those?

for dir_path, dir_names, file_names in os.walk(root):
    for file_name in file_names:
        file = adbb.File(path=(os.path.join(dir_path, file_name)).decode('utf-8'))

This keeps throwing encoding-related errors, and removing the .decode() makes it worse.
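On python3, `str` has no `.decode()` method, so an unconditional `.decode('utf-8')` on the joined path fails there, while on python2 a plain `str` path does need decoding. Below is a minimal sketch of a version-agnostic helper. It assumes names arrive either as text or as UTF-8 bytes. `to_text` is a hypothetical name and is not part of adbb.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text: decode byte strings, pass text through unchanged.

    Hypothetical helper (not part of adbb). Assumes byte strings are UTF-8.
    On python2 `bytes` is an alias of `str`, so a plain str is decoded to
    unicode; on python3 only real bytes objects are decoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

On python2 it is often simpler to avoid the decode entirely: passing a unicode root such as `os.walk(u'...')` makes `os.walk` yield unicode names.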

winterbird-code commented 7 years ago

After pushing a fix to db.py, this works fine in both python2 and python3 (previously only python3 worked). There should be no need to use encode()/decode() when using the library.

import os

import adbb

for root, dirs, files in os.walk('/media/Anime/Series/Ranma ½'):
    dirs[:] = []  # clear in place so os.walk does not descend into subdirectories
    for f in files:
        adbb_file = adbb.File(path=os.path.join(root, f))
        print("{} - {}".format(adbb_file.path, adbb_file.episode))

The string handling in python changed a lot between versions 2 and 3, and I expect there are many other similar errors in the code. This library is primarily written for python3, but I intend to support python2 as well as long as it's reasonable, so let me know if you run into other strange behaviour.
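Here is a small illustration of the python3 split between text and bytes. It is not adbb code. The same name exists once as a `str` of characters and separately as a UTF-8 `bytes` object, in which `½` takes the two bytes `\xc2\xbd`.

```python
name = "Ranma ½"                 # text (str) in python3
encoded = name.encode("utf-8")   # bytes

print(type(name).__name__, len(name))        # str 7
print(type(encoded).__name__, len(encoded))  # bytes 8
print(encoded)                               # b'Ranma \xc2\xbd'
print(encoded.decode("utf-8") == name)       # True
```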

bigretromike commented 7 years ago

Will do; I would rather stick with 2.7 until it gets deprecated 👍

bigretromike commented 7 years ago

Yeah, the db.py fix was also among my fixes, but I couldn't test it much.

bigretromike commented 7 years ago
  File "C:\Python27\lib\site-packages\adbb-0.2-py2.7.egg\adbb\animeobjs.py", line 542, in __init__
    path, fid, anime, episode, lid))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 26: ordinal not in range(128)

Maybe because the file also has a UTF-8 name :-)
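The error itself is easy to reproduce on python3 by encoding explicitly to ASCII. On python2 the same encode happens implicitly, for example when a unicode value is formatted into a plain `str` template such as `"{}".format(u'½')`. Line 542 looks like the tail of such a `.format(...)` call, but that is an inference from the snippet and nothing in the thread confirms it. `ascii_error` is a hypothetical helper.

```python
def ascii_error(text):
    """Return the UnicodeEncodeError raised when encoding text as ASCII, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_error("Ranma ½")
print(err.encoding, err.start, err.reason)  # ascii 6 ordinal not in range(128)
```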

winterbird-code commented 7 years ago

Could you please provide a bit more info in your reports? I can't really tell what you were trying to do there. At the very least, I need to know which function you called (with parameters) and the full stack trace.

bigretromike commented 7 years ago

Same as before, with Ranma ½ in both the directory and file name.

winterbird-code commented 7 years ago

I still need the full stack trace; I can't really tell what is wrong from just those lines.

bigretromike commented 7 years ago

Tell me what you need and how to get it.

winterbird-code commented 7 years ago

The stack trace is the output you get when it crashes, starting with this line and including everything below it:

Traceback (most recent call last):
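If the crash happens inside a larger script, the standard `traceback` module can also capture the same text as a string, ready to paste into an issue. A minimal sketch follows. `capture_traceback` is a hypothetical helper name.

```python
import traceback


def capture_traceback(func, *args, **kwargs):
    """Call func and return the full traceback text if it raises, else None."""
    try:
        func(*args, **kwargs)
    except Exception:
        # format_exc() returns the same text Python prints for an uncaught
        # exception, starting with "Traceback (most recent call last):"
        return traceback.format_exc()
    return None
```

For example, `print(capture_traceback(adbb.File, path=some_path))`, where `some_path` stands in for whichever path triggers the crash.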
bigretromike commented 7 years ago

path = u'\\\\anime\\_anime\\Ranma ½ TV\\[E-D]_Ranma_½_TV_Ep075_(B330C2B2).mkv' gives this:

Traceback (most recent call last):
  File "C:\Users\bigretromike\.IdeaIC2016.3\config\plugins\python\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Users\bigretromike\.IdeaIC2016.3\config\plugins\python\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:/_git/joshu/app.py", line 77, in <module>
    command_scan(path_to_scan)
  File "D:/_git/joshu/app.py", line 53, in command_scan
    _file = adbb.File(path=os.path.join(dir_path, file_name))
  File "C:\Python27\lib\site-packages\adbb-0.2-py2.7.egg\adbb\animeobjs.py", line 542, in __init__
    path, fid, anime, episode, lid))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 26: ordinal not in range(128)
bigretromike commented 7 years ago

This can be fixed by changing these lines:

  542: path.encode('utf-8'), fid, anime, episode, lid))
  552: adbb.log.debug("Created File {} - size: {}, mtime: {}".format(self._path.encode('utf-8'), self._size, self._mtime))
  762: self._path.encode('utf-8')
  826: self._path.encode('utf-8'),
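There is a side effect worth knowing before applying that patch. On python3, `.encode('utf-8')` turns the path into `bytes`, and formatting a `bytes` value into a text template embeds its repr rather than the name itself. Here is a small check, independent of adbb:

```python
path = "Ranma ½.mkv"
message = "Created File {}".format(path.encode("utf-8"))
print(message)  # Created File b'Ranma \xc2\xbd.mkv'
```

On python2 the same patch produces a UTF-8 byte string inside a byte-string template, which is why it appears to fix the crash there.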

bigretromike commented 7 years ago

That's all for now 👍

winterbird-code commented 7 years ago

Hmm, I read up a bit on unicode for python2 and python3, because I really don't think this is the correct solution.

Try sending your path as a regular string instead of a unicode string, either by simply not using the u''-syntax or by calling .encode('utf-8') on your unicode string. That seems to work for me.

The "correct" way to solve this for python2 seems to be to simply use the u''-syntax on all hardcoded strings in the library, but then you must use unicode strings when using the library. This unicode handling is one of the main features of python3, and a really good reason to upgrade.
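Below is a minimal sketch of that approach. It uses the log message quoted earlier in the thread as the example, and `format_created` is a hypothetical stand-in, not adbb's code. The `u''` prefix makes the template unicode on python2 and is accepted as a no-op on python3.3+, so a unicode path is no longer implicitly encoded to ASCII. Putting `from __future__ import unicode_literals` at the top of a module has the same effect for every literal in it.

```python
def format_created(path, size, mtime):
    # u'' makes this template a unicode string on python2; on python3 it is an
    # ordinary str. Either way, a unicode path is formatted without an ASCII encode.
    return u"Created File {} - size: {}, mtime: {}".format(path, size, mtime)
```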

For now I guess you have two options:

  1. Use regular strings instead of unicode strings
  2. Convert all strings in the library to unicode strings

I'm not sure I would merge a PR for that second option... The primary target is python3, and since it works with regular strings I don't think it's worthwhile to add legacy code for this.

bigretromike commented 7 years ago

"Ranma ½ TV" converted to a string/encoded to UTF-8 becomes "Ranma 1 TV", which is a non-existent folder; or there is a conversion error, throwing this:

Traceback (most recent call last):
  File "C:\Users\bigretromike\.IdeaIC2016.3\config\plugins\python\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Users\bigretromike\.IdeaIC2016.3\config\plugins\python\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "D:/_git/joshu/app.py", line 79, in <module>
    command_scan(path_to_scan)
  File "D:/_git/joshu/app.py", line 55, in command_scan
    _file = adbb.File(path=os.path.join(dir_path.encode('utf-8'), file_name.encode('utf-8')))
  File "C:\Python27\lib\site-packages\adbb-0.2-py2.7.egg\adbb\animeobjs.py", line 551, in __init__
    self.nfs_obj)
  File "C:\Python27\lib\site-packages\adbb-0.2-py2.7.egg\adbb\fileinfo.py", line 98, in get_file_stats
    stat = os.stat(path)
WindowsError: [Error 3] System nie może odnaleźć określonej ścieżki: '\\\\anime\\_anime\\Ranma \xc2\xbd TV\\[Exiled-Destiny]_Ranma_\xc2\xbd_TV_Ep075_(B330C2B2).mkv'

The WindowsError says: "The system cannot find the path specified."
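Here is a likely explanation, stated as an assumption. Python2 on Windows hands byte-string paths to the ANSI file APIs, which interpret them in the system code page rather than UTF-8, while unicode paths go to the wide-character APIs. On a typical linux system, by contrast, byte paths are passed through unchanged and names are usually stored as UTF-8. Now assume a Polish Windows install whose ANSI code page is cp1250. The error message suggests this, but nothing confirms it. Under that assumption, the two UTF-8 bytes for `½` are read as two different characters, so the path does not exist. The sketch below only simulates that mis-decoding on python3. `as_seen_by_ansi_api` is a hypothetical name.

```python
def as_seen_by_ansi_api(name, codepage="cp1250"):
    """Simulate a UTF-8 byte path being decoded with a Windows ANSI code page.

    Assumption: cp1250 (Central European) is the active ANSI code page.
    """
    utf8_bytes = name.encode("utf-8")   # what .encode('utf-8') produced
    return utf8_bytes.decode(codepage)  # how those bytes read in cp1250


seen = as_seen_by_ansi_api(u"Ranma \u00bd TV")
print(ascii(seen))  # 'Ranma \xc2\u02dd TV'
```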

winterbird-code commented 7 years ago

It seems to behave differently from my unix/linux systems then :/

Sorry, but this is not something I will fix in this library. As I said, the correct way to support python2 seems to be to convert all strings used in the library to unicode strings. As that is not needed (and even seems to be discouraged) for python3, I think it is too big a change for the sake of backward compatibility. You're of course free to fix this in your own branch if you wish, but I would really recommend migrating to python3 instead.

bigretromike commented 7 years ago

OK, thanks for the input, the previous fix and the good word 👍