henkelis / sonospy

Sonospy is a UPnP control point and Windows Media proxy for the Sonos multi-room audio system.
GNU General Public License v3.0

Problem when scanning folders/files with special characters #70

Open makkus opened 11 years ago

makkus commented 11 years ago

I'm getting an exception (below) whenever I try to scan folders or files with special characters in the name. Any ideas?

I'm using Ubuntu 12.04 and the latest unstable branch.

markus@barrelhaven:/opt/sonospy/sonospy$ cat /opt/sonospy/sonospy/errors/ErrorDump-20130302-215728.txt
<type 'exceptions.UnicodeDecodeError'>
Python 2.7.3: /usr/bin/python
Sat Mar 2 21:57:28 2013

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

/opt/sonospy/sonospy/gettags.py in <module>()
 2527
 2528 if __name__ == "__main__":
 2529     status = main()
 2530     sys.exit(status)
 2531
status undefined
main =

/opt/sonospy/sonospy/gettags.py in main(argv=None)
 2522     for path in args:
 2523         if path.endswith(os.sep): path = path[:-1]
 2524         process_dir(path.decode(enc), options, database)
 2525     filelog.close_log_files()
 2526     return 0
global process_dir =
path = '/mnt/music/library/artists'
path.decode =
global enc = 'UTF-8'
options = <Values at 0xa09484c: {'verbose': False, 'databa...quiet': False, 'exclude': None, 'extract': None}>
database = '/opt/sonospy/sonospy/artists.db'

/opt/sonospy/sonospy/gettags.py in process_dir(scanpath=u'/mnt/music/library/artists', options=<Values at 0xa09484c: {'verbose': False, 'databa...quiet': False, 'exclude': None, 'extract': None}>, database='/opt/sonospy/sonospy/artists.db')
 254
 255     visitedpaths = []
 256     for filepath, dirs, files in os.walk(scanpath, followlinks=follow_symlinks):
 257
 258         filepath = os.path.abspath(os.path.realpath(filepath))
filepath undefined
dirs undefined
files undefined
global os = <module 'os' from '/usr/lib/python2.7/os.pyc'>
os.walk =
scanpath = u'/mnt/music/library/artists'
followlinks undefined
global follow_symlinks = False

/usr/lib/python2.7/os.py in walk(top=u'/mnt/music/library/artists', topdown=True, onerror=None, followlinks=False)
 282     dirs, nondirs = [], []
 283     for name in names:
 284         if isdir(join(top, name)):
 285             dirs.append(name)
 286         else:
isdir =
join =
top = u'/mnt/music/library/artists'
name = 'A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble'

/usr/lib/python2.7/posixpath.py in join(a=u'/mnt/music/library/artists', *p=('A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble',))
 69             path += b
 70         else:
 71             path += '/' + b
 72     return path
 73
path = u'/mnt/music/library/artists'
b = 'A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble'
<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)
__class__ = <type 'exceptions.UnicodeDecodeError'>
__delattr__ = <method-wrapper '__delattr__' of exceptions.UnicodeDecodeError object>
__dict__ = {}
__doc__ = 'Unicode decoding error.'
__format__ =
__getattribute__ = <method-wrapper '__getattribute__' of exceptions.UnicodeDecodeError object>
__getitem__ = <method-wrapper '__getitem__' of exceptions.UnicodeDecodeError object>
__getslice__ = <method-wrapper '__getslice__' of exceptions.UnicodeDecodeError object>
__hash__ = <method-wrapper '__hash__' of exceptions.UnicodeDecodeError object>
__init__ = <method-wrapper '__init__' of exceptions.UnicodeDecodeError object>
__new__ =
__reduce__ =
__reduce_ex__ =
__repr__ = <method-wrapper '__repr__' of exceptions.UnicodeDecodeError object>
__setattr__ = <method-wrapper '__setattr__' of exceptions.UnicodeDecodeError object>
__setstate__ =
__sizeof__ =
__str__ = <method-wrapper '__str__' of exceptions.UnicodeDecodeError object>
__subclasshook__ =
__unicode__ =
args = ('ascii', '/A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble', 38, 39, 'ordinal not in range(128)')
encoding = 'ascii'
end = 39
message = ''
object = '/A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble'
reason = 'ordinal not in range(128)'
start = 38

The above is a description of an error in a Python program. Here is the original traceback:

Traceback (most recent call last):
  File "./gettags.py", line 2529, in <module>
    status = main()
  File "./gettags.py", line 2524, in main
    process_dir(path.decode(enc), options, database)
  File "./gettags.py", line 256, in process_dir
    for filepath, dirs, files in os.walk(scanpath, followlinks=follow_symlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)
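
Editorial note: the error names the 'ascii' codec because, in Python 2, os.listdir (and therefore os.walk) called with a unicode path returns any name it cannot decode as a plain byte string, and concatenating that byte string with a unicode path triggers an implicit ASCII decode. The sketch below performs that decode explicitly. It is written in Python 3 syntax (an assumption, since the thread uses Python 2.7), the name is copied from the dump, and `first_non_ascii_offset` is a hypothetical helper, not part of sonospy.

```python
# Byte-string directory name exactly as it appears in the error dump.
raw_name = b"A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble"


def first_non_ascii_offset(raw):
    """Return the offset of the first byte that is not ASCII, or None."""
    try:
        raw.decode("ascii")  # the decode Python 2 performs implicitly
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# posixpath.join evaluates '/' + b before appending to the unicode path,
# so the string that fails to decode starts with a slash.
offset = first_non_ascii_offset(b"/" + raw_name)
print(offset)
```

The offset it reports lines up with the `start = 38` field in the dump, which points at the 0xe1 byte in "Hang\xe1r".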

henkelis commented 11 years ago

It sounds like Python doesn't know to expect utf8 from your filesystem.

What locale do you have set? For comparison, here is mine:

[mark@mark sonospy]$ locale
LANG=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=

And what do you get when you run the following in Python:

[mark@mark sonospy]$ python
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> quit()

makkus commented 11 years ago

Hi,

I think UTF-8 is set up correctly, see below. It might have something to do with the original files being located on a FAT32 partition and then copied to an ext4 one, although I can't really see how exactly...

$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ python
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'

henkelis commented 11 years ago

If the FAT32 partition was written from Windows, that could be it, as Windows doesn't default to UTF-8. You could try running convmv on the first file that triggers the error and see whether the scan then progresses to the next file.
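
Editorial note: one way to test this theory against the name in the error dump is to try decoding its raw bytes with a few candidate encodings. The sketch below is Python 3 (an assumption; the thread uses Python 2.7), `try_encodings` is a hypothetical helper, and the candidate list is only a guess at what a Windows-written partition might have used.

```python
# Byte-string directory name copied from the error dump.
raw_name = b"A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble"


def try_encodings(raw, candidates=("utf-8", "latin-1", "cp1252")):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for enc in candidates:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


results = try_encodings(raw_name)
for enc, text in results.items():
    print(enc, "->", text)

# If a single-byte encoding gives the intended name, this is what the
# name's bytes would look like once re-encoded as UTF-8.
as_utf8 = results["latin-1"].encode("utf-8")
```

The name is not valid UTF-8, while both single-byte candidates turn 0xe1 into "á" ("Hangár"). Note that latin-1 accepts every byte value, so a successful decode there is a plausibility check rather than proof of the original encoding.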

makkus commented 11 years ago

Hm, not even convmv could convert that folder name, so importing those files is out of scope for sonospy. Would it be possible, though, to catch that exception, ignore the affected file/folder, and continue scanning? Or maybe have an "--ignore-errors" parameter?
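
Editorial note: a minimal sketch of what such skip-and-continue behaviour could look like. This is not sonospy's code. It is written for Python 3 (the thread uses Python 2.7), and `decode_or_skip`, `walk_skipping_bad_names` and the `ignore_errors` flag are hypothetical names chosen to mirror the proposed "--ignore-errors" option. It also assumes the scan root itself decodes cleanly in the chosen encoding.

```python
import os


def decode_or_skip(raw_names, encoding="utf-8", ignore_errors=True):
    """Split byte-string names into (decoded, skipped) lists."""
    decoded, skipped = [], []
    for raw in raw_names:
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if not ignore_errors:
                raise  # without the flag, fail as before
            skipped.append(raw)
    return decoded, skipped


def walk_skipping_bad_names(top, encoding="utf-8", ignore_errors=True):
    """Walk `top` as bytes and yield (dirpath, filenames) as text.

    Entries whose names do not decode are reported and skipped;
    skipped directories are not descended into.
    """
    for dirpath, dirs, files in os.walk(os.fsencode(top)):
        _, bad_dirs = decode_or_skip(dirs, encoding, ignore_errors)
        # Prune in place so os.walk does not enter the skipped directories.
        dirs[:] = [d for d in dirs if d not in bad_dirs]
        good_files, bad_files = decode_or_skip(files, encoding, ignore_errors)
        for raw in bad_dirs + bad_files:
            print("skipping undecodable name: %r" % raw)
        yield dirpath.decode(encoding), good_files
```

Walking the tree as bytes keeps the decode step under the caller's control, so one bad name costs only that entry (and, for a directory, its subtree) instead of aborting the whole scan.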

henkelis commented 11 years ago

Yes, I can look at that.
