Open makkus opened 11 years ago
It sounds like Python doesn't know to expect utf8 from your filesystem.
What locale do you have set? Here is mine:
[mark@mark sonospy]$ locale
LANG=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=
And what do you get when you run the following in Python?
[mark@mark sonospy]$ python
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> quit()
Hi,
I think UTF-8 is set up correctly, see below. It might have something to do with the original files being located on a FAT32 partition and then copied to an ext4 one, although I can't really see how exactly...
$ locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ python
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
If the FAT32 partition was written by Windows, that could be it, as Windows doesn't default to UTF-8. You could try running convmv on the first file that triggers the error and see whether the scan then progresses to the next file.
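Before reaching for convmv it can help to see which encodings the offending name actually decodes with. Below is a minimal sketch, not part of sonospy: `candidate_decodings` and the `CANDIDATES` list are hypothetical names, and the list of encodings is only a guess at what a Windows-written partition might have used.

```python
# Assumed candidate encodings; adjust for your own files.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def candidate_decodings(raw_name, encodings=CANDIDATES):
    """Map each encoding that can decode raw_name to the decoded text."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw_name.decode(enc)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this encoding
    return results


# The folder name from the error dump, as raw bytes.
raw = b"A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble"
print("decodes with: %s" % ", ".join(sorted(candidate_decodings(raw))))
```

If "utf-8" is missing from the result, the name on disk is not valid UTF-8 even though the locale is, and one of the remaining entries is a plausible source encoding to try with convmv.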
Hm, not even convmv could convert that folder name, so it's out of scope for sonospy to be able to import those files. Would it be possible, though, to catch that exception, skip the affected file/folder, and continue scanning? Or maybe have an "--ignore-errors" parameter?
Yes I can look at that.
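For illustration, here is one way the "skip and continue" behaviour could look. This is only a sketch under assumptions, not how gettags.py does or will do it: `split_decodable` and `walk_ignoring_errors` are hypothetical helpers, and the walk is done on byte strings so that `os.walk` itself never has to decode a name.

```python
import os


def split_decodable(names, encoding="utf-8"):
    """Split byte-string names into (decoded_text, undecodable_bytes)."""
    decoded, skipped = [], []
    for name in names:
        try:
            decoded.append(name.decode(encoding))
        except UnicodeDecodeError:
            skipped.append(name)
    return decoded, skipped


def walk_ignoring_errors(top, encoding="utf-8"):
    """Yield (dirpath, filenames) as text, skipping undecodable entries.

    Assumes `top` itself is decodable with `encoding`.
    """
    top_bytes = top if isinstance(top, bytes) else top.encode(encoding)
    for dirpath, dirs, files in os.walk(top_bytes):
        _, bad_dirs = split_decodable(dirs, encoding)
        # Prune in place so os.walk does not descend into skipped folders.
        dirs[:] = [d for d in dirs if d not in bad_dirs]
        good_files, bad_files = split_decodable(files, encoding)
        for bad in bad_dirs + bad_files:
            print("skipping undecodable name: %r" % (os.path.join(dirpath, bad),))
        yield dirpath.decode(encoding), good_files
```

An "--ignore-errors" style option could then simply choose between this generator and the plain `os.walk` call.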
I'm getting an exception (below) whenever I try to scan folders or files with special characters in the name. Any ideas?
I'm using Ubuntu 12.04 & latest unstable branch...
markus@barrelhaven:/opt/sonospy/sonospy$ cat /opt/sonospy/sonospy/errors/ErrorDump-20130302-215728.txt
<type 'exceptions.UnicodeDecodeError'>
Python 2.7.3: /usr/bin/python
Sat Mar 2 21:57:28 2013

A problem occurred in a Python script. Here is the sequence of function calls leading up to the error, in the order they occurred.

/opt/sonospy/sonospy/gettags.py in <module>()
 2527
 2528 if __name__ == "__main__":
 2529     status = main()
 2530     sys.exit(status)
 2531
status undefined
main =

/opt/sonospy/sonospy/gettags.py in main(argv=None)
 2522     for path in args:
 2523         if path.endswith(os.sep): path = path[:-1]
 2524         process_dir(path.decode(enc), options, database)
 2525     filelog.close_log_files()
 2526     return 0
global process_dir =
path = '/mnt/music/library/artists'
path.decode =
global enc = 'UTF-8'
options = <Values at 0xa09484c: {'verbose': False, 'databa...quiet': False, 'exclude': None, 'extract': None}>
database = '/opt/sonospy/sonospy/artists.db'

/opt/sonospy/sonospy/gettags.py in process_dir(scanpath=u'/mnt/music/library/artists', options=<Values at 0xa09484c: {'verbose': False, 'databa...quiet': False, 'exclude': None, 'extract': None}>, database='/opt/sonospy/sonospy/artists.db')
  254
  255     visitedpaths = []
  256     for filepath, dirs, files in os.walk(scanpath, followlinks=follow_symlinks):
  257
  258         filepath = os.path.abspath(os.path.realpath(filepath))
filepath undefined
dirs undefined
files undefined
global os = <module 'os' from '/usr/lib/python2.7/os.pyc'>
os.walk =
scanpath = u'/mnt/music/library/artists'
followlinks undefined
global follow_symlinks = False

/usr/lib/python2.7/os.py in walk(top=u'/mnt/music/library/artists', topdown=True, onerror=None, followlinks=False)
  282     dirs, nondirs = [], []
  283     for name in names:
  284         if isdir(join(top, name)):
  285             dirs.append(name)
  286         else:
isdir =
join =
top = u'/mnt/music/library/artists'
name = 'A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble'

/usr/lib/python2.7/posixpath.py in join(a=u'/mnt/music/library/artists', *p=('A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble',))
   69             path += b
   70         else:
   71             path += '/' + b
   72     return path
   73
path = u'/mnt/music/library/artists'
b = 'A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble'

<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)
__class__ = <type 'exceptions.UnicodeDecodeError'>
__delattr__ = <method-wrapper '__delattr__' of exceptions.UnicodeDecodeError object>
__dict__ = {}
__doc__ = 'Unicode decoding error.'
__format__ =
__getattribute__ = <method-wrapper '__getattribute__' of exceptions.UnicodeDecodeError object>
__getitem__ = <method-wrapper '__getitem__' of exceptions.UnicodeDecodeError object>
__getslice__ = <method-wrapper '__getslice__' of exceptions.UnicodeDecodeError object>
__hash__ = <method-wrapper '__hash__' of exceptions.UnicodeDecodeError object>
__init__ = <method-wrapper '__init__' of exceptions.UnicodeDecodeError object>
__new__ =
__reduce__ =
__reduce_ex__ =
__repr__ = <method-wrapper '__repr__' of exceptions.UnicodeDecodeError object>
__setattr__ = <method-wrapper '__setattr__' of exceptions.UnicodeDecodeError object>
__setstate__ =
__sizeof__ =
__str__ = <method-wrapper '__str__' of exceptions.UnicodeDecodeError object>
__subclasshook__ =
__unicode__ =
args = ('ascii', '/A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble', 38, 39, 'ordinal not in range(128)')
encoding = 'ascii'
end = 39
message = ''
object = '/A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble'
reason = 'ordinal not in range(128)'
start = 38

The above is a description of an error in a Python program. Here is the original traceback:

Traceback (most recent call last):
  File "./gettags.py", line 2529, in <module>
    status = main()
  File "./gettags.py", line 2524, in main
    process_dir(path.decode(enc), options, database)
  File "./gettags.py", line 256, in process_dir
    for filepath, dirs, files in os.walk(scanpath, followlinks=follow_symlinks):
  File "/usr/lib/python2.7/os.py", line 284, in walk
    if isdir(join(top, name)):
  File "/usr/lib/python2.7/posixpath.py", line 71, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 38: ordinal not in range(128)
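
For anyone reading the dump: `top` is a unicode string while `name` is still a byte string, so `path += '/' + b` makes Python 2 decode the bytes with its default ASCII codec, and that is the step that fails. The snippet below is a small reproduction of that step, written so it also runs on Python 3 by spelling the decode out explicitly; `implicit_ascii_join` is a hypothetical helper, not code from sonospy or the standard library.

```python
top = u"/mnt/music/library/artists"
name = b"A Hawk and a Hacksaw and The Hun Hang\xe1r Ensemble"


def implicit_ascii_join(top, name):
    """Mimic Python 2's `unicode + str`: the bytes are decoded as ASCII first."""
    return top + (b"/" + name).decode("ascii")


try:
    implicit_ascii_join(top, name)
except UnicodeDecodeError as exc:
    # Same codec and offset as reported in the traceback above.
    print("%s codec failed at position %d" % (exc.encoding, exc.start))
    # prints: ascii codec failed at position 38
```

The offset is 38 rather than 37 because the leading '/' has already been prepended to the byte string by the time the decode happens, which matches the `object` value in the dump.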