liamks / libpytunes

Python Itunes Library parser
https://github.com/liamks/pyitunes
MIT License

pyItunes fails to load libraries with non-ASCII URL paths in python 2.7 #19

Closed: rfilmyer closed this issue 9 years ago

rfilmyer commented 9 years ago

While working on a fix for my first issue, I ran into a problem with the most recent build of the library.

Loading my library, I ran into this error:

Python 2.7.10 (default, May 25 2015, 13:06:17) 
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.56)] on darwin
>>> import pyItunes
>>> l = pyItunes.Library("/Users/roger/projects/Fun with Data/itunes/iTunes Music Library.xml")
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/roger/projects/pyitunes/pyItunes/Library.py", line 36, in __init__
    self.getSongs()
  File "/Users/roger/projects/pyitunes/pyItunes/Library.py", line 77, in getSongs
    s.location = text_type(urlparse.unquote(urlparse.urlparse(attributes.get('Location')).path[1:]))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 40: ordinal not in range(128)

In Latin-1, 0xcc corresponds to Ì, a "Latin capital letter I with grave". What's weird about this error is that this character doesn't appear anywhere in my library:

Rogers-iMac:pyitunes roger$ cat "/Users/roger/projects/Fun with Data/itunes/iTunes Music Library.xml" | grep -c Ì
0

The relevant line of code is here:

if ( self.musicPathXML is None or self.musicPathSystem is None ):
    # s.location = text_type(urlparse.unquote(urlparse.urlparse(attributes.get('Location')).path[1:]),"utf8")
    s.location = text_type(urlparse.unquote(urlparse.urlparse(attributes.get('Location')).path[1:]))
else:
    # s.location = text_type(urlparse.unquote(urlparse.urlparse(attributes.get('Location')).path[1:]).replace(self.musicPathXML,self.musicPathSystem),"utf8")
    s.location = text_type(urlparse.unquote(urlparse.urlparse(attributes.get('Location')).path[1:]).replace(self.musicPathXML,self.musicPathSystem))

Commenting out the first s.location line allows me to load the library.

rfilmyer commented 9 years ago

I notice that @codyzu changed this line when adding some Python 2/3 compatibility. Will the original line (the one commented out above) not work with Python 3?

rfilmyer commented 9 years ago

Looking at this again. 0xCC on its own is not ASCII (ASCII stops at 0x7F), but 0xCC 0x81 is the UTF-8 encoding of U+0301, the combining acute accent - ´.
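For reference, both readings of these bytes can be checked with the standard library. This is only a small sketch, and the variable names are illustrative rather than anything from pyItunes.

```python
import unicodedata

lone_byte = b"\xcc"      # the byte named in the traceback
pair = b"\xcc\x81"       # the two bytes behind %CC%81 in the URL

# Read as Latin-1, the single byte 0xCC maps to one character.
as_latin1 = lone_byte.decode("latin-1")
print(unicodedata.name(as_latin1))

# Read as UTF-8, 0xCC only starts a two-byte sequence; together with
# 0x81 it decodes to a single combining code point.
accent = pair.decode("utf-8")
print(unicodedata.name(accent))

# A plain "e" followed by the combining accent composes to the
# precomposed "e with acute" under NFC normalization.
composed = unicodedata.normalize("NFC", u"e" + accent)
```

This would also explain why grepping the XML for Ì finds nothing: the file stores the accent percent-encoded as %CC%81, not as a literal character.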

The Library initialization failed on this URL: file://localhost/Users/roger/Music/Albums/Boubacar%20Traore%CC%81%20-%20Mali%20Denhou/11%20Mali%20Tchebaou.mp3 (This should look like this when decoded properly: file://localhost/Users/roger/Music/Albums/Boubacar Traoré - Mali Denhou/11 Mali Tchebaou.mp3)

So I need to find a way to get Python 2's urllib.unquote() to work with Unicode strings.

rfilmyer commented 9 years ago

Annoyingly, Python 2's urllib.unquote() turns the path of the above link into the byte string '/Users/roger/Music/Albums/Boubacar Traore\xcc\x81 - Mali Denhou/11 Mali Tchebaou.mp3', which then needs to be decoded from UTF-8 into a proper unicode string - u'/Users/roger/Music/Albums/Boubacar Traore\u0301 - Mali Denhou/11 Mali Tchebaou.mp3', but six.u() fails to do that. I'm having trouble finding an easy solution that works on both Python 2 and 3.
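One possible way to handle both interpreters is to unquote first and then decode explicitly as UTF-8 only when the result is still a byte string. The sketch below is an illustration, not necessarily what the eventual fix does. `location_to_text` is a hypothetical helper name, and the `[1:]` slice mirrors the quoted Library.py line, so the leading slash is dropped.

```python
import sys

try:
    from urllib.parse import unquote, urlparse   # Python 3
except ImportError:
    from urlparse import unquote, urlparse       # Python 2

text_type = type(u"")  # unicode on Python 2, str on Python 3


def location_to_text(location_url):
    """Hypothetical helper: percent-decode a Location URL into text."""
    if sys.version_info[0] == 2 and isinstance(location_url, text_type):
        # Python 2's unquote() mis-decodes percent escapes in unicode
        # input, so hand it bytes instead.
        location_url = location_url.encode("utf-8")
    # Same slicing as the quoted library code: drop the leading "/".
    path = unquote(urlparse(location_url).path[1:])
    if isinstance(path, bytes):
        # Python 2: unquote() returned raw bytes; decode them as UTF-8.
        path = path.decode("utf-8")
    return path


url = ("file://localhost/Users/roger/Music/Albums/"
       "Boubacar%20Traore%CC%81%20-%20Mali%20Denhou/11%20Mali%20Tchebaou.mp3")
decoded = location_to_text(url)
```

On Python 3 the `bytes` branch is never taken, because `urllib.parse.unquote` already decodes percent escapes as UTF-8 by default.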

rfilmyer commented 9 years ago

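text_type does not work for decoding these strings: called without an encoding argument, Python 2's unicode() falls back to the ASCII codec, which rejects any byte above 0x7F (´ is encoded as the two bytes 0xCC 0x81, so you get the UnicodeDecodeError).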
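The failure can be reproduced without pyItunes at all. The sketch below decodes a shortened, illustrative byte string: once with the ASCII codec, which is what Python 2's `unicode(some_bytes)` does when no encoding is given, and once with an explicit UTF-8 codec.

```python
# Illustrative input: part of the unquoted path, as raw UTF-8 bytes.
raw = b"Boubacar Traore\xcc\x81"

# Decoding with the ASCII codec fails on the first byte above 0x7F.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Naming the real encoding succeeds and yields one code point for
# the two accent bytes.
decoded = raw.decode("utf-8")
```

So the fix is less about how many bytes the accent takes and more about naming the right codec when converting bytes to text.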

rfilmyer commented 9 years ago

(Ignore all the commits except for 7991560; it took a few tries to rebase my fork's branch to liamks's HEAD so my two pull requests could be created separately)

rfilmyer commented 9 years ago

This should be fixed now in pull request #21.

bafonso commented 6 years ago

I just got this using Python 2.7 in a conda environment with TensorFlow. My library definitely has a lot of accented characters.

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-2-df9cbca52326> in <module>()
     19     if song and song.rating:
     20         if song.rating > 80:
---> 21             print("{n}, {r}".format(n=song.name, r=song.rating))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 5: ordinal not in range(128)

I'm simply running the example code for using pickle.
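Note that this is a different error from the one in the original report: an encode failure at output time, not a decode failure while loading. A likely cause, assuming `song.name` is a unicode string, is that on Python 2 the byte-string template `"{n}, {r}"` implicitly encodes its arguments with the ASCII codec. A minimal sketch of one workaround is to make the template itself unicode. `format_song` and the sample title are hypothetical.

```python
def format_song(name, rating):
    """Hypothetical helper: build the output line as text."""
    # The u"" prefix keeps the whole operation in unicode on Python 2,
    # so no implicit ASCII encoding of `name` takes place.
    return u"{n}, {r}".format(n=name, r=rating)


# Made-up title containing U+00F1, the character from the traceback.
line = format_song(u"Se\u00f1orita", 100)
```

Printing `line` on Python 2 still depends on the terminal's encoding, so a further error at the `print` call itself would point at the environment and not at the library.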