jasonmc / forked-daapd

A re-write of the firefly media server (mt-daapd). It's released under GPLv2+. Please note that this git repository is a mirror of the official one at git://git.debian.org/~jblache/forked-daapd.git
http://blog.technologeek.org/2009/06/12/217
GNU General Public License v2.0

character decoding problem in iTunes Library.xml ("combining diaeresis") #81

Open ctr49 opened 12 years ago

ctr49 commented 12 years ago

I've got some serious problems with non-ASCII characters in filenames contained in iTunes Library.xml.

Take the following example:

        <key>Track ID</key><integer>123</integer>
        <key>Name</key><string>Die Klügsten Männer Der Welt</string>
        <key>Artist</key><string>Die Ärzte</string>
        <key>Album</key><string>Geräusch - CD2</string>
        <key>Location</key><string>file://localhost/Volumes/gemeinsam/Musik/iTunes/iTunes%20Music/Die%20A%CC%88rzte/Gera%CC%88usch%20-%20CD2/07%20Die%20Klu%CC%88gsten%20Ma%CC%88nner%20Der%20Welt.m4a</string>

Special characters like ä, ö, ü are encoded using a "combining diaeresis" (%CC%88), and forked-daapd doesn't seem to honor this. As a result, those tracks are missing from my playlists and only show up when forked-daapd picks up the files from the filesystem.

Whenever %CC%88 is found, the previous character should be converted: a -> ä, o -> ö, u -> ü, A -> Ä, O -> Ö, U -> Ü

and "%C3%9F" shall become "ß" (without any attention to the previous character).
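For illustration: the mapping described above is what Unicode calls canonical composition (normalization form NFC), so a general normalizer covers every base letter plus combining mark pair, not only the six listed. The following is a minimal Python sketch of the idea, not forked-daapd code (forked-daapd is written in C). The function name `decode_location` is hypothetical, and the sample string is a fragment of the Location value quoted above.

```python
import unicodedata
from urllib.parse import unquote


def decode_location(url: str) -> str:
    """Percent-decode a URL as UTF-8, then compose combining marks (NFC)."""
    # Step 1: turn %XX escapes back into bytes and read them as UTF-8.
    # "A%CC%88" becomes "A" followed by U+0308 COMBINING DIAERESIS.
    decoded = unquote(url, encoding="utf-8", errors="strict")
    # Step 2: NFC merges base letter + combining mark into one code point
    # where a precomposed form exists ("A" + U+0308 -> U+00C4).
    return unicodedata.normalize("NFC", decoded)


# Fragment of the Location string from the example above.
sample = "Die%20A%CC%88rzte/Gera%CC%88usch%20-%20CD2"
print(decode_location(sample))
```

"%C3%9F" needs no special handling in this sketch: it is the plain UTF-8 encoding of "ß", so percent-decoding alone yields the right character.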

Thanks!

elwertk commented 12 years ago

The evhttp_decode_uri function in use here (provided by libevent) and its GCD counterpart http_decode_uri are fairly simple, and I don't think they support decoding %CC%88.

However, I never had issues with files generated by iTunes, as they always used a different form of Unicode normalisation in the encoded URL, and those work IIRC.

Location: file://localhost/C:/xxxxxxxxxxxxxxxx/%C3%84pfel%20und%20Birnen/%C3%84pfel%20und%20Birnen.m4v (Äpfel und Birnen.m4v)

Is anything messing with the encoding of your iTunes XML file? Does the source look the same?
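
To make the difference between the two forms concrete, here is a small Python sketch (an illustration only, not code from forked-daapd or libevent). It decodes the precomposed "%C3%84" form and the decomposed "A%CC%88" form and shows that they differ as strings until both are normalized. The helper name `same_name` is hypothetical.

```python
import unicodedata
from urllib.parse import unquote

# Precomposed form, as in the "%C3%84pfel" example: one code point, U+00C4.
precomposed = unquote("%C3%84pfel")
# Decomposed form, as in the original report: "A" followed by U+0308.
decomposed = unquote("A%CC%88pfel")


def same_name(a: str, b: str) -> bool:
    """Compare two names after bringing both to the same normal form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)           # False: different code points
print(len(precomposed), len(decomposed))   # 5 6
print(same_name(precomposed, decomposed))  # True
```

Both strings render as "Äpfel", so the mismatch only becomes visible when paths are compared code point by code point or byte by byte.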

ctr49 commented 12 years ago

I was looking at the raw XML as generated by iTunes. This may be related to either iTunes on Mac OS or storing the library on an AFP share... But regardless of the reason, I guess it wouldn't hurt to decode those characters?!