xgate1 / pylast

Automatically exported from code.google.com/p/pylast
Apache License 2.0

illegal xml characters not removed (with possible fix) #70

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
First, thank you for this great package!
Working with Python 2.6 on an Ubuntu system, with UTF-8 as the default encoding.

It appears that some characters are illegal in XML even though they can be
encoded in UTF-8. Last.fm sometimes returns such characters; for instance, the
following call should crash with a minidom parse error:

r = network.search_for_track('Blind Willie Johnson', "It's nobody's fault but mine")
r.get_total_result_count()
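
The failure can also be reproduced without a network call. The sketch below is only an illustration: SAMPLE_RESPONSE is a made-up response body, not actual Last.fm output, and parses_cleanly is a hypothetical helper. It shows minidom rejecting a vertical tab (U+000B), which UTF-8 can encode but XML 1.0 does not allow.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical response body (not real Last.fm output). "\x0b" is a
# vertical tab: valid in UTF-8, but not a legal XML 1.0 character.
SAMPLE_RESPONSE = u'<lfm status="ok"><name>fault\x0bmine</name></lfm>'


def parses_cleanly(xml_text):
    """Return True if minidom accepts xml_text, False on a parse error."""
    try:
        minidom.parseString(xml_text.encode("utf-8"))
        return True
    except ExpatError:
        # minidom uses expat underneath, which raises ExpatError
        # when it hits a character outside the XML 1.0 range.
        return False
```

Calling parses_cleanly(SAMPLE_RESPONSE) returns False, while the same document without the control character is accepted.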

Following this blog post, a regex can solve the issue:
http://maxharp3r.wordpress.com/2008/05/15/pythons-minidom-xml-and-illegal-unicode-characters/
It is a little hacky, since the illegal character is replaced by '?' rather than
handled properly, but that is still better than discarding the whole response
and returning an error.

Attached is a working fix: I added one line at the end of _download_response
and the regex at the end of the file.
It should probably be refactored (the regex should be compiled, import re moved
to the top, etc.), but I hope it helps!
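
Since the attachment is not reproduced here, the following is a minimal sketch of the general approach, not the attached patch or the blog's exact regex. The names _XML_ILLEGAL, remove_illegal_xml_chars and parse_response are hypothetical. The character class covers the C0 control characters and the two noncharacters excluded by the XML 1.0 Char production; surrogate handling is deliberately left out.

```python
import re
from xml.dom import minidom

# Compiled once at import time. Matches control characters other than
# tab, newline and carriage return, plus U+FFFE and U+FFFF, none of
# which are allowed in XML 1.0. (Assumption: surrogates are ignored.)
_XML_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def remove_illegal_xml_chars(text, replacement=u"?"):
    """Replace every illegal XML character in text with replacement."""
    return _XML_ILLEGAL.sub(replacement, text)


def parse_response(text):
    """Sanitize a response body, then parse it with minidom."""
    cleaned = remove_illegal_xml_chars(text)
    return minidom.parseString(cleaned.encode("utf-8"))
```

In pylast the sanitizing step would presumably sit where the issue describes, at the end of _download_response, so that every caller receives text minidom can parse.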

Original issue reported on code.google.com by berti...@gmail.com on 9 Jul 2011 at 8:30

Attachments:

GoogleCodeExporter commented 8 years ago
Thanks, fixed in my fork:
https://github.com/hugovk/pylast/issues/71

Original comment by hugovk@gmail.com on 2 Mar 2014 at 8:54