Closed GoogleCodeExporter closed 9 years ago
This issue was updated by revision 1ea314350be8.
Starting to work on this
Original comment by andresgattinoni
on 29 Aug 2011 at 4:16
This issue was closed by revision 265ed60e3450.
Original comment by andresgattinoni
on 29 Aug 2011 at 4:16
Ramiro, can you test it before I merge the fix to the integrate branch?
Original comment by andresgattinoni
on 29 Aug 2011 at 4:18
Oki doki, I'll test it tonight and let you know.
Original comment by algoz...@gmail.com
on 29 Aug 2011 at 5:00
Andrés,
Testing this fix, I've found that Alibris isn't working with non-ASCII chars. For example, I tried "Televisión" and got a "can't decode" error.
OTOH, the Google one seems to work OK.
Original comment by algoz...@gmail.com
on 5 Sep 2011 at 3:10
Ok, now the problem is in another place. It's on line 71 of the Alibris plugin:
http://code.google.com/p/aranduka/source/browse/src/plugins/guess_alibris/__init__.py?name=issue61#71
When it tries to decode the title that comes from Alibris, in some cases it
raises that exception. If I remove the .decode('utf-8') or do
unicode(book.get('title', 'No Title')), it doesn't fail but some characters are
not displayed properly.
These encoding issues are always a pain... I'm not sure what the best way to
fix this would be.
Original comment by andresgattinoni
on 7 Sep 2011 at 3:51
Reading the API docs, I found this page for trying out queries:
http://developer.alibris.com/iodocs
I searched for "Televisión" and this is the response:
<?xml version="1.0" encoding="iso-8859-1" ?>
<ALIBRIS xmlns:dt="urn:schemas-microsoft-com:datatypes">
[...]a lot of xml[...]
Sooo... it appears that the response is encoded in iso-8859-1 instead of UTF-8 ;-)
Original comment by algoz...@gmail.com
on 7 Sep 2011 at 4:13
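A minimal sketch of what the mismatch above looks like, written for Python 3 and not taken from the plugin. Only the XML declaration comes from the response quoted in the thread; the `<title>` element and the `try_decode` helper are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for an Alibris response: only the XML declaration
# is copied from the thread, the <title> element is invented.
raw = (
    '<?xml version="1.0" encoding="iso-8859-1" ?>'
    '<ALIBRIS><title>Televisión</title></ALIBRIS>'
).encode("iso-8859-1")


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid for it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# 'ó' is the single byte 0xF3 in iso-8859-1, which is not valid UTF-8 here.
as_utf8 = try_decode(raw, "utf-8")
as_latin1 = try_decode(raw, "iso-8859-1")

# When given raw bytes, the parser honours the encoding in the declaration.
title = ET.fromstring(raw).find("title").text
```

Handing the undecoded bytes to the XML parser avoids hard-coding either encoding, since the declaration already says which one applies.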
I tried doing .decode('iso-8859-1'), but the result is the same; I get this error:
"'ascii' codec can't encode character u'\xe8' in position 2: ordinal not in
range(128)"
Original comment by andresgattinoni
on 7 Sep 2011 at 5:06
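A likely reading of that message: in Python 2, calling .decode() on a value that is already a unicode object first encodes it implicitly with the default ASCII codec, and that hidden step is what fails, whatever codec is passed to .decode(). The sketch below, written for Python 3, makes the step explicit. The title "Chère" is a hypothetical value, chosen only because its 'è' (U+00E8) sits at index 2 like in the error above.

```python
title = "Chère"  # hypothetical title; 'è' is U+00E8 at index 2

try:
    # Python 2 performed this encode silently before decoding a unicode object.
    title.encode("ascii")
except UnicodeEncodeError as exc:
    failed_codec = exc.encoding          # name of the codec that failed
    failed_at = exc.start                # index of the offending character
    failed_char = exc.object[exc.start]  # the character itself
```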
If you remove the .decode('[...]') and print type(title) after that, you
can see that the title is already a unicode string, so there's no need to
decode it. Why some characters are not displayed correctly is a mystery; I
think it's an Alibris problem. I suggest leaving it without the .decode() call.
Original comment by algoz...@gmail.com
on 7 Sep 2011 at 5:42
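One defensive variant of that suggestion, sketched for Python 3 under the assumption that the title might arrive either as bytes or as text. The `to_text` helper and the sample `book` dict are hypothetical and not part of the plugin; in Python 2 the check would be against `str` rather than `bytes`.

```python
def to_text(value, encoding="iso-8859-1"):
    """Decode bytes with the given encoding; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical record shaped like the one used in the plugin.
book = {"title": "Televisión"}
title = to_text(book.get("title", "No Title"))
```

This keeps the "don't decode what is already unicode" behaviour while still coping with a byte string if one ever shows up.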
Ok, I agree.
Original comment by andresgattinoni
on 8 Sep 2011 at 1:06
This issue was updated by revision 085323b97e8e.
Please review this, so that I can merge the fix to the integrate branch
Original comment by andresgattinoni
on 8 Sep 2011 at 1:09
Original issue reported on code.google.com by
algoz...@gmail.com
on 20 Apr 2011 at 4:58