scraperwiki / spreadsheet-download-tool

A ScraperWiki plugin for downloading data from a box as a CSV or Excel spreadsheet
BSD 2-Clause "Simplified" License

1 DAY: Text encoding being lost: € replaced with multiple garbled characters in XLS and CSV #42

Closed: zarino closed this issue 10 years ago

zarino commented 10 years ago

I can confirm that the € signs are being converted into multiple characters, even when the CSV is opened in a plain text editor.

Original bug report from https://github.com/scraperwiki/table-xtract-tool/issues/15:

Seems to occur in both XLS and CSV formats.

Example from Table 12.

Export (Excel): [image]

Source: [image]

zarino commented 10 years ago

This happens for content in both normal SQL tables, and in grids.

zarino commented 10 years ago

Commit 4f7885ed73feb282edb422d19164183dc1368841 includes a change that forces Python Requests to decode the raw grid HTML as utf-8, rather than guessing an encoding, which seems to fix this for downloading grids as XLS and CSV files.
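For anyone reading later, here is a minimal sketch of why forcing the encoding matters. It is not the code from the commit; the sample HTML is made up, and the ISO-8859-1 fallback is an assumption about what Requests picks for a text/* response that declares no charset.

```python
# Hypothetical stand-in for the raw grid HTML bytes returned by the server.
raw = "<td>€ 1,200</td>".encode("utf-8")

# Assumption: with no charset in Content-Type, the client falls back to
# ISO-8859-1, so each byte of the 3-byte € sequence becomes its own character.
guessed = raw.decode("iso-8859-1")

# Decoding explicitly as UTF-8 keeps the € sign intact.
forced = raw.decode("utf-8")

# With Requests, the equivalent is to set the encoding before reading .text:
#   response = requests.get(url)      # url is a placeholder
#   response.encoding = "utf-8"
#   html = response.text
```

The guessed string is two characters longer than the forced one, which matches the "multiple characters" symptom in the exported files.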

Downloading SQL tables as XLS and CSV files seems to be working, magically, on its own. :-/

drj11 commented 10 years ago

A note mostly for my benefit:

I was skeptical enough to wonder if our server was misleading Requests by sending an encoding in the response headers. But happily our server is sensible enough to not say anything about the encoding of the file:

Access-Control-Allow-Origin:*
Cache-Control:must-revalidate
Cache-Control:max-age=0
Connection:keep-alive
Content-Encoding:gzip
Content-Type:text/html
Date:Mon, 11 Nov 2013 12:42:07 GMT
Expires:Mon, 11 Nov 2013 12:42:07 GMT
Last-Modified:Wed, 06 Nov 2013 12:50:12 GMT
Pragma:no-cache
Server:nginx/1.2.6 (Ubuntu)
Transfer-Encoding:chunked
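The same check can be scripted. This is a rough sketch using only the standard library; `declared_charset` is a hypothetical helper name, not something in this repo.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lowercased,
    # or None when the header carries no charset parameter.
    return msg.get_content_charset()


# The header our server sends: no charset, so the client has to guess.
print(declared_charset("text/html"))

# What a header that does declare an encoding would look like.
print(declared_charset("text/html; charset=UTF-8"))
```

A `None` result for the first call is consistent with the headers above: the server says nothing about the encoding, so any wrong guess comes from the client side.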