rsheets / rexcel

Extracts spreadsheet data from Excel workbooks and puts it into linen format

Reveal encoding #11

Open jennybc opened 8 years ago

jennybc commented 8 years ago

This Twitter conversation reminded me of past pain I've had importing from Excel with unknown encoding. Maybe we could offer a little function that just exposes encoding info, even if someone goes on to import with a more conventional package. Reinforces the idea that one productive role for rexcel is for Excel diagnostics and troubleshooting xlsx import.

I note that when I migrated Gapminder data extraction from gdata to readxl, I was able to drop the explicit encoding specification. So some packages, presumably readxl among them, do figure this out for themselves, quietly.
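To make the "little function" idea concrete, here is a minimal sketch in Python rather than R. It is not rexcel code and covers only .xlsx files, relying on the fact that an .xlsx workbook is a zip archive of XML parts, each of which may carry an encoding in its XML declaration. The function name `declared_encodings` is hypothetical, and the sketch only reports what each part declares; it does not detect the encoding of legacy .xls files or verify that the bytes match the declaration.

```python
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
# Only ASCII-compatible declarations are recognised; a UTF-16 part
# will not match and is reported as None.
_XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def declared_encodings(xlsx):
    """Map each XML part of an .xlsx archive to its declared encoding.

    `xlsx` is a path or a binary file-like object (hypothetical helper,
    not part of any existing package). Parts without an encoding
    declaration map to None.
    """
    result = {}
    with zipfile.ZipFile(xlsx) as archive:
        for name in archive.namelist():
            # Only XML parts and relationship files carry a declaration.
            if not name.endswith((".xml", ".rels")):
                continue
            with archive.open(name) as part:
                head = part.read(256)  # the declaration sits at the very start
            # Skip a UTF-8 byte order mark if one is present.
            if head.startswith(b"\xef\xbb\xbf"):
                head = head[3:]
            match = _XML_DECL.match(head)
            result[name] = match.group(1).decode("ascii") if match else None
    return result
```

Calling `declared_encodings("workbook.xlsx")` on a file of your own (the file name here is a placeholder) returns a dictionary keyed by part name, such as `xl/sharedStrings.xml`, which is usually where cell text lives.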

richfitz commented 8 years ago

Good idea. I guess the Enron corpus will be pretty basic, but getting hold of some more complex test cases would be good.

jennybc commented 8 years ago

readxl and googlesheets issues seem to have a steady trickle of spreadsheets from Russia 😬. Gapminder also has some non-UTF-8 sheets. Maybe we should systematically download those and make the "Gapminder corpus"?

http://www.gapminder.org/data/