Open jennybc opened 8 years ago
Good idea. I guess the Enron corpus will be pretty basic, but getting hold of some more complex test cases would be good.
readxl and googlesheets issues seem to have a steady trickle of spreadsheets from Russia 😬. Gapminder also has some non-UTF-8 sheets. Maybe we should systematically download those and make the "Gapminder corpus"?
This Twitter conversation reminded me of past pain I've had importing from Excel with unknown encoding. Maybe we could offer a little function that just exposes encoding info, even if someone goes on to import with a more conventional package. Reinforces the idea that one productive role for rexcel is for Excel diagnostics and troubleshooting xlsx import.

I note that when I migrated Gapminder data extraction from gdata to readxl, I was able to drop the explicit encoding specification. So some packages, presumably readxl among them, do figure this out for themselves, quietly.
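
To make the "little function that just exposes encoding info" idea concrete, here is a rough, language-neutral sketch in Python rather than R. It is not rexcel's API. The name `declared_encodings` is hypothetical, and the approach rests on an assumption: an xlsx file is a zip archive of XML parts, so the simplest thing to report is the encoding each part declares in its XML declaration. It does not cover legacy `.xls` files, which record their code page differently, and it returns `None` for parts with no declaration or with one it cannot read as ASCII-compatible bytes.

```python
import re
import zipfile

# Matches the encoding pseudo-attribute of an XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
_DECL = re.compile(rb'^\s*<\?xml[^>]*?encoding=["\']([A-Za-z0-9._-]+)["\']')

_UTF8_BOM = b"\xef\xbb\xbf"


def declared_encodings(xlsx):
    """Map each XML part of an xlsx file to its declared encoding.

    `xlsx` is a path or a binary file-like object. Parts without a
    readable declaration map to None. Non-XML parts are skipped.
    """
    result = {}
    with zipfile.ZipFile(xlsx) as zf:
        for name in zf.namelist():
            if not name.endswith((".xml", ".rels")):
                continue
            with zf.open(name) as fh:
                head = fh.read(200)  # the declaration sits at the very start
            if head.startswith(_UTF8_BOM):
                head = head[len(_UTF8_BOM):]
            match = _DECL.match(head)
            result[name] = match.group(1).decode("ascii") if match else None
    return result
```

Calling `declared_encodings("some-file.xlsx")` returns a dict keyed by part name, which someone could inspect before importing with a more conventional package. It reports only what the file claims about itself, so a part that declares one encoding but contains bytes in another would still need a separate check.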