Closed scientistanton closed 6 years ago
Can you give us an example of a data frame that you imported with a stray comma, so that we can reproduce the problem?
I've found it really hard to reproduce - so I've trimmed the problematic csv and zipped it up (attached). This produces the error:
library(tidyverse)
library(writexl)
df <- read_csv("problem.csv")
list <- df%>% split(.$SHEET)
write_xlsx(list, "problem.xlsx")
I think this is a very niche issue. I'm not sure how this csv was generated.
I should have mentioned - you don't see the error until you open the resulting xlsx file in Excel.
Yes I see it. The problem actually already appears in read_csv
. Line 15 gets read incorrectly:
> read_csv("~/Downloads/problem.csv")[15,]
Parsed with column specification:
cols(
SHEET = col_integer(),
NAME = col_character()
)
# A tibble: 1 x 2
SHEET NAME
<int> <chr>
1 1 "Ward\x92s Prolific"
The word Wardís
has not been correctly read from the csv and the string is corrupt. I'm not sure how excel reads this csv file actually.
So the problem is that your csv file is encoded in Windows-1252 but R assumes UTF-8. Try this:
library(tidyverse)
library(writexl)
df <- read_csv("~/Downloads/problem.csv")
df$NAME <- iconv(df$NAME, "Windows-1252", "UTF-8")
write_xlsx(df, "works.xlsx")
Note that wikipedia explains why we can't use latin1:
It is very common (on the Internet) to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems
Hello, Not sure if this is a real issue, or how best to report it. A stray comma in an imported CSV caused problems when exported using
write_xlsx
. When I opened the xlsx file it ran a repair job; here is the log: The repair scrubbed all other records on the problematic sheets in the workbook.I simply removed the apostrophe from the imported data, so not a big problem, but I thought I should report it. Thanks, Anton