patperu closed this issue 3 years ago
Hi, sorry you've had this problem, and thank you for providing a detailed report. I don't have access to a Windows machine, but I will see what I can do.
Hi, thanks for looking into it! Let me know if you need more information or if I should test something.
This might be related to something in a readxl issue https://github.com/tidyverse/readxl/issues/125, which found that the problem was to do with R itself on Windows, rather than anything in the package. It might be worth trying some of the suggestions in that issue.
Unfortunately, I don't think it's a problem that can be fixed in tidyxl, because tidyxl uses the same code as readxl to parse strings, and readxl is believed to be correct. Sorry I can't be more help.
Hi!
Thanks for finding an explanation! I didn't realise that readxl and tidyxl are using the same library. I'll check the readxl issue.
Right now I found a simple solution with Encoding(). Calling Encoding() on the character and formula columns fixes the issue.
Encoding(s2$character) <- "UTF-8"
Encoding(s2$formula) <- "UTF-8"
Now it works:
library(tidyxl)
s1 <- tidyxl::xlsx_cells("H:/test/german_umlaute_utf8.xlsx", sheets = 'Sheet1')
s2 <- tidyxl::xlsx_cells("H:/test/german_umlaute_utf8.xlsx", sheets = 'Sheet2_with_cell_ref_to_Sheet1')
s1[, c("sheet", "address", "character")]
#> sheet address character
#> 1 Sheet1 B3 A
#> 2 Sheet1 B4 Einführung oder Änderung
#> 3 Sheet1 B5 Kapazität
#> 4 Sheet1 B6 Bauüberhang
#> 5 Sheet1 B7 (Zeilenumbrüche mittels Alt + Enter)
s2[, c("sheet", "address", "character", "formula")]
#> sheet address character
#> 1 Sheet2_with_cell_ref_to_Sheet1 B4 Einführung oder Änderung
#> 2 Sheet2_with_cell_ref_to_Sheet1 B5 Kapazität
#> 3 Sheet2_with_cell_ref_to_Sheet1 B6 Bauüberhang
#> 4 Sheet2_with_cell_ref_to_Sheet1 B7 (Zeilenumbrüche mittels Alt + Enter)
#> 5 Sheet2_with_cell_ref_to_Sheet1 B9 Zuzüge
#> formula
#> 1 Sheet1!B4
#> 2 Sheet1!B5
#> 3 Sheet1!B6
#> 4 Sheet1!B7
#> 5 IF(Sheet1!B3="A","Zuzüge","d2")
Encoding(s2$character) <- "UTF-8"
Encoding(s2$formula) <- "UTF-8"
# Encoding fixed
s2[, c("sheet", "address", "character", "formula")]
#> sheet address character
#> 1 Sheet2_with_cell_ref_to_Sheet1 B4 Einführung oder Änderung
#> 2 Sheet2_with_cell_ref_to_Sheet1 B5 Kapazität
#> 3 Sheet2_with_cell_ref_to_Sheet1 B6 Bauüberhang
#> 4 Sheet2_with_cell_ref_to_Sheet1 B7 (Zeilenumbrüche mittels Alt + Enter)
#> 5 Sheet2_with_cell_ref_to_Sheet1 B9 Zuzüge
#> formula
#> 1 Sheet1!B4
#> 2 Sheet1!B5
#> 3 Sheet1!B6
#> 4 Sheet1!B7
#> 5 IF(Sheet1!B3="A","Zuzüge","d2")
Created on 2020-12-09 by the reprex package (v0.3.0)
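For anyone who needs the same workaround on more than two columns, here is a minimal sketch. It assumes the strings really do contain UTF-8 bytes and are only missing the encoding mark. `mark_utf8()` is a hypothetical helper, not part of tidyxl, and the toy data frame stands in for the output of `xlsx_cells()`. Note that `Encoding<-` only changes the declared encoding; it does not convert any bytes.

```r
# Hypothetical helper (not part of tidyxl): declare every character column
# of a data frame to be UTF-8 without changing the underlying bytes.
mark_utf8 <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  df[is_chr] <- lapply(df[is_chr], function(col) {
    Encoding(col) <- "UTF-8"
    col
  })
  df
}

# Toy data standing in for xlsx_cells() output: the raw UTF-8 bytes of
# "Zuzüge", built with rawToChar() so that no encoding is declared.
raw_bytes <- as.raw(c(0x5a, 0x75, 0x7a, 0xc3, 0xbc, 0x67, 0x65))
cells <- data.frame(character = rawToChar(raw_bytes), stringsAsFactors = FALSE)

Encoding(cells$character)   # no declared encoding yet

fixed <- mark_utf8(cells)
Encoding(fixed$character)   # now declared as UTF-8

# The bytes themselves are untouched; only the mark changed.
identical(charToRaw(fixed$character), raw_bytes)
```

If the bytes were actually in a different encoding (for example Latin-1), marking them as UTF-8 would be wrong, and a real conversion with `iconv()` would be needed instead.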
I think this is now fixed. If not, please reopen the issue.
Hi @nacnudus,
thanks a lot for working on this issue!
It is fixed for the character column, but unfortunately the problem still exists for the formula column (row 10).
You can see the difference between Windows and Linux.
PS: Sorry, I can't reopen the issue.
library(tidyxl)
packageVersion('tidyxl')
#> [1] '1.0.7.9000'
Sys.info()['sysname']
#> sysname
#> "Windows"
x <- xlsx_cells("german_umlaute_utf8.xlsx")
x[, c("sheet", "character", "formula")]
#> sheet character formula
#> 1 Sheet1 A <NA>
#> 2 Sheet1 Einführung oder Änderung <NA>
#> 3 Sheet1 Kapazität <NA>
#> 4 Sheet1 Bauüberhang <NA>
#> 5 Sheet1 (Zeilenumbrüche mittels Alt + Enter) <NA>
#> 6 Sheet2_with_cell_ref_to_Sheet1 Einführung oder Änderung Sheet1!B4
#> 7 Sheet2_with_cell_ref_to_Sheet1 Kapazität Sheet1!B5
#> 8 Sheet2_with_cell_ref_to_Sheet1 Bauüberhang Sheet1!B6
#> 9 Sheet2_with_cell_ref_to_Sheet1 (Zeilenumbrüche mittels Alt + Enter) Sheet1!B7
#> 10 Sheet2_with_cell_ref_to_Sheet1 Zuzüge IF(Sheet1!B3="A","Zuzüge","d2")
testthat::expect_equal(Encoding(x$character[-1]), rep("UTF-8", 9))
testthat::expect_equal(Encoding(x$formula[-1]), rep("UTF-8", 9))
#> Error: Encoding(x$formula[-1]) not equal to rep("UTF-8", 9).
#> 9/9 mismatches
#> x[1]: "unknown"
#> y[1]: "UTF-8"
#>
#> x[2]: "unknown"
#> y[2]: "UTF-8"
#>
#> x[3]: "unknown"
#> y[3]: "UTF-8"
#>
#> x[4]: "unknown"
#> y[4]: "UTF-8"
#>
#> x[5]: "unknown"
#> y[5]: "UTF-8"
Created on 2021-01-05 by the reprex package (v0.3.0)
library(tidyxl)
library(testthat)
packageVersion('tidyxl')
#> [1] '1.0.7.9000'
Sys.info()['sysname']
#> sysname
#> "Linux"
x <- xlsx_cells("german_umlaute_utf8.xlsx")
x[, c("sheet", "character", "formula")]
#> # A tibble: 10 x 3
#> sheet character formula
#> <chr> <chr> <chr>
#> 1 Sheet1 A <NA>
#> 2 Sheet1 Einführung oder Änderung <NA>
#> 3 Sheet1 Kapazität <NA>
#> 4 Sheet1 Bauüberhang <NA>
#> 5 Sheet1 (Zeilenumbrüche mittels Alt + Enter) <NA>
#> 6 Sheet2_with_cell_ref_to_Sheet1 Einführung oder Änderung "Sheet1!B4"
#> 7 Sheet2_with_cell_ref_to_Sheet1 Kapazität "Sheet1!B5"
#> 8 Sheet2_with_cell_ref_to_Sheet1 Bauüberhang "Sheet1!B6"
#> 9 Sheet2_with_cell_ref_to_Sheet1 (Zeilenumbrüche mittels Alt + Enter) "Sheet1!B7"
#> 10 Sheet2_with_cell_ref_to_Sheet1 Zuzüge "IF(Sheet1!B3=\"A\",\"Zuzüge\",\"d2\")"
testthat::expect_equal(Encoding(x$character[-1]), rep("UTF-8", 9))
testthat::expect_equal(Encoding(x$formula[-1]), rep("UTF-8", 9))
#> Error: Encoding(x$formula[-1]) not equal to rep("UTF-8", 9).
#> 9/9 mismatches
#> x[1]: "unknown"
#> y[1]: "UTF-8"
#>
#> x[2]: "unknown"
#> y[2]: "UTF-8"
#>
#> x[3]: "unknown"
#> y[3]: "UTF-8"
#>
#> x[4]: "unknown"
#> y[4]: "UTF-8"
#>
#> x[5]: "unknown"
#> y[5]: "UTF-8"
Created on 2021-01-05 by the reprex package (v0.3.0.9001)
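A note on reading the `expect_equal()` failures above: in R, `Encoding()` reports "unknown" for `NA` and for pure-ASCII strings such as `Sheet1!B4`, and those can never be marked as UTF-8. A check on the whole formula column can therefore only pass for entries that contain non-ASCII characters. The following is a minimal sketch of a check restricted to those entries; `is_non_ascii()` is a hypothetical helper and the `formula` vector is made-up data modelled on the output above.

```r
# Hypothetical helper: TRUE for strings that contain at least one byte
# above 0x7f, FALSE for NA and for pure-ASCII strings.
is_non_ascii <- function(x) {
  vapply(
    x,
    function(s) !is.na(s) && any(as.integer(charToRaw(s)) > 127L),
    logical(1),
    USE.NAMES = FALSE
  )
}

# Made-up stand-in for x$formula: one NA, one ASCII formula, and one
# formula containing an umlaut (the \u escape yields a UTF-8 marked string).
formula <- c(NA, "Sheet1!B4", "IF(Sheet1!B3=\"A\",\"Zuz\u00fcge\",\"d2\")")

Encoding(formula)       # "unknown" "unknown" "UTF-8"
is_non_ascii(formula)   # FALSE FALSE TRUE

# Only the non-ASCII entries can be expected to carry the UTF-8 mark.
all(Encoding(formula[is_non_ascii(formula)]) == "UTF-8")
```

With real data, the last line could be wrapped in `testthat::expect_true()` so that rows holding `NA` or plain cell references do not cause spurious mismatches.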
Sorry about that. With the file you provided, the comment column still won't have UTF-8 encoding, because there aren't any comments. I've added some to the file for the sake of a test.
Hi @nacnudus,
thank you! Yes, I never touched the comment column, only character and formula.
I created a new test file, utf8-cities.xlsx, to test more languages and to simplify the structure.
AFAICS, at least the four columns sheet, character, comment and formula need the UTF-8 encoding.
For character and comment this is the case, but unfortunately not for the sheet and formula columns.
Best, Patrick
library(tidyxl)
# Version
# tidyxl* 1.0.7.9000 2021-01-06 [1] Github (nacnudus/tidyxl@8a1ceac)
Sys.info()['sysname']
#> sysname
#> "Windows"
x <- xlsx_cells("utf8-cities.xlsx")
x[, c("sheet", "address", "character", "comment", "formula")]
#> sheet address character comment formula
#> 1 Städte A1 Nærbø Norwegian, Nærbø "Nærbø"
#> 2 Städte A2 Köln German, Köln "Köln"
#> 3 Städte A3 Chéu French, Chéu "Chéu"
#> 4 Städte A4 España Spain, España "España"
#> 5 Städte A5 Klitmøller Danish, Klitmøller "Klitmøller"
testthat::expect_equal(Encoding(x$sheet), rep("UTF-8", 5))
#> Error: Encoding(x$sheet) not equal to rep("UTF-8", 5).
#> 5/5 mismatches
#> x[1]: "unknown"
#> y[1]: "UTF-8"
#>
#> x[2]: "unknown"
#> y[2]: "UTF-8"
#>
#> x[3]: "unknown"
#> y[3]: "UTF-8"
#>
#> x[4]: "unknown"
#> y[4]: "UTF-8"
#>
#> x[5]: "unknown"
#> y[5]: "UTF-8"
testthat::expect_equal(Encoding(x$character), rep("UTF-8", 5))
testthat::expect_equal(Encoding(x$formula), rep("UTF-8", 5))
#> Error: Encoding(x$formula) not equal to rep("UTF-8", 5).
#> 5/5 mismatches
#> x[1]: "unknown"
#> y[1]: "UTF-8"
#>
#> x[2]: "unknown"
#> y[2]: "UTF-8"
#>
#> x[3]: "unknown"
#> y[3]: "UTF-8"
#>
#> x[4]: "unknown"
#> y[4]: "UTF-8"
#>
#> x[5]: "unknown"
#> y[5]: "UTF-8"
testthat::expect_equal(Encoding(x$comment), rep("UTF-8", 5))
Created on 2021-01-10 by the reprex package (v0.3.0)
Okay, this time .... this time I think it's finally fixed. Thank you for taking the trouble to write a minimal test file.
Hi,
I'm having a problem with the encoding (German umlauts) on Windows 10 when using a cell reference. The test file contains two sheets, where "Sheet2" has references to "Sheet1". On Windows the conversion works for "Sheet1" but fails for "Sheet2". This problem does not occur when using Ubuntu.
Thanks for any tips and for this great package!! Patrick
On Windows it fails
On Ubuntu it works
Created on 2020-11-24 by the reprex package (v0.3.0)
Windows and Excel version
Test file 'german_umlaute_utf8.xlsx'
german_umlaute_utf8.xlsx