nacnudus / tidyxl

Read untidy Excel files in R https://nacnudus.github.io/tidyxl/
https://nacnudus.github.io/tidyxl/
Other
246 stars 20 forks source link

Hexcodes for colors are strange/incorrect #89

Open JohnHendrickx opened 1 year ago

JohnHendrickx commented 1 year ago

Thanks very much for your great work with tidyxl! I often have to deal with Excel files where key information is in the color coding and tidyxl finally solves that problem. I do have a question though about the hex strings for fill colors. I can identify cells based on coloring but the colors returned by xlsx_formats are incorrect (or I don't understand them).

I'm working with Excel files that are generated by an analysis apparatus. The Excel files use a grey background (#A6A6A6) for "unselected" and pink (#F2DCDB) for "assigned", where "unselected" values are to be ignored. However, xlsx_formats returns "FFFFFFFF" for "unselected" and "FFC0504D" for "assigned". I understand that the leading "FF" should be ignored and then the colors returned would be #FFFFFF (white) and #C0504D (reddish-brown).

Is there an explanation for these unexpected hex strings? As said, I can identify the "unselected" cells using the returned hex string but I'd like to understand why they're so different from what I expect.

nacnudus commented 1 year ago

Thank you for your kind remarks. Please could you attach an example spreadsheet with the two colours you mention?

JohnHendrickx commented 1 year ago

test.xlsx I used this Excel file for a test this morning, The values in row 1 are copy/pasted from the original Excel file, values in row 3 were created by me.

This is the R program I used. Strangely, I now get "FFED7D31" for pink.

# https://higgi13425.github.io/medical_r/posts/2021-01-13-extracting-highlighting-as-data-from-excel/

library(tidyxl)
library(readxl)
library(dplyr)
library(stringr)
library(tidyr)
library(here)

Excel_fn <- paste0(here(),"/test.xlsx")

# Extract formats as one large list
Group_Layout_formats <- xlsx_formats(Excel_fn)

# Extract the colours only from the list
Group_Layout_colours <- Group_Layout_formats$local$fill$patternFill$fgColor$rgb

These are the values I get now for Group_Layout_colours:

> Group_Layout_colours
[1] NA         "FF000000" "FFFFFFFF" "FFED7D31" "FFFFFFFF" "FFED7D31" "FF000000"
nacnudus commented 1 year ago

I think there are two things going on here.

The first is that it is possible (I think) to define the "office" colour theme elsewhere than in the Excel file itself. When you open the file in Excel, one of the definitions has to win. That might explain why FFC0504D became FFED7D31. In the file, the colour FFED7D31 is the "Office" theme colour called "accent 2".

The second is that there is a base RGB value, to which a tint (percentage) is applied. Tidyxl reports the RGB and the tint separately. I opened the file in Libreoffice (sadly I no longer have access to Excel), and it said the pink colour is FBE5D6.

I didn't implement tints in tidyxl because I never understood them, but it seems that it's straightforward to apply a tint to an RGB value and get an RGB value back. Note that, in the script below, I move the alpha channel from the beginning to the end, for compatibility with grDevices::col2rgb().

library(tidyxl)

filename <- "./test.xlsx"
f <- xlsx_formats(filename)
c <- xlsx_cells(filename)
i <- c[c$col==3, "local_format_id"]
rgb <- f$local$fill$patternFill$fgColor$rgb[i]
tint <- f$local$fill$patternFill$fgColor$tint[i]

rgb
# [1] "FFED7D31" "FFED7D31"

tint
# [1] 0.7999817 0.7999817

# https://rdrr.io/cran/MESS/src/R/colorfunctions.R
col.tint <- function(col, tint=.5) {
    if(missing(col))
        stop("a vector of colours is missing")
    if (tint<0 | tint>1)
        stop("shade must be between 0 and 1")
    mat <- t(col2rgb(col, alpha=TRUE)  +  c(rep(1-tint, 3), 0)*(255-col2rgb(col, alpha=TRUE)))
    rgb(mat, alpha=mat[,4], maxColorValue=255)
}

col.tint("#ED7D31FF", tint = 1 - 0.7999817)
#FBE4D5, which is very nearly FBE6D6 as in libreoffice

col.tint("#C0504DFF", tint = 1 - 0.7999817)
#F2DBDB, which is very nearly F2DCDB as expected

Does Excel show the untinted RGB value anywhere? If not, tidyxl ought to automatically apply the tint to the base RGB value and give a final RGB value. I'm afraid I don't have time to work on tidyxl any more, but I'd look at a pull request. It would have to be exact, rather than approximate as above.

JohnHendrickx commented 1 year ago

If I hover over a color in the "Fill Color" menu, I see "Orange, Accent 2, Lighter 80%" image

For grey, the popup text is "White, Background 1, Darker 25%"

So this does seem to be related to the "Theme Colors" and the tints applied. I'll experiment a bit with the col.tint and see if that clarifies things further. Thanks for your time and help!