tidyverse / readxl

Read excel files (.xls and .xlsx) into R 🖇
https://readxl.tidyverse.org

option to repeat value in merged cells #355

Open jameshowison opened 7 years ago

jameshowison commented 7 years ago

Moving discussion from: https://github.com/tidyverse/readxl/pull/220#issuecomment-298394447

Currently (I think) merged cells are handled by placing the value in the top-left cell of the range and NA in all other cells. That makes sense in many situations.

Another useful option might be to repeat the value across the previously merged range. That would, for example, be helpful in tidying files with multiple headers, as detailed here: https://howisonlab.github.io/datawrangling/Handling_multi_indexes.html#a-tidyverse-solution

However, with your suggestions from the other thread, tidyr::fill works for this sensibly if one does a little transposing:

library(readxl)
library(tidyr)

# read and fill 4 rows of multiple headers from merged cells
headers <- read_excel(filename, col_names = FALSE, na = "..", n_max = 4)
# fill() only works down or up, so transpose the headers first
long_headers <- data.frame(t(headers))
long_headers <- fill(long_headers, 1:4) # all four columns
headers <- data.frame(t(long_headers)) # transpose back: header levels are rows again

Nonetheless, an option to unmerge and fill all cells might still be useful.
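For anyone who wants to try the transpose-and-fill idea without an Excel file, here is a minimal sketch on a toy data frame. The data, the two header levels, and the underscore separator are all assumptions made for illustration; they are not taken from the workbook discussed above. It also adds the collapse step, joining the filled header levels into one name per column.

```r
library(tidyr)

# Toy stand-in (assumed data) for what read_excel(col_names = FALSE) might
# return when "2019" and "2020" were each merged across two columns.
headers <- data.frame(
  X1 = c("2019", "sales"),
  X2 = c(NA, "costs"),
  X3 = c("2020", "sales"),
  X4 = c(NA, "costs"),
  stringsAsFactors = FALSE
)

# Transpose so each header row becomes a column, then fill downwards.
long_headers <- data.frame(t(headers), stringsAsFactors = FALSE)
long_headers <- fill(long_headers, everything())

# Collapse the filled header levels into one name per column.
col_names <- paste(long_headers[[1]], long_headers[[2]], sep = "_")
print(col_names)
```

The resulting vector could then be passed as col_names to a second read_excel() call that skips the header rows.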

zyxdef commented 7 years ago

It would be quite handy to have an option like mergedcells = c("fill", "missing"), possibly with some other sensible alternatives. I was going to suggest this as a feature, but it had already been suggested.

slyrus commented 4 years ago

+1. Merged cells are the bane of my existence.

jtr13 commented 4 years ago

How about fill(..., .direction = "right")?

https://github.com/tidyverse/tidyr/blob/master/R/fill.R
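As far as I know, tidyr::fill() accepts only "down", "up", "downup" and "updown" for .direction, so a rightward fill currently has to be done by hand or by transposing. Below is a rough base-R sketch of what a row-wise fill could look like. fill_right() is a hypothetical helper written for this example, not a function in tidyr or readxl, and the toy data is assumed.

```r
# Hypothetical helper (not part of tidyr or readxl): carry the last non-NA
# value forward from left to right within each row of a data frame.
fill_right <- function(df) {
  for (i in seq_len(nrow(df))) {
    last <- NA
    for (j in seq_len(ncol(df))) {
      if (is.na(df[i, j])) {
        df[i, j] <- last
      } else {
        last <- df[i, j]
      }
    }
  }
  df
}

# Toy header rows (assumed data) with NA where merged cells used to be.
headers <- data.frame(
  X1 = c("2019", "sales"),
  X2 = c(NA, "costs"),
  X3 = c("2020", "sales"),
  X4 = c(NA, "costs"),
  stringsAsFactors = FALSE
)

filled <- fill_right(headers)
print(filled)
```

The nested loop is slow on wide sheets but fine for a handful of header rows. It avoids the double transpose, at the cost of assuming all columns share a type.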

paulduf commented 2 months ago

+1. Can't believe this is not already implemented somehow.

EDIT: OK I just discovered the existence of tidyxl and unpivotr, so maybe this is the recommended way to go. Still, this adds some technical complexity to something that should be simple.