tidyverse / readxl

Read excel files (.xls and .xlsx) into R 🖇
https://readxl.tidyverse.org
Other
724 stars 192 forks source link

Read and write methods that can handle horizontal (row-based) instead of vertical (column-based) data orientation #161

Closed jankowtf closed 7 years ago

jankowtf commented 8 years ago

Would it be possible to introduce read and write methods that can handle Excel data stored in a horizontal (row-based) instead of a vertical (column-based) orientation (i.e. variables are represented by rows instead of columns) while preserving the data type/class information of the respective variables (i.e. not using coercion to character as a "workaround")?

Here's a little example file for both orientations: https://github.com/rappster/stackoverflow/blob/master/excel/row-and-column-based-data.xlsx

I've elaborated a bit on the motivation for this request in a stackoverflow post

While a convincing reason for working against the column-based data orientation standard still escapes me, it seems like a number of platforms/tools in the area of time series analysis/business forecasting are using such a row-based/wide data orientation and I'm going mad trying to deal with it.

A consideration of this would be greatly appreciated :-)

jankowtf commented 8 years ago

Someone at SO pointed me to type.convert and this seems to do the trick!

library(xlsx)
read.transposed.xlsx <- function(file, sheet = 1) {
  df <- xlsx::read.xlsx(file, sheetIndex = sheet , header = FALSE)
  dft <- as.data.frame(t(df[-1]), stringsAsFactors = FALSE)
  names(dft) <- df[, 1]
  dft <- as.list(dft)
  dft <- as.data.frame(lapply(dft, type.convert))
  return(dft)
}

> (x <- read.transposed.xlsx("row-and-column-based-data.xlsx", sheet = 2))
    variable var_1 var_2 var_3
1 2016-01-01     1     a  TRUE
2 2016-01-02     2     b FALSE
3 2016-01-03     3     c  TRUE
> x$var_1
[1] 1 2 3
> x$var_3
[1]  TRUE FALSE  TRUE
jennybc commented 7 years ago

I think this is best handled by reading, transposing, and then (re-)doing type conversion. I don't see readxl gaining functionality to do this directly.

FYI readr::type_convert() will do type conversion in the readr / tidyverse style, if you want to try that out in your workflow. For example, stringsAsFactors = FALSE would be default behaviour.

jankowtf commented 7 years ago

@jennybc thanks for the pointer to readr::type_convert()!