enhead appears not to work as expected with NNW

nacnudus / unpivotr

Unpivot complex and irregular data layouts in R

Other

185 stars 19 forks source link

row/col	1(A)	2(B)	3(C)
1	NA	MMSE 12-26	NA
2	NA	Placebo	Simvastatin
3	Measures	(n = 17)	(n = 20)
4	Serum	NA	NA
5	ALDL cholesterol	-5.9 ± 16.5	-51.9 ± 16.2“

row/col

1(A)

2(B)

3(C)

MMSE 12-26

Placebo

Simvastatin

Measures

(n = 17)

(n = 20)

Serum

ALDL cholesterol

-5.9 ± 16.5

-51.9 ± 16.2“

Minimal example

library(unpivotr) library(tidyverse) ## Create data my_cells <- tibble( row = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L), col = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), data_type = c("blank", "character", "blank", "blank", "character", "character", "character", "character", "character", "character", "blank", "blank", "character", "character", "character"), character = c(NA, "MMSE 12-26", NA, NA, "Placebo", "Simvastatin", "Measures", "(n = 17)", "(n = 20)", "Serum", NA, NA, "ALDL cholesterol", "-5.9 ± 16.5", "-51.9 ± 16.2“") ) ## Display data in excel table format rectify(my_cells) ## Select data from within table my_data_cells <- my_cells %>% filter(row>3, col>1) %>% select(row, col, character) rectify(my_cells) ## Select first row as a header from within tables my_header_cells <- my_cells %>% filter(row == 1) ## Use enhead to join the header to the table add_header <- enhead(my_data_cells, my_header_cells, direction = "NNW") ## Create simple looping function to produce desired behaviour NNW <- function(data_cells, header_cells) { if("character" %in% names(data_cells)) names(data_cells)[names(data_cells) == "character"] <- "value" d <- distinct(data_cells, col) %>% mutate(col_h = NA) h <- header_cells %>% rename(col_h = col) %>% select(col_h, character) %>% filter(!is.na(character)) for(col_data in d$col){ col_test <- col_data while(col_test > 0L){ if(col_test %in% h$col_h) { d$col_h[d$col == col_data] <- col_test break() } col_test <- col_test -1 } } data_cells %>% inner_join(d) %>% inner_join(h) %>% select(-col_h) } ## Run simple function add_header_manual <- NNW(my_data_cells, my_header_cells) ## compare outputs ### enhead version add_header ### crude version add_header_manual

Hi @dmcalli2,

Thank you for your kind words and the reproducible example. You might enjoy using the reprex package, which makes it easy to include the outputs as well as the code.

Your code was very nearly there -- I have added a couple of steps to rename columns so that you don't get character.x and character.y columns at the end, and one step filter out NA cells from the headers, which is why the "NNW" direction didn't appear to work.

I hope to get time to improve Spreadsheet Munging Strategies, and if you could open an issue for a specific section that would you help you, that would help me too :) In the mean time I am making some videos.

library(unpivotr)
library(tidyverse)

## Create data
my_cells <- tibble(
  row = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 4L, 4L, 5L, 5L, 5L),
  col = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  data_type = c("blank", "character", "blank", "blank", "character", "character",
                "character", "character", "character", "character", "blank",
                "blank", "character", "character", "character"),
  character = c(NA, "MMSE 12-26", NA, NA, "Placebo", "Simvastatin", "Measures",
                "(n = 17)", "(n = 20)", "Serum", NA, NA, "ALDL cholesterol",
                "-5.9 ± 16.5", "-51.9 ± 16.2“")
)
## Display data in excel table format
rectify(my_cells)
#> # A tibble: 5 x 4
#>   `row/col` `1(A)`           `2(B)`      `3(C)`       
#>       <int> <chr>            <chr>       <chr>        
#> 1         1 <NA>             MMSE 12-26  <NA>         
#> 2         2 <NA>             Placebo     Simvastatin  
#> 3         3 Measures         (n = 17)    (n = 20)     
#> 4         4 Serum            <NA>        <NA>         
#> 5         5 ALDL cholesterol -5.9 ± 16.5 -51.9 ± 16.2“

## Select data from within table
my_data_cells <- my_cells %>%
  filter(row>3, col>1) %>%
  select(row, col, value = character) # rename the columns
my_data_cells
#> # A tibble: 4 x 3
#>     row   col value        
#>   <int> <int> <chr>        
#> 1     4     2 <NA>         
#> 2     4     3 <NA>         
#> 3     5     2 -5.9 ± 16.5  
#> 4     5     3 -51.9 ± 16.2“
rectify(my_data_cells, value)
#> # A tibble: 2 x 3
#>   `row/col` `2(B)`      `3(C)`       
#>       <int> <chr>       <chr>        
#> 1         4 <NA>        <NA>         
#> 2         5 -5.9 ± 16.5 -51.9 ± 16.2“

## Select first row as a header from within tables
my_header_cells <- my_cells %>%
  filter(row == 1, !is.na(character)) %>% # filter out NA cells
  select(row, col, header = character) # rename the columns
my_header_cells
#> # A tibble: 1 x 3
#>     row   col header    
#>   <int> <int> <chr>     
#> 1     1     2 MMSE 12-26

## Use enhead to join the header to the table
add_header <- enhead(my_data_cells, my_header_cells, direction = "NNW")
add_header
#> # A tibble: 4 x 4
#>     row   col value         header    
#>   <int> <int> <chr>         <chr>     
#> 1     4     2 <NA>          MMSE 12-26
#> 2     4     3 <NA>          MMSE 12-26
#> 3     5     2 -5.9 ± 16.5   MMSE 12-26
#> 4     5     3 -51.9 ± 16.2“ MMSE 12-26

Created on 2018-12-19 by the reprex package (v0.2.0.9000).

nacnudus / unpivotr

enhead appears not to work as expected with NNW #20

Minimal example