Describe what is missing huxtable does not provide any function to automatically merge repeated cells by column and/or row.

Describe your proposed solution See code below for new proposed function merge_repeated_cells

#' Merge a range of cells with identical content
#'
#' @param ht A huxtable.
#' @param row A row specifier. See \code{\link{rowspecs}} for details. 
#' @param col A column specifier.
#' @param merge_first Either 'row' or 'column' (default). Define how cells are merged first
#'
#' @details
#' `merge_repeated_cells` automatically detects and merge rectangular-shape ranges of cells with identical
#' content. For the purpose of the scan for repeated values, the content of the huxtable is coerced to 
#' character but the original class are maintained in the function output. Nevertheless, you must ensure that
#' any data transformation (ie, rounding) is performed prior to calling `merge_repeated_cells`.
#' 
#' By default, cells are first merged by column then by row. If the opposite is needed, set the 
#' \code{merge_first} argument to 'row'.
#' 
#' @return The `ht` object.
#'
#' @export
#' @examples
#' ht <- hux(
#' var1 = c(1, 2, 3, 3, 5, 6),
#' var2 = c('a', 'b', 'c', 'c', 'e', 'e'),
#' var3 = c('A', 'b', 'c', 'c', 'e', 'f'),
#' var4 = c('g', 'b', 'h', 'i', 'j', 'k'),
#' var5 = c('l', 'l', 'o', 'p', 'j', 'q'),
#' stringsAsFactors = FALSE
#' 
#' ht <- set_all_borders(ht, 1)
#' merge_repeated_cells(ht)
#' merge_repeated_cells(ht, c(2,5))
#' merge_repeated_cells(ht, merge_first = 'row')

merge_repeated_cells <- function(ht, rows, cols, merge_first = c('column', 'row'))  {

  if (missing(ht)){
    stop('ht argument is missing')
  }
  mod_ht <- as.matrix(ht)

  if (missing(rows)){
    rows <- 1:nrow(mod_ht)
  }
  if (missing(cols)){
    cols <- 1:ncol(mod_ht)
  }
  if (!is.numeric(rows)){
    rows <- 1:nrow(mod_ht)
  } else {
    rows <- sort( unique(rows) )
    rows[1] <- max( c(1, min(rows)) )
    rows[length(rows)] <- min( c(nrow(mod_ht), max(rows)) )
  }
  if (!is.numeric(cols)){
    cols <- 1:ncol(mod_ht)
  } else {
    cols <- sort(unique(cols))
    cols[1] <- max( c(1, min(cols)) )
    cols[length(cols)] <- min( c(ncol(mod_ht), max(cols)) )
  }
  merge_first <- match.arg(merge_first)

  all_same <- function(x) {
    xx <- as.vector(x)
    all(xx == xx[1])
  }

  # Transpose mod_ht if necessary
  if (merge_first == 'row'){
    mod_ht <- t(mod_ht)
    tmp <- rows
    rows <- cols
    cols <- tmp
    rm(tmp)
  }

  # Initiate mapping objects
  mapping_matrix <- matrix(
    0,
    nrow = nrow(mod_ht),
    ncol = ncol(mod_ht)
  )
  coords <- list()
  block <- 1

  for (irow in rows){

    for (icol in cols){

      if ( mapping_matrix[irow, icol] == 0 ){

        # Initiate tokens
        hstop <- vstop <- FALSE
        hshift <- vshift <- 1

        # Expansion of search rectangles
        while ( !hstop | !vstop ) {

          # Search algorithm
          if ( (icol + hshift) %in% cols & !hstop ){
            if ( all_same(mod_ht[irow:(irow+vshift-1), icol:(icol+hshift)]) ){
              hshift <- hshift + 1
            } else {
              hstop <- TRUE
            }
          } else {
            hstop <- TRUE
          }

          if ( (irow + vshift) %in% rows & !vstop ){
            if ( all_same(mod_ht[irow:(irow+vshift), icol:(icol+hshift-1)]) ){
              vshift <- vshift + 1
            } else {
              vstop <- TRUE
            }
          } else {
            vstop <- TRUE
          }
        }

        # Store information
        if ( hshift > 1 | vshift > 1 ){
          coords[[block]] <- list(
            rows = irow:(irow + vshift - 1),
            cols = icol:(icol + hshift - 1)
          )
          mapping_matrix[coords[[block]]$rows, coords[[block]]$cols] <- block
          block <- block + 1
        }

      }
    }
  }

  # Merge cells
  if ( length(coords) > 0 ){
    for ( iblock in 1:length(coords) ){
      if (merge_first == 'column'){
        ht <- merge_cells(ht, coords[[iblock]]$rows, coords[[iblock]]$cols) 
      } else {
        ht <- merge_cells(ht, coords[[iblock]]$cols, coords[[iblock]]$rows)
      }
    }
  }

  return(ht)

}

Target audience All users

Additional context

Sorry @hughjonesd, I don't know how to do pull request, so I posted my suggested code as a feature issue.

Hey, that's cool and high effort. Thank you.

Doing a pull request is going to make it much easier to test, review and (perhaps!) integrate this, and it's a useful skill, so I'll tell you how.

Fork my repository to your own github account. You can do that using the "Fork" button right on this page.
On your command line do "git clone " - you can copy and paste the path from your newly created repository page.
Jump into RStudio (there's an .Rproj file for the project).
Create a new branch and call it something appropriate, with the little purple boxes icon on the "Git" tab.
Make your changes.
Push to github.
Go to github and click "pull request".

Next up, I'll look at this code and give you my thoughts.

So my first question is, what's the use case? When did you need this function? Who else would need it?

The use case for merge_repeated_rows() is that often, a particular column reflects some grouping data that could be represented in a huxtable by a single piece of text. For example, look at the Species column in iris.

I can see something similar being true across a particular row, but it seems rarer. For most data, especially if it's in "tidy" format, different columns represent fundamentally different pieces of data. The exception is data in wide format where you might have e.g. "rank in week 1", "rank in week 2", etc. Though there again, it's not clear you'd often want to merge across those columns – it would be a very specific use case.

This is something more general again. I wonder how often people will have a "square" of data that they want to turn into a single cell.

My second comment is, the function is long and complex. I can see it being hard to maintain... and huxtable has a lot of hard-to-maintain code already, written by yours truly :-) ....

Overall then, I'm not convinced of this. But I do appreciate the effort. A simpler merge_repeated_cols (or columns? Hmm...) might have a better chance of getting in. Another project I'm thinking about is something to take a "grouping" column and turn it into a header row running across the top of the group. (I think the gt package has something similar.)

A few minor points:

use assertthat to check arguments on your way in.
rows and columns should accept arguments in huxtable style, see merge_repeated_rows for one way to do this.
Why would it matter whether you merge columns or rows first?
If rows or cols are outside the range of the huxtable, I'd throw an error rather than trying to guess the user's intention. get_rc_spec will do that for you, again, see merge_repeated_rows.

Hi David,

Thanks for the step-by-step instructions on github pull requests. I will look into it.

Regarding your questions:

use case: I must admit that the creation of this function is self-motivated. My company would like to switch some of our toolchain from php to R. Huxtable appears to be the table creation package that most matches our needs. As part of our activities, we create automated reports of parameter estimates for nonlinear mixed-effects models; this is done using highly "structured" tables with cells that could be merged by row, column or both. While the placement of some of these merged cells in the tables is known (eg, headers), the placement of others in the body of the table is highly variable. Technically, the merging of these areas could be done with merge_cells, but that could be a tedious process which the new function automates. My guess is that the usefulness of this new function would be limited outside of these highly engineered reporting table layouts. In particular, I doubt that it would be useful for showing the content of a dataset, like iris. At the minimum, the new function would provide an easy way to merge cells by column, which probably is a most frequent need.
code complexity: I admit that the detection of the 2D area of repeated values is not easy (took me a few scrapped tries to find an efficient way to do it). But the fact is that 99% of the code is independent of the huxtable machinery. Only the last block where the merge is actually happening depends on huxtable functions. Therefore, except if you decide to switch the structure of huxtables from pseudo data.frames to xml stuff like kableExtra, the algorithm should hold.
minor points:
- I will look into assertthat and huxtable style arguments
- merging by column or row first matters when the table contains something like the following (see A's): X1 X2 X3 A A B A C D
  
  merging by column first would return: X1 X2 X3 A B A C D
  
  merging by row first would return: X1 X2 X3 A A B C D

Cheers

Sebastien

On Sun, Jun 7, 2020 at 1:48 AM David Hugh-Jones notifications@github.com wrote:

So my first question is, what's the use case? When did you need this function? Who else would need it?

The use case for merge_repeated_rows() is that often, a particular column reflects some grouping data that could be represented in a huxtable by a single piece of text. For example, look at the Species column in iris.

I can see something similar being true across a particular row, but it seems rarer. For most data, especially if it's in "tidy" format, different columns represent fundamentally different pieces of data. The exception is data in wide format where you might have e.g. "rank in week 1", "rank in week 2", etc. Though there again, it's not clear you'd often want to merge across those columns – it would be a very specific use case.

This is something more general again. I wonder how often people will have a "square" of data that they want to turn into a single cell.

My second comment is, the function is long and complex. I can see it being hard to maintain... and huxtable has a lot of hard-to-maintain code already, written by yours truly :-) ....

Overall then, I'm not convinced of this. But I do appreciate the effort. A simpler merge_repeated_cols (or columns? Hmm...) might have a better chance of getting in. Another project I'm thinking about is something to take a "grouping" column and turn it into a header row running across the top of the group. (I think the gt package has something similar.)

A few minor points:

use assertthat to check arguments on your way in.

rows and columns should accept arguments in huxtable style, see merge_repeated_rows for one way to do this.

Why would it matter whether you merge columns or rows first?

If rows or cols are outside the range of the huxtable, I'd throw an error rather than trying to guess the user's intention. get_rc_spec will do that for you, again, see merge_repeated_rows.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/hughjonesd/huxtable/issues/156#issuecomment-640161543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETTNP6CK4O3C4DTGTDGT3DRVMS2LANCNFSM4NV7LPCA .

Hi David,

Correct me if I am wrong but I don't think that get_rc_specs errors out when coordinates are outside the dimension of the huxtable:

ht <- hux( a = 1:2, b = 3:4 ) rows <- 4:8 cols <- 3:5

Still returns 4:8

huxtable:::get_rc_spec(ht, rows, 1)

Still returns 3:5

huxtable:::get_rc_spec(ht, cols, 1)

Executed example:

ht <- hux(

a = 1:2,

b = 3:4

) rows <- 4:8 cols <- 3:5
Still returns 4:8

huxtable:::get_rc_spec(ht, rows, 1) [1] 4 5 6 7 8

Still returns 3:5

huxtable:::get_rc_spec(ht, cols, 1) [1] 3 4 5 sessionInfo() R version 3.6.3 (2020-02-29) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Linux Mint 19.3

Matrix products: default BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1 LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] jsonlite_1.6.1 huxtable_4.7.1

loaded via a namespace (and not attached): [1] zip_2.0.4 Rcpp_1.0.4.6 pillar_1.4.3 compiler_3.6.3 base64enc_0.1-3 tools_3.6.3 digest_0.6.25 uuid_0.1-4 evaluate_0.14 lifecycle_0.2.0 [11] tibble_3.0.1 lattice_0.20-41 pkgconfig_2.0.3 rlang_0.4.6 rstudioapi_0.11 xfun_0.13 officer_0.3.11 dplyr_0.8.5 stringr_1.4.0 knitr_1.28 [21] xml2_1.3.2 generics_0.0.2 gdtools_0.2.2 vctrs_0.2.4 systemfonts_0.2.2 grid_3.6.3 tidyselect_1.0.0 glue_1.4.0 data.table_1.12.8 R6_2.4.1 [31] rmarkdown_2.1 purrr_0.3.4 magrittr_1.5 ellipsis_0.3.0 htmltools_0.4.0 assertthat_0.2.1 flextable_0.5.10 stringi_1.4.6 crayon_1.3.4 zoo_1.8-8

On Sun, Jun 7, 2020 at 1:48 AM David Hugh-Jones notifications@github.com wrote:

So my first question is, what's the use case? When did you need this function? Who else would need it?

The use case for merge_repeated_rows() is that often, a particular column reflects some grouping data that could be represented in a huxtable by a single piece of text. For example, look at the Species column in iris.

I can see something similar being true across a particular row, but it seems rarer. For most data, especially if it's in "tidy" format, different columns represent fundamentally different pieces of data. The exception is data in wide format where you might have e.g. "rank in week 1", "rank in week 2", etc. Though there again, it's not clear you'd often want to merge across those columns – it would be a very specific use case.

This is something more general again. I wonder how often people will have a "square" of data that they want to turn into a single cell.

My second comment is, the function is long and complex. I can see it being hard to maintain... and huxtable has a lot of hard-to-maintain code already, written by yours truly :-) ....

Overall then, I'm not convinced of this. But I do appreciate the effort. A simpler merge_repeated_cols (or columns? Hmm...) might have a better chance of getting in. Another project I'm thinking about is something to take a "grouping" column and turn it into a header row running across the top of the group. (I think the gt package has something similar.)

A few minor points:

use assertthat to check arguments on your way in.

rows and columns should accept arguments in huxtable style, see merge_repeated_rows for one way to do this.

Why would it matter whether you merge columns or rows first?

If rows or cols are outside the range of the huxtable, I'd throw an error rather than trying to guess the user's intention. get_rc_spec will do that for you, again, see merge_repeated_rows.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/hughjonesd/huxtable/issues/156#issuecomment-640161543, or unsubscribe https://github.com/notifications/unsubscribe-auth/AETTNP6CK4O3C4DTGTDGT3DRVMS2LANCNFSM4NV7LPCA .

I understand. I think this probably isn't useful for most guys. Having seen your example, I wonder whether running merge_repeated_rows() followed by merge_repeated_columns() (written along the same lines) would do what you need. You'd want to avoid huxtable complaining about overwriting existing multicolumn cells, which might be tricky. Incidentally, if you want to play this, t(merge_repeated_rows(t(hux))) is a quick hack for merge_repeated_columns().

Thank you for that last point. I wonder if I should fix it. Presumably errors are gonna turn up somewhere....

hughjonesd / huxtable

Merge all repeated cells (not just by row) #156

Still returns 4:8

Still returns 3:5

Still returns 4:8

Still returns 3:5