insightsengineering / formatters

A framework for creating listings of raw data that include specialized formatting, headers, footers, referential footnotes, and pagination.
https://insightsengineering.github.io/formatters/

Bug in export function when key_cols and disp_cols are identical #298

Open BFalquet opened 2 months ago

BFalquet commented 2 months ago

Reproducible example:

library(chevron)
library(rlistings)  # provides as_listing()
lst <- as_listing(
    df = syn_data[["adcm"]],
    key_cols = c("ATC2", "CMDECOD", "CMTRT"),
    disp_cols = c("ATC2", "CMDECOD", "CMTRT"),
    unique_rows = TRUE
)

lst 
# ATC Level 2 Text   Standardized Medication Name   Reported Name of Drug, Med, or Therapy
# ————————————————————————————————————————————————————————————————————————————————————————
#    ATCCLAS2 A             medname A_1/3                           A_1/3                 
#                           medname A_2/3                           A_2/3                 
#                           medname A_3/3                           A_3/3                 
#  ATCCLAS2 A p2            medname A_3/3                           A_3/3                 
#    ATCCLAS2 B             medname B_1/4                           B_1/4                 
#                           medname B_2/4                           B_2/4                 
#                           medname B_3/4                           B_3/4                 
#                           medname B_4/4                           B_4/4                 
#  ATCCLAS2 B p2            medname B_1/4                           B_1/4                 
#                           medname B_2/4                           B_2/4                 
#  ATCCLAS2 B p3            medname B_1/4                           B_1/4                 
#                           medname B_2/4                           B_2/4                 
#    ATCCLAS2 C             medname C_1/2                           C_1/2                 
#                           medname C_2/2                           C_2/2                 
#  ATCCLAS2 C p2            medname C_1/2                           C_1/2                 
#                           medname C_2/2                           C_2/2                 
#  ATCCLAS2 C p3            medname C_2/2                           C_2/2 

formatters::export_as_txt(lst)

returns

Error in rep(0, length(r_colwidths) - nrepcols - 1) : 
  invalid 'times' argument
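
The error message suggests that the count passed to `rep()` goes negative when every displayed column is a key column. A minimal base-R sketch of that arithmetic, assuming `r_colwidths` holds one width per displayed column and `nrepcols` counts the key columns (both are inferred from the message, not checked against the formatters source, and the widths are made up):

```r
# Assumed values for a listing with three displayed columns, all of them keys.
r_colwidths <- c(16, 28, 38)  # hypothetical column widths
nrepcols <- 3                 # assumed: number of key columns

times <- length(r_colwidths) - nrepcols - 1
times
#> [1] -1

# rep() rejects a negative count, which matches the reported error.
res <- tryCatch(rep(0, times), error = function(e) conditionMessage(e))
res
#> [1] "invalid 'times' argument"
```

With at least one non-key column, the same expression is zero or positive and `rep()` succeeds.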

Consider either emitting a warning when the listing is created or modifying the export function to handle this case.

cheers.
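
A warning at creation time could look roughly like the sketch below. `check_key_cols()` is a hypothetical helper, not part of rlistings or formatters, and the wording of the message is only a suggestion:

```r
# Hypothetical helper, not part of rlistings or formatters.
# Warns when every displayed column is also a key column and
# invisibly returns TRUE when the combination looks safe.
check_key_cols <- function(key_cols, disp_cols) {
  all_keys <- length(disp_cols) > 0 && all(disp_cols %in% key_cols)
  if (all_keys) {
    warning(
      "All displayed columns are key columns; ",
      "consider leaving the right-most column out of `key_cols`.",
      call. = FALSE
    )
  }
  invisible(!all_keys)
}

cols <- c("ATC2", "CMDECOD", "CMTRT")
ok <- check_key_cols(key_cols = cols[1:2], disp_cols = cols)  # silent
ok
#> [1] TRUE
```

Calling it with `key_cols = cols` and `disp_cols = cols` would raise the warning before any export is attempted.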

gmbecker commented 2 months ago

hmm, if all columns are key columns, then a duplicate row would print as a completely blank line. I'm not sure that's desirable behavior.

I'm tempted to say that a listing must have at least one column that isn't a key column, but I'd want input from people who make listings in practice before finalizing that.

edelarua commented 2 months ago

@BFalquet is it possible to change the key columns in these cases? As Gabe mentioned above, the right-most column should never be a key column, since there are no columns to the right of it whose records could be grouped by its values. In this example specifically, when unique_rows = TRUE, the contents of the listing should be the same whether the last column is a key column or a non-key column.

For example, see that the following two listings are exactly the same:

library(chevron)
library(rlistings)

as_listing(
  df = syn_data[["adcm"]] %>% head(10),
  key_cols = c("ATC2", "CMDECOD", "CMTRT"),
  disp_cols = c("ATC2", "CMDECOD", "CMTRT"),
  unique_rows = TRUE
)
#> ATC Level 2 Text   Standardized Medication Name   Reported Name of Drug, Med, or Therapy
#> ————————————————————————————————————————————————————————————————————————————————————————
#>    ATCCLAS2 A             medname A_1/3                           A_1/3                 
#>                           medname A_3/3                           A_3/3                 
#>    ATCCLAS2 B             medname B_1/4                           B_1/4                 
#>                           medname B_2/4                           B_2/4                 
#>                           medname B_3/4                           B_3/4                 
#>    ATCCLAS2 C             medname C_1/2                           C_1/2

as_listing(
  df = syn_data[["adcm"]] %>% head(10),
  key_cols = c("ATC2", "CMDECOD"),
  disp_cols = c("ATC2", "CMDECOD", "CMTRT"),
  unique_rows = TRUE
)
#> ATC Level 2 Text   Standardized Medication Name   Reported Name of Drug, Med, or Therapy
#> ————————————————————————————————————————————————————————————————————————————————————————
#>    ATCCLAS2 A             medname A_1/3                           A_1/3                 
#>                           medname A_3/3                           A_3/3                 
#>    ATCCLAS2 B             medname B_1/4                           B_1/4                 
#>                           medname B_2/4                           B_2/4                 
#>                           medname B_3/4                           B_3/4                 
#>    ATCCLAS2 C             medname C_1/2                           C_1/2

Created on 2024-06-05 with reprex v2.1.0

But in the following listing, where every column is a key column, identical rows are rendered as empty lines (and are not removed, since unique_rows = TRUE was not set):

library(chevron)
library(rlistings)

as_listing(
  df = syn_data[["adcm"]] %>% head(10),
  key_cols = c("ATC2", "CMDECOD", "CMTRT"),
  disp_cols = c("ATC2", "CMDECOD", "CMTRT")
)
#> ATC Level 2 Text   Standardized Medication Name   Reported Name of Drug, Med, or Therapy
#> ————————————————————————————————————————————————————————————————————————————————————————
#>    ATCCLAS2 A             medname A_1/3                           A_1/3                 
#>                           medname A_3/3                           A_3/3                 
#>    ATCCLAS2 B             medname B_1/4                           B_1/4                 
#>                           medname B_2/4                           B_2/4                 
#>                           medname B_3/4                           B_3/4                 
#>                                                                                         
#>                                                                                         
#>    ATCCLAS2 C             medname C_1/2                           C_1/2                 
#>                                                                                         
#> 

Created on 2024-06-05 with reprex v2.1.0
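
To see why fully duplicated rows come out blank when every column is a key, here is a rough base-R sketch of key-column blanking. It assumes a key value is blanked when it and all key values to its left match the previous row. `blank_repeated_keys()` is a hypothetical helper written for illustration, not the rlistings implementation, and the data frame is made up:

```r
# Hypothetical illustration of key-column blanking (assumes no missing values).
blank_repeated_keys <- function(df, key_cols) {
  out <- df
  out[] <- lapply(df, as.character)
  if (nrow(df) < 2) return(out)
  # same_so_far[i] stays TRUE while row i matches row i - 1 on all keys seen so far.
  same_so_far <- c(FALSE, rep(TRUE, nrow(df) - 1))
  for (col in key_cols) {
    vals <- as.character(df[[col]])
    same_so_far <- same_so_far & c(FALSE, vals[-1] == vals[-length(vals)])
    out[[col]][same_so_far] <- ""
  }
  out
}

df <- data.frame(
  ATC2 = c("A", "A", "A", "B"),
  CMDECOD = c("m1", "m1", "m2", "m1"),
  CMTRT = c("t1", "t1", "t2", "t1")
)

# All columns are keys: the duplicated second row becomes entirely blank.
all_keys <- blank_repeated_keys(df, key_cols = c("ATC2", "CMDECOD", "CMTRT"))

# Last column is not a key: the second row still shows its CMTRT value.
two_keys <- blank_repeated_keys(df, key_cols = c("ATC2", "CMDECOD"))
```

In this sketch, keeping one non-key column is what guarantees that every row prints at least one value.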

BFalquet commented 2 months ago

Hi @edelarua, I converged on the exact same solution, which works fine, but requirements that are not enforced by assertions will eventually cause problems in production. It is not urgent and we can wait, but I think that ultimately we have to either allow this situation or emit a warning.
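
Until the behavior is decided upstream, the workaround discussed above can be applied on the caller's side. A minimal sketch, assuming key columns are displayed left-most so that the last entry of `key_cols` is the right-most column; `drop_last_key_if_all_keys()` is a hypothetical helper and not part of any package:

```r
# Hypothetical helper: if every displayed column is a key column,
# demote the last key column to a regular column and say so.
drop_last_key_if_all_keys <- function(key_cols, disp_cols) {
  if (length(key_cols) > 0 && all(disp_cols %in% key_cols)) {
    warning(
      "All columns are key columns; dropping '",
      key_cols[length(key_cols)], "' from `key_cols`.",
      call. = FALSE
    )
    key_cols <- key_cols[-length(key_cols)]
  }
  key_cols
}

cols <- c("ATC2", "CMDECOD", "CMTRT")
adjusted <- suppressWarnings(drop_last_key_if_all_keys(cols, cols))
adjusted
#> [1] "ATC2"    "CMDECOD"
```

The adjusted vector can then be passed as `key_cols` to `as_listing()`, leaving `disp_cols` unchanged.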