insightsengineering / teal.data

Data model for teal applications
https://insightsengineering.github.io/teal.data/
Other
8 stars 7 forks source link

Handle both named and unnamed inputs in `col_labels` #298

Closed vedhav closed 7 months ago

vedhav commented 7 months ago

Fixes the bug reported here

Why is this behavior?

library(scda)
library(scda.2022)
library(formatters)
ADSL <- synthetic_cdisc_data("latest")$adsl
print(teal.data::col_labels(ADSL)) # This produces a valid column labels
var_labels(ADSL) <- teal.data::col_labels(ADSL)
print(teal.data::col_labels(ADSL)) # After the var_labels modifies the labels it's invalid now

App code to reproduce the error:

library(teal.modules.general)
data <- within(teal_data(), {
  library(scda)
  library(scda.2022)
  library(formatters)
  ADSL <- synthetic_cdisc_data("latest")$adsl
  adsl_labels <- teal.data::col_labels(ADSL)
  var_labels(ADSL) <- adsl_labels
})
datanames(data) <- "ADSL"

app <- teal::init(
  data = data,
  modules = tm_variable_browser(label = "Variable Browser")
)

shinyApp(app$ui, app$server)
github-actions[bot] commented 7 months ago

badge

Code Coverage Summary

Filename                         Stmts    Miss  Cover    Missing
-----------------------------  -------  ------  -------  --------------------
R/cdisc_data.R                       1       0  100.00%
R/deprecated.R                      57      57  0.00%    19-344
R/dummy_function.R                   2       2  0.00%    14-15
R/formatters_var_labels.R           38      11  71.05%   62, 71-82
R/join_key.R                        38       0  100.00%
R/join_keys-c.R                     12       0  100.00%
R/join_keys-extract.R              128       0  100.00%
R/join_keys-names.R                 15       0  100.00%
R/join_keys-parents.R               30       0  100.00%
R/join_keys-print.R                 45       0  100.00%
R/join_keys-utils.R                 73       3  95.89%   35-38
R/join_keys.R                       21       0  100.00%
R/teal_data-class.R                 25       1  96.00%   69
R/teal_data-datanames.R             10       0  100.00%
R/teal_data-get_code.R              14       0  100.00%
R/teal_data-show.R                   4       4  0.00%    14-19
R/teal_data.R                       30      16  46.67%   34, 37-43, 53-59, 62
R/testhat-helpers.R                 26       0  100.00%
R/topological_sort.R                32       0  100.00%
R/utils-get_code_dependency.R      184       1  99.46%   275
R/verify.R                          42      11  73.81%   63, 93-97, 100-104
TOTAL                              827     106  87.18%

Diff against main

Filename                     Stmts    Miss  Cover
-------------------------  -------  ------  -------
R/formatters_var_labels.R       +2       0  +1.61%
TOTAL                           +2       0  +0.03%

Results for commit: 3cf849427e743ccd6b717a5ea5efbb9367e8f6c4

Minimum allowed coverage is 80%

:recycle: This comment has been updated with latest results

github-actions[bot] commented 7 months ago

Unit Tests Summary

  1 files   14 suites   1s :stopwatch: 177 tests 175 :white_check_mark: 2 :zzz: 0 :x: 247 runs  245 :white_check_mark: 2 :zzz: 0 :x:

Results for commit 3cf84942.

chlebowa commented 7 months ago

Hold on.

Why is formatters::var_labels<- used in the problematic code and not teal.data::col_labels<-?

> library(scda)
> library(teal.data)
> ADSL <- synthetic_cdisc_data("latest")$adsl
> pre <- col_labels(ADSL)
> col_labels(ADSL) <- col_labels(ADSL)
> post <- col_labels(ADSL)
> identical(pre, post)
[1] TRUE
chlebowa commented 7 months ago

No error here:

library(teal.modules.general)
data <- within(teal_data(), {
  library(scda)
  library(scda.2022)
  library(formatters)
  ADSL <- synthetic_cdisc_data("latest")$adsl
  adsl_labels <- teal.data::col_labels(ADSL)
  teal.data::col_labels(ADSL) <- adsl_labels
})
datanames(data) <- "ADSL"

app <- teal::init(
  data = data,
  modules = tm_variable_browser(label = "Variable Browser")
)

runApp(app, launch.browser = TRUE)

image

vedhav commented 7 months ago

Can you show the sessonInfo? I believe it's outdated. Here is mine:

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] C/UTF-8/C/C/C/C

time zone: Asia/Kolkata
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] teal.data_0.5.0.9000             testthat_3.2.1                   formatters_0.5.5.9009
 [4] scda.2022_0.1.5                  scda_0.1.6                       teal.modules.general_0.2.16.9020
 [7] teal.transform_0.4.0.9017        magrittr_2.0.3                   teal_0.15.0.9000
[10] teal.slice_0.5.0.9001            teal.code_0.5.0.9003             shinyTree_0.3.1
[13] shiny_1.8.0                      ggmosaic_0.3.3                   ggplot2_3.4.4

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0         viridisLite_0.4.2        farver_2.1.1             dplyr_1.1.4              fastmap_1.1.1
 [6] lazyeval_0.2.2           promises_1.2.1           shinyjs_2.1.0            digest_0.6.34            mime_0.12
[11] lifecycle_1.0.4          ellipsis_0.3.2           compiler_4.3.2           rlang_1.1.3              sass_0.4.8
[16] tools_4.3.2              utf8_1.2.4               yaml_2.3.8               data.table_1.15.0        knitr_1.45
[21] labeling_0.4.3           htmlwidgets_1.6.4        pkgbuild_1.4.3           pkgload_1.3.4            miniUI_0.1.1.1
[26] withr_3.0.0              purrr_1.0.2              shinyWidgets_0.8.1       desc_1.4.3               grid_4.3.2
[31] fansi_1.0.6              teal.logger_0.1.3.9011   urlchecker_1.0.1         profvis_0.3.8            xtable_1.8-4
[36] colorspace_2.1-0         scales_1.3.0             cli_3.6.2                rmarkdown_2.25           ragg_1.2.7
[41] generics_0.1.3           remotes_2.4.2.1          rstudioapi_0.15.0        httr_1.4.7               sessioninfo_1.2.2
[46] cachem_1.0.8             stringr_1.5.1            formatR_1.14             vctrs_0.6.5              devtools_2.4.5
[51] jsonlite_1.8.8           ggrepel_0.9.5            systemfonts_1.0.5        crosstalk_1.2.1          teal.widgets_0.4.2.9003
[56] plotly_4.10.4            fontawesome_0.5.2        tidyr_1.3.1              jquerylib_0.1.4          glue_1.7.0
[61] DT_0.31                  stringi_1.8.3            gtable_0.3.4             later_1.3.2              shinycssloaders_1.0.0
[66] munsell_0.5.0            tibble_3.2.1             logger_0.2.2             pillar_1.9.0             htmltools_0.5.7
[71] brio_1.1.4               sparkline_2.0            R6_2.5.1                 textshaping_0.3.7        rprojroot_2.0.4
[76] evaluate_0.23            backports_1.4.1          memoise_2.0.1            teal.reporter_0.2.1.9014 httpuv_1.6.14
[81] bslib_0.6.1              Rcpp_1.0.12              checkmate_2.3.1          xfun_0.42                forcats_1.0.0
[86] fs_1.6.3                 usethis_2.2.2            pkgconfig_2.0.3
vedhav commented 7 months ago

Why is formatters::var_labels<- used in the problematic code and not teal.data::col_labels<-?

I went through this rabbit hole and concluded that most examples that we have do not provide labeled column so things work fine. But, when we start adding labels that's when the problem occurs. It is the combination of using teal.data::col_labels and formatters::var_labels. I felt more convinced to handle this in teal.data because giving people the ability to pass named and unnamed column labels seemed right to me.

chlebowa commented 7 months ago

Why is formatters::var_labels<- used in the problematic code and not teal.data::col_labels<-?

I went through this rabbit hole and concluded that most examples that we have do not provide labeled column so things work fine. But, when we start adding labels that's when the problem occurs. It is the combination of using teal.data::col_labels and formatters::var_labels. I felt more convinced to handle this in teal.data because giving people the ability to pass named and unnamed column labels seemed right to me.

The var_labels family was moved to teal.data to remove dependency on formatters in order to avoid CI/CD issues. The functions were rewritten recently. I don't recall when the names were changed (var_labels -> col_labels).

I see no reason to use the formatters functions explicitly and it is in val_labels<- where the issue arises. Labels should not be a named vector. Using col_labels consistently eliminates the problem.

chlebowa commented 7 months ago

Here's my sessionInfo, I haven't updated anything in two weeks or so.

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0 
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Europe/Prague
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] formatters_0.5.5.9006            scda.2022_0.1.5.9004             scda_0.1.6.9014                  teal.modules.general_0.2.16.9018
 [5] teal.transform_0.4.0.9015        magrittr_2.0.3                   teal_0.14.0.9042                 teal.slice_0.4.0.9039           
 [9] teal.data_0.4.0.9001             teal.code_0.5.0                  shinyTree_0.3.1                  shiny_1.8.0                     
[13] ggmosaic_0.3.3                   ggplot2_3.4.4                   

loaded via a namespace (and not attached):
 [1] gtable_0.3.4             sparkline_2.0            xfun_0.41                bslib_0.6.1              shinyjs_2.1.0            htmlwidgets_1.6.4       
 [7] teal.widgets_0.4.2.9001  ggrepel_0.9.4            crosstalk_1.2.1          vctrs_0.6.5              tools_4.3.2              generics_0.1.3          
[13] tibble_3.2.1             fansi_1.0.6              pkgconfig_2.0.3          data.table_1.14.10       checkmate_2.3.1          lifecycle_1.0.4         
[19] farver_2.1.1             compiler_4.3.2           stringr_1.5.1            textshaping_0.3.7        munsell_0.5.0            fontawesome_0.5.2       
[25] httpuv_1.6.13            shinyWidgets_0.8.0       htmltools_0.5.7          sass_0.4.8               yaml_2.3.8               lazyeval_0.2.2          
[31] plotly_4.10.3            later_1.3.2              pillar_1.9.0             jquerylib_0.1.4          tidyr_1.3.0              ellipsis_0.3.2          
[37] DT_0.31                  rsconnect_1.2.0          cachem_1.0.8             mime_0.12                tidyselect_1.2.0         digest_0.6.33           
[43] stringi_1.8.3            dplyr_1.1.4              purrr_1.0.2              labeling_0.4.3           forcats_1.0.0            fastmap_1.1.1           
[49] grid_4.3.2               colorspace_2.1-0         cli_3.6.2                logger_0.2.2             utf8_1.2.4               teal.reporter_0.2.1.9016
[55] withr_2.5.2              scales_1.3.0             promises_1.2.1           backports_1.4.1          rmarkdown_2.25           httr_1.4.7              
[61] ragg_1.2.7               evaluate_0.23            memoise_2.0.1            knitr_1.45               shinycssloaders_1.0.0    viridisLite_0.4.2       
[67] rlang_1.1.2              Rcpp_1.0.11              xtable_1.8-4             glue_1.6.2               formatR_1.14             rstudioapi_0.15.0       
[73] jsonlite_1.8.8           teal.logger_0.1.3.9010   R6_2.5.1                 systemfonts_1.0.5    
vedhav commented 7 months ago

The functions were rewritten recently.

Can you point me to the issue/PR related to this? I don't see any changes to the labels.R which is where I would expect any breaking change to be at.

vedhav commented 7 months ago

I see no reason to use the formatters functions explicitly and it is in val_labels<- where the issue arises. Labels should not be a named vector. Using col_labels consistently eliminates the problem.

Agree, the problem arises when we label using formatters::val_labels<-. To me, it still seems like the inability of teal.data::col_labels to work for labeled and unlabeled column names is a bug. Because the return from formatters::var_labels() is a named vector. So, I would argue for the fix in teal.data.

vedhav commented 7 months ago

@chlebowa Thanks for clarifying that this was caused due to improper usage of the formatters. I will change this in the teal.gallery.