insightsengineering / teal.modules.general

General Purpose Teal Modules
https://insightsengineering.github.io/teal.modules.general/

View Variables: Left hand vs. bar plot table "Missing" counts don't match #697

Open npaszty opened 8 months ago

npaszty commented 8 months ago

What happened?

the "Missing" column count in the left-hand table does not match the "Missing" record count in the table underneath the bar chart.

image

sessionInfo()

running on https://connect.apollo.roche.com/connect/#/apps/7bb9648c-75ab-4075-a021-d587f175a42e/access

Relevant log output

2024/02/26 10:39:29 AM: [rsc-session] Content GUID: 7bb9648c-75ab-4075-a021-d587f175a42e
2024/02/26 10:39:29 AM: [rsc-session] Content ID: 6254
2024/02/26 10:39:29 AM: [rsc-session] Bundle ID: 103266
2024/02/26 10:39:29 AM: [rsc-session] Job Key: P5B9obyFNn4t2ks2
2024/02/26 10:39:29 AM: Running on host: ppdpaa0838-euc1.aws.science.roche.com
2024/02/26 10:39:29 AM: Linux distribution: Ubuntu 22.04.3 LTS (jammy)
2024/02/26 10:39:29 AM: Running as user: uid=999(rstudio-connect) gid=998(rstudio-connect) groups=998(rstudio-connect)
2024/02/26 10:39:29 AM: Connect version: 2023.09.0
2024/02/26 10:39:29 AM: LANG: C.UTF-8
2024/02/26 10:39:29 AM: Working directory: /opt/rstudio-connect/mnt/app
2024/02/26 10:39:29 AM: Using R 4.2.2
2024/02/26 10:39:29 AM: R.home(): /opt/R/4.2.2/lib/R
2024/02/26 10:39:29 AM: Content will use associated Packrat library
2024/02/26 10:39:29 AM: Adding Packrat library to R_LIBS and .libPaths: /opt/rstudio-connect/mnt/app/packrat/lib/x86_64-pc-linux-gnu/4.2.2
2024/02/26 10:39:29 AM: R_LIBS: /opt/rstudio-connect/mnt/app/packrat/lib/x86_64-pc-linux-gnu/4.2.2
2024/02/26 10:39:29 AM: .libPaths(): /opt/rstudio-connect/mnt/app/packrat/lib/x86_64-pc-linux-gnu/4.2.2, /opt/R/4.2.2/lib/R/library
2024/02/26 10:39:29 AM: shiny version: 1.8.0
2024/02/26 10:39:29 AM: httpuv version: 1.6.13
2024/02/26 10:39:29 AM: rmarkdown version: 2.25
2024/02/26 10:39:29 AM: knitr version: 1.45
2024/02/26 10:39:29 AM: jsonlite version: 1.8.8
2024/02/26 10:39:29 AM: RJSONIO version: (none)
2024/02/26 10:39:29 AM: htmltools version: 0.5.7
2024/02/26 10:39:29 AM: reticulate version: (none)
2024/02/26 10:39:29 AM: Using pandoc: /opt/rstudio-connect/ext/pandoc/2.16
2024/02/26 10:39:30 AM: Using Shiny bookmarking base directory /opt/rstudio-connect/mnt/bookmarks
2024/02/26 10:39:30 AM:
2024/02/26 10:39:30 AM: Starting R with process ID: '3512042'
2024/02/26 10:39:30 AM: Shiny application starting ...
2024/02/26 10:39:34 AM: There are no scda.XXXX libraries installed, like scda.2022.
2024/02/26 10:39:34 AM: Please install an scda database to take full advantage of the scda package.
2024/02/26 10:39:34 AM: Visit https://insightsengineering.github.io/scda.2022/ for details on scda.2022 and how it can be installed.
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM: Attaching package: ‘shinydashboard’
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM: The following object is masked from ‘package:graphics’:
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM:     box
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM: Loading required package: goshawk
2024/02/26 10:39:34 AM: Loading required package: dplyr
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM: Attaching package: ‘dplyr’
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM: The following objects are masked from ‘package:stats’:
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM:     filter, lag
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM: The following objects are masked from ‘package:base’:
2024/02/26 10:39:34 AM:
2024/02/26 10:39:34 AM:     intersect, setdiff, setequal, union
2024/02/26 10:39:34 AM:
2024/02/26 10:39:35 AM: Loading required package: teal
2024/02/26 10:39:35 AM: Loading required package: teal.data
2024/02/26 10:39:35 AM: Loading required package: teal.slice
2024/02/26 10:39:35 AM: Loading required package: teal.transform
2024/02/26 10:39:35 AM:
2024/02/26 10:39:35 AM: You are using teal version 0.14.0
2024/02/26 10:39:35 AM:
2024/02/26 10:39:35 AM: Attaching package: ‘teal’
2024/02/26 10:39:35 AM:
2024/02/26 10:39:35 AM: The following objects are masked from ‘package:teal.slice’:
2024/02/26 10:39:35 AM:
2024/02/26 10:39:35 AM:     as.teal_slices, teal_slices
2024/02/26 10:39:35 AM:
2024/02/26 10:39:35 AM: Loading required package: tern
2024/02/26 10:39:35 AM: Loading required package: rtables
2024/02/26 10:39:35 AM: Loading required package: formatters
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM: Attaching package: ‘rtables’
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM: The following object is masked from ‘package:utils’:
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM:     str
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM: Registered S3 method overwritten by 'tern':
2024/02/26 10:39:36 AM:   method   from 
2024/02/26 10:39:36 AM:   tidy.glm broom
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM: Attaching package: ‘tern’
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM: The following object is masked from ‘package:goshawk’:
2024/02/26 10:39:36 AM:
2024/02/26 10:39:36 AM:     g_lineplot
2024/02/26 10:39:36 AM:
2024/02/26 10:39:37 AM: Loading required package: ggmosaic
2024/02/26 10:39:37 AM: Loading required package: ggplot2
2024/02/26 10:39:37 AM: Loading required package: shinyTree
2024/02/26 10:40:00 AM: Warning: There was 1 warning in `mutate()`.
2024/02/26 10:40:00 AM: ℹ In argument: `AVISITCDN = case_when(...)`.
2024/02/26 10:40:00 AM: Caused by warning:
2024/02/26 10:40:00 AM: ! NAs introduced by coercion
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.5900 pid:3512042 token:[] teal.modules.general Initializing tm_variable_browser
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.6283 pid:3512042 token:[] teal.modules.general Initializing tm_data_table
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.6357 pid:3512042 token:[] teal.modules.clinical Initializing tm_t_summary
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.6488 pid:3512042 token:[] teal.modules.clinical Initializing tm_t_events_summary
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.6683 pid:3512042 token:[] teal.modules.clinical Initializing tm_t_events
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.6794 pid:3512042 token:[] teal.modules.clinical Initializing tm_t_events_by_grade
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.6928 pid:3512042 token:[] teal.modules.clinical Initializing tm_t_smq
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.7090 pid:3512042 token:[] teal.goshawk Initializing tm_g_gh_boxplot
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.7155 pid:3512042 token:[] teal.goshawk Initializing tm_g_gh_correlationplot
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.7218 pid:3512042 token:[] teal.goshawk Initializing tm_g_gh_density_distribution_plot
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.7272 pid:3512042 token:[] teal.goshawk Initializing tm_g_gh_lineplot
2024/02/26 10:40:10 AM: [INFO] 2024-02-26 10:40:10.7336 pid:3512042 token:[] teal.goshawk Initializing tm_g_gh_spaghettiplot
2024/02/26 10:40:11 AM:
2024/02/26 10:40:11 AM: Listening on http://127.0.0.1:42987
2024/02/26 10:40:40 AM: Warning in length(token_data) > 0 && !is.na(token_data) :
2024/02/26 10:40:40 AM:   'length(x) = 16 > 1' in coercion to 'logical(1)'
2024/02/26 10:40:45 AM: Warning in length(token_data) > 0 && !is.na(token_data) :
2024/02/26 10:40:45 AM:   'length(x) = 16 > 1' in coercion to 'logical(1)'
2024/02/26 10:40:45 AM: module "Report previewer" server function takes no data so "datanames" will be ignored
2024/02/26 10:40:47 AM: ✔ Writing to "app-usage".
2024/02/26 10:40:47 AM: ✔ Appending 1 row to 'Data'.
2024/02/26 10:44:50 AM: [rsc-session] Received signal: interrupt
2024/02/26 10:44:50 AM: [rsc-session] Terminating subprocess with interrupt ...
2024/02/26 10:44:50 AM:
2024/02/26 10:44:50 AM:
2024/02/26 10:44:50 AM: Shiny application exiting ...
2024/02/26 10:44:50 AM: Execution halted
2024/02/26 10:44:50 AM: [rsc-session] Terminated subprocess with signal: interrupt


vedhav commented 8 months ago

I believe the level contains empty strings so it feels like it's missing. The table on the right side only displays the count stats for non-missing values. Also, the graph would indicate the missing values in a different color.

I have two examples to showcase what is going on:

1. Example with missing values in the column SEX

[Screenshot 2024-02-28 at 3 13 22 PM]

library(teal.modules.general)

data <- teal_data()
data <- within(data, {
  ADSL <- rADSL
  # Set the first 10 SEX values to NA (true missing values).
  ADSL$SEX[1:10] <- NA
  ADTTE <- rADTTE
})
join_keys(data) <- default_cdisc_join_keys[datanames(data)]

app <- init(
  data = data,
  modules = tm_variable_browser(
    label = "Variable browser"
  )
)

shinyApp(app$ui, app$server)

2. Example with an empty factor level in the column SEX (Presumably this is happening here)

[Screenshot 2024-02-28 at 3 17 48 PM]

library(teal.modules.general)

data <- teal_data()
data <- within(data, {
  ADSL <- rADSL
  # Declare "" as a valid factor level, then assign it to the first 10 records.
  ADSL$SEX <- factor(ADSL$SEX, levels = c("F", "M", ""))
  ADSL$SEX[1:10] <- ""
  attr(ADSL$SEX, "label") <- "Sex"
  ADTTE <- rADTTE
})
join_keys(data) <- default_cdisc_join_keys[datanames(data)]

app <- init(
  data = data,
  modules = tm_variable_browser(
    label = "Variable browser"
  )
)

shinyApp(app$ui, app$server)

chlebowa commented 8 months ago

> I believe the level contains empty strings so it feels like it's missing.

Empty strings are not missing values. NAs are missing values.
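
The distinction can be checked directly in base R. The following is a minimal sketch; the vector `sex` is made up for illustration and is not taken from the app's data:

```r
# Hypothetical vector: one NA and one empty string among real values.
sex <- c("F", "M", NA, "", "F")

# is.na() flags only the NA, not the empty string.
is.na(sex)

# Comparing anything with NA via == or != yields NA, so it cannot serve as a test.
cmp <- "" != NA

# Count each kind separately.
n_na <- sum(is.na(sex))
n_empty <- sum(!is.na(sex) & sex == "")
```

Here `n_na` and `n_empty` are computed independently, which is why a display based only on `is.na()` can report zero missing records while empty strings are still present.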

vedhav commented 8 months ago

> > I believe the level contains empty strings so it feels like it's missing.
>
> Empty strings are not missing values. NAs are missing values.

Yes, that is my point as seen in the examples. Example 1 shows the missing values and example 2 shows how the empty string can cause this misunderstanding.

vedhav commented 8 months ago

Also, in the case of the variable browser module, empty strings are considered as missing values and they are converted into NA. They are only retained when they are a valid level for a factor variable.
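
The behavior described above could be sketched roughly as follows. This is an illustration only: `empty_to_na()` is a hypothetical helper, not the module's actual code, and the vectors are made up.

```r
# Hypothetical helper, not the module's actual implementation:
# turn empty strings into NA for character vectors, but leave factors
# untouched so that "" survives when it is a declared level.
empty_to_na <- function(x) {
  if (is.character(x)) {
    x[!is.na(x) & x == ""] <- NA_character_
  }
  x
}

race <- c("ASIAN", "", "WHITE")                           # character column
sex <- factor(c("F", "", "M"), levels = c("F", "M", ""))  # "" is a valid level

race_clean <- empty_to_na(race)
sex_clean <- empty_to_na(sex)

n_missing_race <- sum(is.na(race_clean))  # empty string became NA
n_missing_sex <- sum(is.na(sex_clean))    # empty level retained, not NA
```

Under these assumptions the character column reports a missing value while the factor column does not, even though both contain an empty string.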

chlebowa commented 8 months ago

> Empty strings are considered as missing values

That would not be my guess so it sounds like a deliberate choice. What is your opinion as a user @npaszty?

npaszty commented 8 months ago

@chlebowa

from our investigation it looks like !is.na() is used in this module to evaluate the counts (%)? in R an empty string is not NA (is.na("") returns FALSE), but in practical terms they are the same: missing, meaning there is no value available.

we use the df_explicit_na() function to prep the clinical data for use in the teal modules but in this case it is not going to make an impact because of !is.na() used to evaluate the counts.
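
A base R sketch of how the two displays can diverge. It assumes, as guessed above, that one count is based on `is.na()` while the other tabulates factor levels; the data are made up and this is not the module's code:

```r
# Hypothetical data: 3 of 8 records carry an empty-string level, none are NA.
sex <- factor(c("F", "M", "", "F", "", "M", "", "F"), levels = c("F", "M", ""))

# Count based on is.na(): reports no missing records.
missing_by_is_na <- sum(is.na(sex))

# Frequency table by level: the "" level holds records that read as missing.
level_counts <- table(sex)

# A name of "" cannot be matched by character indexing, so select it logically.
empty_level_n <- as.integer(level_counts[levels(sex) == ""])
```

With these inputs `missing_by_is_na` and `empty_level_n` disagree, which mirrors the two opposing numbers seen in the app.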

from a user perspective the observation of two opposing displays is not comforting. 0 (0%) on the one hand and 466 (42.87%) on the other hand. these diffs will undermine confidence in the tool. users don't know of or react to computing approaches behind the scenes and their associated nuances, they just see what they see. 😄

if the outcome display is the result of a deliberate choice then I guess I would be curious to understand the rationale behind the choice.

vedhav commented 8 months ago

@npaszty Would you like to have the following behavior?

  1. Empty values are also considered missing and are denoted by <Missing>.
  2. Missing values are always displayed in the right-side table.
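
The two points above can be sketched in base R as follows. `count_with_missing()` is a hypothetical helper written for illustration; it is not the code in the linked branch:

```r
# Hypothetical helper: treat NA and "" alike, label them "<Missing>",
# and always include a "<Missing>" row in the frequency table.
count_with_missing <- function(x, missing_label = "<Missing>") {
  x <- as.character(x)
  x[is.na(x) | x == ""] <- missing_label
  observed <- sort(setdiff(unique(x), missing_label))
  # Fixing the levels keeps the "<Missing>" row even when its count is zero.
  table(factor(x, levels = c(observed, missing_label)))
}

sex <- factor(c("F", "", "M", NA, "F"), levels = c("F", "M", ""))
counts <- count_with_missing(sex)
n_missing <- as.integer(counts["<Missing>"])

# A column without any missing values still gets a "<Missing>" row.
counts_complete <- count_with_missing(c("F", "M"))
```

Fixing the factor levels before tabulating is one way to guarantee the row is always present; other designs (for example, appending the row after counting) would work as well.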

The proposed change will look like this, with the code to reproduce the output:

[Screenshot 2024-02-29 at 2 02 26 AM]

pak::pak("insightsengineering/teal.modules.general@697-impute-empty-values-as-na@main")
library(teal.modules.general)

data <- teal_data()
data <- within(data, {
  ADSL <- rADSL
  ADSL$SEX <- factor(ADSL$SEX, levels = c("F", "M", ""))
  ADSL$SEX[1:10] <- ""
  attr(ADSL$SEX, "label") <- "Sex"

  # Character column: blank out the first 20 records with empty strings.
  ADSL$RACE <- as.character(ADSL$RACE)
  ADSL$RACE[1:20] <- ""
  attr(ADSL$RACE, "label") <- "Race"
  ADTTE <- rADTTE
})
join_keys(data) <- default_cdisc_join_keys[datanames(data)]

app <- init(
  data = data,
  modules = tm_variable_browser(
    label = "Variable browser"
  )
)

shinyApp(app$ui, app$server)

P.S. You can try this fix in your app by installing tmg (teal.modules.general) with this command:

pak::pak("insightsengineering/teal.modules.general@697-impute-empty-values-as-na@main")

npaszty commented 8 months ago

@vedhav

not sure when an empty value would not be considered missing in practical terms, and that's what I'm pointing out here. the screenshot looks like the df_explicit_na() function was applied with defaults, and both the left and right hand missing counts/% are the same. that's what I would expect.

provided there isn't a broad impact that the core dev team would be able to identify then yes, the change you made displays the counts the way I would expect. thanks!