npaszty opened this issue 8 months ago
I believe the level contains empty strings so it feels like it's missing. The table on the right side only displays count statistics for non-missing values, and the graph would show missing values in a different color.
I have two examples to showcase what is going on:
Example 1 (`SEX` with `NA` values):

```r
library(teal.modules.general)

data <- teal_data()
data <- within(data, {
  ADSL <- rADSL
  # Set the first 10 values of SEX to NA (true missing values)
  ADSL$SEX[1:10] <- NA
  ADTTE <- rADTTE
})
join_keys(data) <- default_cdisc_join_keys[datanames(data)]

app <- init(
  data = data,
  modules = tm_variable_browser(
    label = "Variable browser"
  )
)
shinyApp(app$ui, app$server)
```
Example 2 (`SEX` with empty-string values; presumably this is what is happening here):

```r
library(teal.modules.general)

data <- teal_data()
data <- within(data, {
  ADSL <- rADSL
  # Declare "" as a valid factor level, then set the first 10 values to it
  ADSL$SEX <- factor(ADSL$SEX, levels = c("F", "M", ""))
  ADSL$SEX[1:10] <- ""
  # Re-attach the variable label
  attr(ADSL$SEX, "label") <- "Sex"
  ADTTE <- rADTTE
})
join_keys(data) <- default_cdisc_join_keys[datanames(data)]

app <- init(
  data = data,
  modules = tm_variable_browser(
    label = "Variable browser"
  )
)
shinyApp(app$ui, app$server)
```
> I believe the level contains empty strings so it feels like it's missing.
Empty strings are not missing values. NAs are missing values.
Yes, that is my point, as the examples show: example 1 shows the missing values, and example 2 shows how empty strings can cause this misunderstanding.
Also, in the variable browser module, empty strings are treated as missing values and converted to `NA`. They are only retained when they are a valid level of a factor variable.
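A minimal base-R sketch of the rule described above, assuming character columns have `""` recoded to `NA` while factors keep `""` only when it is a declared level. `impute_empty_as_na()` is a hypothetical helper written for illustration; it is not the module's actual code:

```r
# Hypothetical helper illustrating the rule; not the teal.modules.general implementation.
impute_empty_as_na <- function(x) {
  if (is.character(x)) {
    # Character columns: an empty string carries no value, so recode it to NA
    x[!is.na(x) & x == ""] <- NA_character_
  }
  # Factors are returned unchanged: "" survives only if it is one of the levels,
  # because assigning "" to a factor without that level already yields NA
  x
}

race <- c("WHITE", "", "ASIAN", NA)
sex <- factor(c("F", "", "M"), levels = c("F", "M", ""))

sum(is.na(impute_empty_as_na(race)))  # 2
sum(is.na(impute_empty_as_na(sex)))   # 0: "" is a valid level here
```

This is why example 2 shows no missing values: `""` was declared as a level, so it is kept as a regular category.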
> Empty strings are considered as missing values

That would not be my guess, so it sounds like a deliberate choice. What is your opinion as a user, @npaszty?
@chlebowa
From our investigation, it looks like `!is.na()` is used in this module to compute the counts (%). In R, an empty (or blank) string is not `NA` in a comparison, but in practical terms they are the same: missing, meaning no value is available.
We use the `df_explicit_na()` function to prepare the clinical data for the teal modules, but in this case it makes no difference, because the counts are evaluated with `!is.na()`.
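To illustrate the point about `!is.na()`: in base R, an empty string, a blank string, or an explicit `"<Missing>"` factor level all count as present values, so a count built on `!is.na()` reports only true `NA`s as missing. The `trimws()`-based check below is an assumed alternative definition of "missing", shown only for comparison:

```r
sex <- c("F", "M", "", " ", NA)

is.na(sex)                      # only the last element is TRUE
sum(!is.na(sex))                # 4 values counted as non-missing

# Assumed alternative: also treat empty or whitespace-only strings as missing
blank <- !is.na(sex) & trimws(sex) == ""
sum(is.na(sex) | blank)         # 3 missing

# An explicit "<Missing>" level (the kind df_explicit_na() introduces) is not NA either
sex_explicit <- factor(c("F", "<Missing>"), levels = c("F", "<Missing>"))
sum(is.na(sex_explicit))        # 0
```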
From a user perspective, seeing two contradictory displays is not reassuring: 0 (0%) in one place and 466 (42.87%) in the other. Discrepancies like these undermine confidence in the tool. Users don't know about, or react to, the computational approach behind the scenes and its nuances; they just see what they see. 😄
If this display is the result of a deliberate choice, I'd be curious to understand the rationale behind it.
@npaszty Would you like to have the following behavior?
[Screenshot: `<Missing>`]
```r
# Install the branch with the proposed fix
pak::pak("insightsengineering/teal.modules.general@697-impute-empty-values-as-na@main")

library(teal.modules.general)

data <- teal_data()
data <- within(data, {
  ADSL <- rADSL
  # Factor with "" declared as a level: the first 10 values are empty strings
  ADSL$SEX <- factor(ADSL$SEX, levels = c("F", "M", ""))
  ADSL$SEX[1:10] <- ""
  attr(ADSL$SEX, "label") <- "Sex"
  # Character column: the first 20 values are empty strings
  ADSL$RACE <- as.character(ADSL$RACE)
  ADSL$RACE[1:20] <- ""
  attr(ADSL$RACE, "label") <- "Race"
  ADTTE <- rADTTE
})
join_keys(data) <- default_cdisc_join_keys[datanames(data)]

app <- init(
  data = data,
  modules = tm_variable_browser(
    label = "Variable browser"
  )
)
shinyApp(app$ui, app$server)
```
P.S. You can try this fix in your app by installing teal.modules.general (tmg) with this command:

```r
pak::pak("insightsengineering/teal.modules.general@697-impute-empty-values-as-na@main")
```
@vedhav
I'm not sure when an empty value would not be considered missing in practical terms, and that's what I'm pointing out here. The screenshot looks like the `df_explicit_na()` function was applied with its defaults, and the missing counts/% on the left and right sides are the same. That's what I would expect.
Provided the core dev team doesn't identify a broader impact, then yes, the change you made displays the counts the way I would expect. Thanks!
What happened?
The count in the "Missing" column of the left-hand table does not match the "Missing" record count in the table below the bar chart.