insightsengineering / teal.modules.general

General Purpose Teal Modules
https://insightsengineering.github.io/teal.modules.general/

Labels in the biplot of tm_a_pca become hard to interpret when the number of variables in the analysis exceeds ~20 #50

Open cicdguy opened 3 years ago

cicdguy commented 3 years ago

Environment: NEST_UAT_10_12

Sample code:

# ADSL example
library(random.cdisc.data)
library(teal.modules.general)
ADSL <- radsl(cached = TRUE)

for (i in seq.int(0, 100)) {
  name <- paste0("col_", as.character(i))
  set.seed(i)
  ADSL[[name]] <- runif(400)
}

# Names of the 101 synthetic columns (col_0 ... col_100) created above
var_names <- paste0("col_", seq.int(0, 100))

app <- teal::init(
  data = cdisc_data(cdisc_dataset("ADSL", ADSL),
                    code = "ADSL <- radsl(cached = TRUE)
                    for (i in seq.int(0, 100)) {
                      name <- paste0('col_', as.character(i))
                      set.seed(i)
                      ADSL[[name]] <- runif(400)
                    }", check = TRUE),
  modules = root_modules(
    tm_a_pca("PCA",
             data_extract_spec(
               dataname = "ADSL",
               select = select_spec(
                 choices = variable_choices(data = ADSL),
                 selected = unlist(var_names),
                 multiple = TRUE
               ),
               filter = NULL
             )
    )
  )
)

shinyApp(app$ui, app$server)

So what happens is this:

[Screenshot: user/3166/files/e70d0d80-0d4c-11eb-87d5-751e3a439861]

When the number of variables increases, this gets worse to the point of being:

[Screenshot: user/3166/files/7b777000-0d4d-11eb-81bd-a49fcd84431a]

I don't know if there is a point in performing PCA on more than 20 variables in the context of the analysis we aim to service, so I guess it might be a non-issue.

Provenance:
```
Creator: kpagacz
```

gogonzo commented 2 years ago

https://cran.r-project.org/web/packages/ggrepel/
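
For illustration, here is a minimal sketch of how ggrepel could keep loading labels from overlapping on a biplot-style plot. It does not use tm_a_pca or its internals: the data frame `x`, the `loadings` table, and the plot `p` are hypothetical stand-ins built with `prcomp()` and ggplot2, and whether the module's own plotting code can take a repel layer this directly is an assumption.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical stand-in data: 30 uniform random columns, similar in spirit
# to the col_* variables added to ADSL in the sample code above.
set.seed(1)
n_vars <- 30
x <- as.data.frame(matrix(runif(400 * n_vars), nrow = 400))
names(x) <- paste0("col_", seq_len(n_vars) - 1)

# Plain PCA on centered, scaled data (not the tm_a_pca implementation).
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# One row per variable, holding its loadings on the first two components.
loadings <- data.frame(
  variable = rownames(pca$rotation),
  PC1 = pca$rotation[, "PC1"],
  PC2 = pca$rotation[, "PC2"]
)

# Arrows from the origin to each loading; geom_text_repel() moves the
# labels apart so they do not sit on top of each other.
p <- ggplot(loadings, aes(x = PC1, y = PC2)) +
  geom_segment(
    aes(x = 0, y = 0, xend = PC1, yend = PC2),
    arrow = arrow(length = unit(0.15, "cm")),
    colour = "grey40"
  ) +
  geom_text_repel(aes(label = variable), size = 3, max.overlaps = Inf) +
  labs(x = "PC1", y = "PC2")

print(p)
```

Setting `max.overlaps = Inf` asks ggrepel to draw every label rather than drop the ones it cannot place, so with many more variables the plot may still get crowded; a lower limit trades completeness for legibility.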