Closed zchenmr closed 6 months ago
R Maelstrom: madshapR v1.0.4.1005
Values are now grouped together regardless of capitalization, but there are still duplicate values due to special symbols. I'm not sure if these should be differentiated or not - I guess in some cases the accent could change the meaning of the word? Not sure how common that would be though.
Thank you for your contribution! I would suggest that grouping upper and lower case together, while keeping accented forms separate, is the expected behaviour. Indeed, in French, many words have different meanings with or without accents, regardless of case. Say, in a hospital:
"un interne tue" "un interné tue" "un interne tué" "un interné tué"
are four completely different stories :)
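To illustrate the point about case versus accents, here is a minimal base R sketch (not madshapR code). It assumes that lower-casing alone is used to group values; the vector `values` is made up for illustration, and the accented character is written as a Unicode escape so the snippet does not depend on file encoding.

```r
# Hypothetical free-text answers mixing capitalization and accents.
# "\u00e9" is the letter e with an acute accent.
values <- c("Interne", "interne", "INTERNE", "intern\u00e9", "Intern\u00e9")

# Lower-casing merges capitalization variants but leaves accents untouched,
# so the accented and unaccented words remain two separate groups.
folded <- tolower(values)
counts <- table(folded)
print(counts)
```

With this approach the three unaccented spellings are counted together, and the two accented spellings form their own group.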
Hello @zchenmr. I made a small change related to this topic:
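library(dplyr)     # provides mutate(), %>% and tibble()
library(madshapR)

# iris with case variants: Species is the grouping variable, var is open text
dataset = tibble(iris %>%
  mutate(Species = c(rep("Setosa",50),rep("SETOSA",50),rep("setosa",50))) %>%
  mutate(var = c(rep("Aa",25),rep("aA",85),rep("aa",40))))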
When a variable is declared as a group, its case is kept, in line with its data dictionary declaration ("Setosa" and "SETOSA" are different).
When a variable is declared as a category, its case is also kept, in line with its data dictionary declaration ("aa" and "AA" are different).
When a variable is declared as text (in any case, not as a category), its values are lower-cased to avoid duplicated entries.
In a nutshell: a category keeps its case, a text does not.
Based on that,
variable_visualize(
dataset = dataset, # var is a text
col = 'var',
group_by = "Species")
variable_visualize(
dataset = dataset %>% mutate(var = as_category(var)), # var is a category
col = 'var',
group_by = "Species")
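The rule above can be sketched in base R as follows. This is only an illustration of the described behaviour, not madshapR's implementation: `normalise_for_plot()` is a hypothetical helper, and a base R factor stands in for a variable declared with `as_category()`.

```r
# Hypothetical helper (not part of madshapR): prepare values before counting.
normalise_for_plot <- function(x) {
  if (is.factor(x)) {
    # Assumption: a factor stands in for a declared category, so case is kept.
    as.character(x)
  } else {
    # Open text: lower the case so "Aa", "aA" and "aa" are counted together.
    tolower(as.character(x))
  }
}

# Same values as the var column in the example dataset.
var <- c(rep("Aa", 25), rep("aA", 85), rep("aa", 40))

text_counts     <- table(normalise_for_plot(var))          # var as text
category_counts <- table(normalise_for_plot(factor(var)))  # var as category

print(text_counts)
print(category_counts)
```

Under these assumptions the text version yields a single bar, while the category version keeps the three spellings as separate bars.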
I hope this addresses the need you highlighted without changing the data dictionary declaration, if there is one.
That sounds great, thanks!
The dataset_visualize() report generates bar plots for open-text variables that sometimes show the same value (with differing capitalization) multiple times.