ropensci / daiquiri

Data quality reporting for temporal datasets.
https://ropensci.github.io/daiquiri/
GNU General Public License v3.0

report fails when "NA" appears as a value in a strata field and is not designated as a 'missing' value #21

Open phuongquan opened 5 months ago

phuongquan commented 5 months ago

The report fails because the strata heatmap tries to plot two rows labelled "NA".

testdf <- read_data(test_path("testdata", "completetestset.csv"))
daiqobj <- daiquiri::daiquiri_report(
  testdf,
  field_types = field_types(
    col_timepoint_err = ft_ignore(),
    col_timepoint = ft_timepoint(),
    col_date_time_err = ft_ignore(),
    col_date_time = ft_datetime(),
    col_date_only_err = ft_ignore(),
    col_date_only = ft_datetime(includes_time = FALSE),
    col_date_uk_err = ft_ignore(),
    col_date_uk = ft_datetime(includes_time = FALSE, format = "%d/%m/%Y"),
    col_id_num_err = ft_ignore(),
    col_id_num = ft_uniqueidentifier(),
    col_id_string_err = ft_ignore(),
    col_id_string = ft_uniqueidentifier(),
    col_numeric_clean_err = ft_ignore(),
    col_numeric_clean = ft_numeric(),
    col_numeric_dirty_err = ft_ignore(),
    col_numeric_dirty = ft_numeric(),
    col_numeric_missing_err = ft_ignore(),
    col_numeric_missing = ft_numeric(),
    col_categorical_small_err = ft_ignore(),
    col_categorical_small = ft_strata(), # the only strata field
    col_categorical_large_err = ft_ignore(),
    col_categorical_large = ft_categorical(),
    col_freetext_err = ft_ignore(),
    col_freetext = ft_freetext(),
    col_simple_err = ft_ignore(),
    col_simple = ft_ignore()
  ),
  override_column_names = FALSE,
  na = c("", "NULL"), # "NA" is not designated as a missing value
  aggregation_timeunit = "day",
  save_directory = "./devtesting/testoutput",
  save_filename = NULL,
  show_progress = TRUE
)

processing file: report_htmldoc.Rmd
  |............................................................                                      |  61% [daiquiri-overview-strata]                 
Quitting from lines 278-300 [daiquiri-overview-strata] (report_htmldoc.Rmd)
Error in `levels<-`:
! factor level [9] is duplicated
Backtrace:
 1. daiquiri:::plot_subcat_heatmap_static(...)
 3. data.table::`[.data.table`(...)
      at daiquiri/R/reports.R:487:3
 4. base::eval(jsub, SDenv, parent.frame())
 5. base::eval(jsub, SDenv, parent.frame())
 6. base::factor(field_name, levels = agg_fun_subcat_value(heatmap_fields))
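
The same kind of error can be reproduced in base R without daiquiri. The sketch below is only an illustration under assumptions: the strata values are made up, and the use of paste0() to build row labels is a guess at how a literal "NA" and a genuinely missing value could end up with the same label, not a description of what plot_subcat_heatmap_static() actually does.

```r
# Hypothetical strata values: the literal string "NA" plus a genuinely missing value.
strata_values <- c("a", "b", "NA", NA)

# R treats the string "NA" and a missing value as two distinct values ...
distinct_values <- unique(strata_values)

# ... but paste0() renders a missing value as the text "NA", so the labels collide.
row_labels <- paste0(distinct_values)

# factor() rejects duplicated levels; capture the message instead of stopping.
err_msg <- tryCatch(
  {
    factor(row_labels, levels = row_labels)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# One possible guard (an assumption, not daiquiri's fix): relabel missing values
# before building the factor so that the two rows stay distinguishable.
safe_labels <- ifelse(is.na(distinct_values), "<missing>", distinct_values)
safe_factor <- factor(safe_labels, levels = safe_labels)
print(levels(safe_factor))
```

Going by the issue title, adding "NA" to the na argument (for example na = c("", "NULL", "NA")) would presumably also avoid the collision, at the cost of treating those values as missing.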