jthomasmock / gtExtras

A Collection of Helper Functions for the gt Package.
https://jthomasmock.github.io/gtExtras/

Warning + Error: Binary variable with `gt_plt_dist()` #110

Closed: vincentarelbundock closed this issue 5 months ago.

vincentarelbundock commented 10 months ago

Hi, thanks for a great package!

I get a warning and an error when trying to draw a histogram for a binary variable, and I can’t figure out what the problem is. I’d be very grateful if you could give me a hint. Minimal example:

library(gt)
library(gtExtras)
library(dplyr)

dat <- read.csv(
  "https://vincentarelbundock.github.io/Rdatasets/csv/causaldata/thornton_hiv.csv",
  na.strings = c("", "-1"))  |>
  summarize(
    mean = mean(tinc, na.rm = TRUE),
    hist_data = list(hiv2004))

gt(dat) |> gtExtras::gt_plt_dist(hist_data, type = "histogram")
# Warning: Computation failed in `stat_bin()`
# Caused by error in `bin_breaks_width()`:
# ! `binwidth` must be positive

The data look fine to me:

str(dat[[2]][[1]])
#  int [1:4820] 0 NA 0 NA 0 0 NA NA NA NA ...
table(dat[[2]][[1]])
# 
#    0    1 
# 2695  185

Initially reported here:

https://github.com/vincentarelbundock/modelsummary/issues/680

jthomasmock commented 7 months ago

Howdy @vincentarelbundock! To get a decent default binwidth, I use an estimator taken from Rob Hyndman (the Freedman–Diaconis rule):

bw <- 2 * stats::IQR(no_na, na.rm = TRUE) / length(data_in)^(1 / 3)
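
For reference, here is a minimal base-R sketch of that estimator. `fd_binwidth()` is a hypothetical helper, not part of gtExtras, and it takes n as the number of non-missing values, which may differ from the `length(data_in)` used in the snippet above. The binary vector is reconstructed from the `str()` and `table()` output in the issue (2695 zeros, 185 ones, 4820 values in total).

```r
# Hypothetical helper (not part of gtExtras): Freedman–Diaconis binwidth,
# 2 * IQR / n^(1/3), with n taken as the number of non-missing values.
fd_binwidth <- function(x) {
  no_na <- x[!is.na(x)]
  2 * stats::IQR(no_na) / length(no_na)^(1 / 3)
}

# Binary data reconstructed from the issue: 2695 zeros, 185 ones, 1940 NAs.
binary <- c(rep(0L, 2695), rep(1L, 185), rep(NA_integer_, 1940))

# Both quartiles are 0, so the IQR (and hence the binwidth) is 0.
stats::quantile(binary, c(0.25, 0.75), na.rm = TRUE)
fd_binwidth(binary)
#> [1] 0

# Continuous data gives a positive binwidth.
set.seed(1)
fd_binwidth(stats::rnorm(1000)) > 0
#> [1] TRUE
```

Because fewer than 25% of the non-missing values are 1, both quartiles fall on 0, which is exactly the `binwidth must be positive` failure reported above.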

In the case of the data you have provided:

stats::IQR(dat$hist_data[[1]], na.rm = TRUE)
#> [1] 0

So the estimated binwidth is 0, and `stat_bin()` requires a positive binwidth. You can get around that by hard-coding `bw`:

gt(dat) |> gtExtras::gt_plt_dist(hist_data, type = "histogram", bw = 1)
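
With `bw = 1`, each of the two values should get a bin of its own. The base-R sketch below reproduces that kind of binning with `hist(plot = FALSE)`, assuming width-1 bins centred on the integers; it only illustrates the resulting counts and does not call gtExtras or ggplot2.

```r
# Binary data reconstructed from the issue: 2695 zeros, 185 ones, 1940 NAs.
binary <- c(rep(0L, 2695), rep(1L, 185), rep(NA_integer_, 1940))

bw <- 1
no_na <- binary[!is.na(binary)]

# Width-1 bins centred on the integers: [-0.5, 0.5] and (0.5, 1.5]
breaks <- seq(min(no_na) - bw / 2, max(no_na) + bw / 2, by = bw)
counts <- hist(no_na, breaks = breaks, plot = FALSE)$counts
counts
#> [1] 2695  185
```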


jthomasmock commented 7 months ago

I'm going to reopen this, as I think I have a solution for `bw <= 0`, as seen in #104:

# conditional to switch between the estimated binwidth and the Freedman–Diaconis rule
{
  if (bw > 0) {
    geom_histogram(color = "white", fill = "#f8bb87", binwidth = bw)
  } else {
    # plot = FALSE computes the breaks without drawing a base-graphics plot
    hist_breaks <- hist(col[!is.na(col)], breaks = "FD", plot = FALSE)$breaks

    geom_histogram(color = "white", fill = "#f8bb87", breaks = hist_breaks)
  }
} +
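
The same fallback logic can be sketched outside of ggplot2 as a plain function that returns histogram breaks. `choose_breaks()` is a hypothetical name, not the code from the commit, and it assumes `plot = FALSE` so that `hist()` only computes the breaks. When the IQR is 0, base R's `nclass.FD()` (used by `breaks = "FD"`) still returns at least one class, so `hist()` produces usable breaks for binary data.

```r
# Hypothetical helper (not the gtExtras implementation): use the estimated
# binwidth when it is positive, otherwise fall back to the Freedman–Diaconis
# breaks computed by base R's hist().
choose_breaks <- function(x) {
  no_na <- x[!is.na(x)]
  bw <- 2 * stats::IQR(no_na) / length(no_na)^(1 / 3)

  if (bw > 0) {
    # Regular grid of width bw that covers the whole data range
    seq(min(no_na), max(no_na) + bw, by = bw)
  } else {
    # plot = FALSE returns the "histogram" object without drawing it
    graphics::hist(no_na, breaks = "FD", plot = FALSE)$breaks
  }
}

# Binary data reconstructed from the issue: 2695 zeros, 185 ones, 1940 NAs.
binary <- c(rep(0L, 2695), rep(1L, 185), rep(NA_integer_, 1940))
b <- choose_breaks(binary)  # zero IQR: falls back to hist()'s "FD" breaks
range(b)                    # spans at least 0 to 1

set.seed(1)
b2 <- choose_breaks(stats::rnorm(500))  # positive binwidth: regular grid
```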

jthomasmock commented 5 months ago

This is likely closed by: https://github.com/jthomasmock/gtExtras/commit/639d68ab9c61f6a35d7ce91a9d5a084468fb8470