tidyverse / funs

Collection of low-level functions for working with vctrs

cut helpers #4

Open hadley opened 8 years ago

hadley commented 8 years ago

Extract the cut helpers (cut_interval(), cut_number(), cut_width()) from ggplot2.

romainfrancois commented 3 years ago

library(ggplot2)

table(cut_interval(1:100, 10))
#> 
#>    [1,10.9] (10.9,20.8] (20.8,30.7] (30.7,40.6] (40.6,50.5] (50.5,60.4] 
#>          10          10          10          10          10          10 
#> (60.4,70.3] (70.3,80.2] (80.2,90.1]  (90.1,100] 
#>          10          10          10          10
table(cut_interval(1:100, 11))
#> 
#>   [1,10]  (10,19]  (19,28]  (28,37]  (37,46]  (46,55]  (55,64]  (64,73] 
#>       10        9        9        9        9        9        9        9 
#>  (73,82]  (82,91] (91,100] 
#>        9        9        9

table(cut_number(runif(1000), 10))
#> 
#> [0.000144,0.105]    (0.105,0.204]    (0.204,0.297]    (0.297,0.392] 
#>              100              100              100              100 
#>    (0.392,0.492]    (0.492,0.589]    (0.589,0.703]    (0.703,0.804] 
#>              100              100              100              100 
#>    (0.804,0.905]    (0.905,0.999] 
#>              100              100

table(cut_width(runif(1000), 0.1))
#> 
#> [-0.05,0.05]  (0.05,0.15]  (0.15,0.25]  (0.25,0.35]  (0.35,0.45]  (0.45,0.55] 
#>           47           94           90          103          102          105 
#>  (0.55,0.65]  (0.65,0.75]  (0.75,0.85]  (0.85,0.95]  (0.95,1.05] 
#>          111           90          119           88           51
table(cut_width(runif(1000), 0.1, boundary = 0))
#> 
#>   [0,0.1] (0.1,0.2] (0.2,0.3] (0.3,0.4] (0.4,0.5] (0.5,0.6] (0.6,0.7] (0.7,0.8] 
#>        95        99       117       108        90        97       102        99 
#> (0.8,0.9]   (0.9,1] 
#>        78       115
table(cut_width(runif(1000), 0.1, center = 0))
#> 
#> [-0.05,0.05]  (0.05,0.15]  (0.15,0.25]  (0.25,0.35]  (0.35,0.45]  (0.45,0.55] 
#>           62          104           85          106          100           85 
#>  (0.55,0.65]  (0.65,0.75]  (0.75,0.85]  (0.85,0.95]  (0.95,1.05] 
#>           93           91          116           99           59
table(cut_width(runif(1000), 0.1, labels = FALSE))
#> 
#>   1   2   3   4   5   6   7   8   9  10  11 
#>  43 112  89  94  95  88 116 101 105 108  49

Created on 2021-05-05 by the reprex package (v2.0.0)

DavisVaughan commented 3 years ago

People also seem to enjoy https://hughjonesd.github.io/santoku/index.html

yutannihilation commented 3 years ago

As the success of santoku suggests, it would be helpful if the cut helpers generated readable labels for discrete-ish data (e.g. integer, Date).
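
To make the labelling idea concrete, here is a minimal base R sketch for the integer case. The helper name `cut_int_labels()` and its interface are assumptions made up for illustration; it is not an API of ggplot2, funs, or santoku.

```r
# Hypothetical helper (not from ggplot2, funs, or santoku): bin integer data
# into (lower, upper] intervals and label each bin by the integers it holds.
cut_int_labels <- function(x, breaks) {
  stopifnot(is.numeric(x), is.numeric(breaks), length(breaks) >= 2)
  stopifnot(all(breaks == round(breaks)))
  lower <- head(breaks, -1) + 1  # smallest integer inside each (lower, upper] bin
  upper <- breaks[-1]            # largest integer inside each bin
  cut(x, breaks, labels = paste0(lower, "-", upper))
}

table(cut_int_labels(1:100, seq(0, 100, by = 10)))
```

With these breaks the labels read 1-10, 11-20, and so on, rather than (0,10], (10,20].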

One more feature I'd like in cut_number() is automatic retrying of the calculation. It often fails with count data, which tends to be zero-inflated.

library(ggplot2)

set.seed(403)
x <- c(rep(0, 100), rpois(100, 2))

cut_number(x, n = 3)
#> Error: Insufficient data values to produce 3 bins.

Created on 2021-05-05 by the reprex package (v2.0.0)

This drove me to create a dedicated package for it. It would be great if funs provided such a function!

cf. https://github.com/yutannihilation/cutnumberint#example
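
For context on the cut_number() failure with zero-inflated counts: as far as I understand, the breaks are taken from quantiles of the data, so when many values are tied, several breaks coincide and the bins cannot be formed. The sketch below shows one possible fallback in base R, collapsing the duplicated breaks and returning fewer bins. `cut_number_fallback()` is a hypothetical name, and this is not necessarily how cutnumberint handles the case.

```r
# Hypothetical helper (not part of ggplot2, funs, or cutnumberint):
# quantile-based binning that collapses duplicated breaks instead of failing.
cut_number_fallback <- function(x, n) {
  stopifnot(is.numeric(x), length(n) == 1, n >= 1)
  probs <- seq(0, 1, length.out = n + 1)
  brk <- unique(quantile(x, probs, na.rm = TRUE, names = FALSE))
  if (length(brk) < 2) {
    stop("All values are identical; no bins can be formed.")
  }
  if (length(brk) < n + 1) {
    warning("Only ", length(brk) - 1, " of ", n, " requested bins could be formed.")
  }
  cut(x, brk, include.lowest = TRUE)
}

set.seed(403)
x <- c(rep(0, 100), rpois(100, 2))

# At least half the values are 0, so the lowest quantile breaks coincide at 0
quantile(x, seq(0, 1, length.out = 4), names = FALSE)

# Returns fewer, unevenly filled bins rather than an error
table(suppressWarnings(cut_number_fallback(x, n = 3)))
```

The trade-off is that the resulting bins no longer hold roughly equal counts, which is why the helper warns rather than silently changing the number of bins.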