Open ericemc3 opened 3 years ago
santoku is a very featureful package for R that tries to improve on cut. I don't think that Arquero needs this many features, but there could be some API inspiration there, in addition to d3 and other patterns in the JavaScript world.
I'd be happy to consider a new cut / chop / etc. implementation for inclusion in Arquero. Similar to recode, it might be added as a new standard op function.
As for clustering algorithms, I think those might be more fitting as extensions defined in a separate package, as discussed in #67.
Great, thanks!
A simple implementation for an op.cut could just be the following. Consider, for instance, breaks = [t1, t2, t3], and recode x with:
x ∈ [min, t1[ => 0
x ∈ [t1, t2[ => 1
x ∈ [t2, t3[ => 2
x ∈ [t3, max] => 3
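To make the recoding rule above concrete, here is a minimal sketch in plain JavaScript. The function name `cut` and its signature are assumptions for illustration only; this is not an existing Arquero op. It returns the number of breaks that are less than or equal to x, which matches the left-closed intervals listed above.

```javascript
// Hypothetical helper, not part of Arquero's API: returns the 0-based bin
// index of x for ascending breaks, using left-closed intervals [t_i, t_{i+1}[.
function cut(x, breaks) {
  // Treat missing or non-numeric input as "no bin" (an assumption).
  if (x == null || Number.isNaN(+x)) return undefined;
  // Binary search for the number of breaks that are <= x.
  let lo = 0;
  let hi = breaks.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (breaks[mid] <= x) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

const breaks = [10, 20, 30];
const codes = [5, 10, 19.9, 20, 30, 99].map((x) => cut(x, breaks));
console.log(codes); // [0, 1, 1, 2, 3, 3]
```

Note that this sketch does not check min or max: anything below t1 lands in bin 0 and anything at or above the last break lands in the last bin.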
An extension to op.ntile() could prove useful to encode numeric values into categories from manual breaks. Something similar to the R cut function: dens_code = cut(pop_density, breaks = c(0, 1000, 5000, 20000, 100000, Inf)...), or d3.scaleThreshold().
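For comparison, here is a rough JavaScript sketch of what the R call above does with its default settings: intervals are right-closed, (b[i], b[i+1]], and values outside the breaks get no category. The name `cutRight` and the labels are hypothetical; the R call elides its remaining arguments, so the labels here are placeholders and not the ones from the original code.

```javascript
// Hypothetical helper mimicking R's default cut(): right-closed intervals
// (b[i], b[i+1]], with values outside the breaks mapped to undefined.
function cutRight(x, breaks, labels) {
  if (x == null || Number.isNaN(+x)) return undefined;
  // Binary search for the number of breaks strictly less than x.
  let lo = 0;
  let hi = breaks.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if (breaks[mid] < x) lo = mid + 1;
    else hi = mid;
  }
  // lo === 0: x <= first break; lo === breaks.length: x > last break.
  if (lo === 0 || lo === breaks.length) return undefined;
  return labels[lo - 1];
}

const densBreaks = [0, 1000, 5000, 20000, 100000, Infinity];
// Illustrative labels only; they do not come from the original R call.
const densLabels = ['very low', 'low', 'medium', 'high', 'very high'];
const densCode = [250, 1000, 1001, 75000, 2e6].map((d) =>
  cutRight(d, densBreaks, densLabels)
);
console.log(densCode);
// ['very low', 'very low', 'low', 'high', 'very high']
```

As in R's default, a density of exactly 0 falls outside the first interval and comes back as undefined here; an op in Arquero might want an option to include the lowest break.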
jenks() and kmeans() are also useful clustering methods. We can borrow them from the simple statistics library, but of course it would be convenient if they were in Arquero.