Open nsmith- opened 5 years ago
I thought about that: there are cases where you'd want to bin a space in a non-rectangular way. For instance, in x < 0 the y bins are finely spaced, in 0 <= x < 1 the y bins are broadly spaced, and in 1 <= x the y bins cover a different range, etc.
This can be expressed in the current framework as a combination of Histograms and Collections. Both of these have a list of Axis that divide the space in a Cartesian product, but the Collections also define a set of children that do not have to have the same Axis list. That way, you could have a Collection of three Histograms: one finely binned in y, filled with x < 0, another widely binned in y with 0 <= x < 1, and another with y bins in a different range for 1 <= x. The non-rectangularness is expressible, though the user-facing library might call these a single histogram while Aghast calls it three Histograms.
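As a rough illustration of the idea (not Aghast's actual API), the three-histogram layout can be sketched with plain NumPy. The `regions` table, the `fill` helper, and the specific bin edges are all hypothetical choices made for this example.

```python
import numpy as np

# Hypothetical layout: each region of x carries its own y bin edges.
regions = [
    ("x < 0",      lambda x: x < 0,              np.linspace(0.0, 1.0, 11)),  # fine: 10 bins
    ("0 <= x < 1", lambda x: (x >= 0) & (x < 1), np.linspace(0.0, 1.0, 3)),   # broad: 2 bins
    ("1 <= x",     lambda x: x >= 1,             np.linspace(5.0, 10.0, 6)),  # different range: 5 bins
]

def fill(x, y):
    """Return {region name: (counts, y_edges)}, one 1D histogram per x region."""
    x = np.asarray(x)
    y = np.asarray(y)
    out = {}
    for name, predicate, edges in regions:
        # Select the y values whose x falls in this region, then bin them.
        counts, _ = np.histogram(y[predicate(x)], bins=edges)
        out[name] = (counts, edges)
    return out

x = np.array([-1.0, -0.5, 0.5, 0.2, 1.5, 2.0])
y = np.array([0.05, 0.95, 0.25, 0.75, 5.5, 9.5])
result = fill(x, y)
```

Each entry of `result` plays the role of one child Histogram; a user-facing library could present the three together as a single non-rectangular histogram.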
Ah, but in that case, you'd really prefer the children of the Collection to be "named" with elements of a PredicateBinning, rather than strings. Maybe I should add a sibling of Collection that does that: instead of keying the things it contains with strings, it should key them with a binning. That would carry more semantic information.
I had been thinking about this, and although a sister to Collection would add the functionality in a backwards-compatible way, it would be simpler (though a breaking change) to generalize the Collection members from a string → objects mapping to a binning → objects mapping, where the binning is usually CategoryBinning. The case you want would be PredicateBinning.
Since it's still the early days, I'm going to change that. A lot of tests will need to be touched, but it will be worth it in the end.
The Axis system would be like this:
- Collection has a sequence of Axis that are the outermost Cartesian splits.
- Collection lookup has a single Axis that is a binning for its individually defined objects, with one Histogram definition for each bin.
- Histograms have a sequence of Axis that are the innermost Cartesian splits.

That way, you can build arbitrary nestings of "ands" and "ors" for splitting by nesting several layers of Collections.
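To make the nesting concrete, here is a minimal sketch of the three levels as plain Python dataclasses. These classes, their field names, and the `describe` helper are hypothetical stand-ins for illustration and do not reproduce Aghast's schema; axes and lookup bins are reduced to strings.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Histogram:
    axes: List[str]  # innermost Cartesian splits ("and")

@dataclass
class Collection:
    axes: List[str]    # outermost Cartesian splits, shared by every child ("and")
    lookup: List[str]  # one bin label per child, e.g. a category or a predicate ("or")
    children: List[Union["Collection", Histogram]]

def describe(node, prefix=()):
    """Yield, for every Histogram in the tree, the full chain of splits leading to it."""
    if isinstance(node, Histogram):
        yield prefix + tuple(node.axes)
    else:
        for label, child in zip(node.lookup, node.children):
            yield from describe(child, prefix + tuple(node.axes) + (f"[{label}]",))

tree = Collection(
    axes=["run period"],
    lookup=["x < 0", "0 <= x < 1", "1 <= x"],
    children=[Histogram(["y fine"]), Histogram(["y broad"]), Histogram(["y shifted"])],
)
# Nesting another Collection layer adds another level of "or".
outer = Collection(axes=[], lookup=["signal"], children=[tree])
paths = list(describe(outer))
```

Each element of `paths` lists the splits from the outermost Collection down to one Histogram, with lookup bins shown in square brackets.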
Particularly for CategoryBinning axes, it would be nice to save a dense binning only for the tuples of categories that correspond to a valid bin, rather than for the entire product of category values per axis.
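One possible way to picture this request, assuming a dictionary keyed by category tuples, is sketched below. The `SparseCategoryHist` class and the channel and year labels are hypothetical, chosen only to show the storage saving; this is not how any existing library implements it.

```python
import numpy as np

class SparseCategoryHist:
    """Hypothetical container: dense counts are allocated only for category tuples that get filled."""

    def __init__(self, n_dense_bins):
        self.n_dense_bins = n_dense_bins
        self.blocks = {}  # (category, category, ...) -> dense counts array

    def fill(self, categories, dense_index, weight=1.0):
        key = tuple(categories)
        if key not in self.blocks:
            # Allocate the dense block lazily, the first time this tuple is seen.
            self.blocks[key] = np.zeros(self.n_dense_bins)
        self.blocks[key][dense_index] += weight

channels = ["ee", "mumu", "emu"]
years = ["2016", "2017", "2018"]

h = SparseCategoryHist(n_dense_bins=4)
h.fill(("ee", "2017"), 0)
h.fill(("ee", "2017"), 3, weight=2.0)
h.fill(("mumu", "2018"), 1)

stored = len(h.blocks) * h.n_dense_bins                      # bins actually allocated
full_product = len(channels) * len(years) * h.n_dense_bins   # bins in the full Cartesian product
```

Here only two of the nine channel and year combinations are ever filled, so `stored` is much smaller than `full_product`.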