Sure, just set up dimensions on origin, destination, or route. For example, for flights from PHX:
var origin = flight.dimension(function(d) { return d.origin; });
origin.filterExact("PHX");
Or for flights from PHX to ONT:
var route = flight.dimension(function(d) { return d.origin + "-" + d.destination; });
route.filterExact("PHX-ONT");
@mbostock do you have a recommendation for filtering on a non-continuous range (e.g. PHX and SMF but not SAN)? It seems like it'd be possible to do by assigning a dimension to each airport code, but I'm wondering if there's a Better Way.
Crossfilter only supports filtering contiguous ranges at the moment. For categorical dimensions (such as airport codes) I think it would make sense to implement a different type of filter that can toggle arbitrary values rather than recording a contiguous range. So, I would fix that by adding a new feature. :)
Noted. In the meantime, this has worked well for me so far:
var originPHX = flight.dimension(function(d) { return d.origin === 'PHX'; });
//...
originPHX.filter(true); // use false to get all flights where the origin isn't PHX
This snippet seems quite susceptible to generalization, assuming there aren't performance concerns...
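One way the snippet above might generalize, as a rough sketch only: build the boolean accessor from a list of allowed values, so that a single dimension covers PHX and SMF but not SAN. The helper name `memberOf` and the sample records are made up for illustration. Changing the set of allowed values would still mean creating a new dimension.

```javascript
// Hypothetical helper: returns an accessor that is true when the record's
// field is one of the allowed values.
function memberOf(field, allowed) {
  var lookup = {};
  allowed.forEach(function(v) { lookup[v] = true; });
  return function(d) { return lookup[d[field]] === true; };
}

// Made-up sample records, for illustration only.
var sampleFlights = [
  {origin: "PHX", destination: "ONT"},
  {origin: "SMF", destination: "SAN"},
  {origin: "SAN", destination: "PHX"}
];

var fromPhxOrSmf = memberOf("origin", ["PHX", "SMF"]);

// With Crossfilter this would presumably look like:
//   var originPhxOrSmf = flight.dimension(fromPhxOrSmf);
//   originPhxOrSmf.filter(true);
// Here the accessor is applied with a plain array filter instead,
// so the snippet runs without Crossfilter loaded.
var matched = sampleFlights.filter(fromPhxOrSmf);
console.log(matched.map(function(d) { return d.origin; }));
```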
But I thought there were performance concerns. There are limits on the number of dimensions, and creating dimensions is, according to the documentation, expensive.
I was building a sizable piece of code on D3 and when I saw that crossfilter did a lot of stuff that I was building in my own data management code, I switched over, but this issue is really hitting me now. I've been trying to figure out how to get this functionality, but it does look impossible without adding the feature to the software as Mike suggested, and that looks too hard for me to try.
In my code there's going to be lots of turning on and off of various values of various categorical dimensions, so I probably made a mistake trying to use crossfilter, though I've learned a bunch by playing with it.
The first part to better support categorical dimensions is deciding on an API, so you might consider that even if you don't feel comfortable tackling the implementation.
I think the first decision is whether we want to support this as "dimensions can have multiple filters" (perhaps that can be intersected or unioned), or as "dimensions can be either quantitative or ordinal", in which case the filters on an ordinal dimension are tracked as a set of discrete values, rather than a contiguous range.
Just to check that I'm not missing something important: you're using the word ordinal now rather than categorical. I guess all categorical dimensions can be considered ordinal by putting them in, e.g., alphabetical order. If there are implications beyond that for the word choice, I'm not catching them.
It may be uncommon but certainly not impossible that someone will want multiple filters on a quantitative dimension, so the idea of allowing those to be intersected or unioned is nice. But your suggestion above for filters that allow toggling of individual values is very appealing. I don't have a clear sense of the performance implications.
One of my use cases is that I'd like to perform some calculation on the values of dimension X for all combinations of specific values for dimensions A, B and C, and allowing this to happen quickly as the user sets filters on dimensions B, C and D. Right now it looks to me like I have to remember the filters on B, C and D (where each filter can have multiple values) while temporarily setting single-value filters on all the combinations of A, B and C.
Is that kind of use case something that you'd like crossfilter to be able to support? I'll think more about what might be a nice API and report back later.
I guess all categorical dimensions can be considered ordinal by putting them in, e.g., alphabetical order.
Yep, that's all I meant.
Did that use case make sense? Is it something you'd want to support?
As an API consumer, I'm inclined to vote for Option B (ordinal dimensions) because I don't see a clean way to represent the union and intersection operations for Option A (multi-filter) without some sort of domain-specific query language or array messiness, particularly when the operations are mixed. It does seem possible, albeit awkward, to synthesize Option B's behavior with Option A by taking the union of a set of ranges that match the discrete boundaries plus or minus some tolerance.
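To make the "union of ranges" idea concrete, here is a small sketch in plain JavaScript. It assumes a numeric dimension and half-open ranges `[lo, hi)`, the convention `filterRange` uses. The multi-range filter itself is hypothetical (Crossfilter does not offer one at this point in the thread), and `rangesFor` and `inAnyRange` are invented names.

```javascript
// Hypothetical helper: turn discrete numeric values into narrow
// half-open ranges [v - tol, v + tol).
function rangesFor(values, tol) {
  return values.map(function(v) { return [v - tol, v + tol]; });
}

// True if x falls in at least one range, i.e. in the union of the ranges.
function inAnyRange(ranges, x) {
  return ranges.some(function(r) { return r[0] <= x && x < r[1]; });
}

// Made-up zip codes, for illustration only.
var zipRanges = rangesFor([94103, 94107], 0.5);
var zips = [94102, 94103, 94105, 94107];
var selected = zips.filter(function(z) { return inAnyRange(zipRanges, z); });
console.log(selected);
```

This only works when the values have a natural numeric spacing; for string categories such as airport codes there is no obvious tolerance, which is part of why the approach feels awkward.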
I must confess that I don't follow the description of @Sigfried's use case, so I can't say which option would fit that case better.
I think option B (dimensions as ordinal/categorical) would have more practical usage than option A (multiple filters on dimension).
While I'm sure there are cases where you'd want to apply multiple filters on a dimension (e.g. lunch hour and dinner hour on a time of day dimension), I think filtering by particular categorical values is a much more common use case. Using the payments metaphor in the tutorial, I'd imagine a very common use case for a merchant would be to filter payments by multiple zip codes, states, or credit card types.
Just throwing this out there, but I was imagining an API of an optional is_categorical flag on the dimension, while using the existing filter() APIs.
The existing API already supports categorical dimensions, provided you only need a single exact match (use filterExact). This issue is about allowing multiple values to be selected. We could enable that specifically for categorical dimensions, in which case the filter API would allow you to get or set multiple selected values. Or, we could figure out how to do it more generally for both categorical and quantitative dimensions. My guess is that enabling a different filter API for categorical dimensions would be less work and more convenient for the common use case. But, the general solution might be more powerful.
I'm taking a stab at this right now. Leaning toward the generic solution, but we'll see. I don't think we need support for intersections? The result of an intersection is going to be something that can be passed directly to filterRange today. Also, if we could enable a way to clone a dimension, you can perform intersections by applying the different filters to each dimension copy.
For the union of multiple filters, I'm just going to work on letting filter() take multiple arguments.
Zackham,
I've been testing your branch and I've noticed something strange to me, but perhaps this is the proper behavior? In your test you filter by the "total" dim, and then you get the data through the "date" dim. This gives you the right answer. But if you check through the "total" dim you get the wrong answer, it is still filtered by the first variable only and not the union. Is this by design?
beefsoup,
Thanks for the second set of eyes. I was not expanding the hi0/lo0 range to include the additional ranges. This is fixed now and I also modified the test.
Hi Mike,
I'm not sure the best way to email you, but trying this.
I was wondering if you'd be interested in/willing to have a brief phone conversation about the future of visualization frameworks built on top of D3? My perspective is that I'm at a firm that does a lot of contracting work on an impressive array of scientific and administrative projects for the NIH, FDA, and other organizations in the public health and clinical science arenas, and I'm trying to build something general on top of D3 to allow us to navigate a wide range of disparate data and incorporate these visualizations into web apps.
I'm working on some ideas, which, a few weeks ago, led me to take up and then abandon Tesseract/Crossfilter as my way of managing and filtering data sets. The approach that I'm taking would probably seem pretty ugly to you (it seems ugly to me quite often): OOP class hierarchies of UI elements and data elements that allow me from the perspective of any piece of data to access methods relating to how it wants to be displayed (colored, sized, etc.), whether it's been filtered, who its parents and children are; also things like: when a chunk of data results from the intersection of two dimension values, and its display is partly based on methods related to a third dimension value, the thing figures out what to do where.
To some degree, I think what I'm trying to make (unfortunately, without sufficient experience and background), is an API to let myself and others make Spotfire/Tableau-like visualizations from RDBMS data. So the initial hierarchy of any of this data (for purposes of letting users assign columns to visualization dimensions and stuff) is: table name (or query name) --> column name --> column value --> result subset. Clearly in Crossfilter you're coming up with ways of addressing some of the same general issues.
Your data models in D3 and Crossfilter are nice and flat and clean and tie the data so closely to the visualization that all the logic about colors and inter-data-point calculations can be performed directly in the visualization code. That works well for making beautiful individual visualizations. In my case where I want to make more of a dashboard thing, with various visualizations all tied to the same or related underlying data, I think there are aspects of the data and its interrelationships that need help from classes and methods that cross visualization boundaries.
Anyway, I thought a conversation might be fruitful. What do you think?
Thanks, Sigfried
Is there any update on this? It seems like a must-have feature, but I can't find a successful implementation or a workaround online.
At the moment the only real workaround is to use dimension.filterFunction.
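A minimal sketch of that workaround, assuming the goal is to keep several discrete values on one dimension. `oneOf` is a made-up helper; `filterFunction` is the Crossfilter method named above and takes a predicate over the dimension's values. The predicate is exercised on a plain array here so the snippet runs without Crossfilter loaded.

```javascript
// Hypothetical helper: builds a predicate that accepts any of the given
// dimension values.
function oneOf(values) {
  return function(v) { return values.indexOf(v) !== -1; };
}

var phxOrSmf = oneOf(["PHX", "SMF"]);

// With Crossfilter, using the `origin` dimension from earlier in the thread:
//   origin.filterFunction(phxOrSmf);
//   origin.filterAll(); // clears the filter again

// Stand-in check on plain values:
var kept = ["PHX", "SAN", "SMF", "ONT"].filter(phxOrSmf);
console.log(kept);
```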
Hello jason, would this filter still be active if I apply group().reduceCount() on the filtered dimension? E.g.:
var XDimension = ndx.dimension(function (d) { return d.Name; })
  .filterFunction(function (d) { return d === "Allyssa" || d === "Bob"; });
var YDimension = XDimension.group().reduceCount();
...
dc.renderAll();
Here is my stackoverflow question for reference.
Thanks.
Hello, I'm struggling with the multi-filtering issue as well, as I'm trying to build a more general spatial filter on my data. I'm trying to use filterFunction, but it seems strange to me that it is triggered as many times as the total number of records even if there are just a few unique dimension values. Is this a bug, or is there any workaround for that? What I'm trying to do is implement a 'point in polygon' filter based on a dimension derived from Z-curve ordering. Thanks for any ideas.
@jezekjan this makes filterFunction(f) evaluate f once per unique dimension value. (It still loops over all records, though.)
@@ -821,11 +821,13 @@ function crossfilter() {
var i,
k,
x,
+ v = values.length && values[0],
added = [],
removed = [];
for (i = 0; i < n; ++i) {
- if (!(filters[k = index[i]] & one) ^ !!(x = f(values[i], i))) {
+ if (values[i] !== v) x = f((v = values[i]), i);
+ if (!(filters[k = index[i]] & one) ^ !!x) {
if (x) filters[k] &= zero, added.push(k);
else filters[k] |= one, removed.push(k);
}
updated after #129
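The idea behind the patch can be sketched outside Crossfilter, assuming the values arrive sorted so that equal values are adjacent, as they are within a dimension. `evalPerRun` is an invented name, not part of the library. The sketch evaluates `f` on the first element and afterwards only when the value changes.

```javascript
// Evaluate predicate f once per run of equal values in a sorted array,
// reusing the last result for repeats. Returns one boolean per element.
function evalPerRun(sortedValues, f) {
  var results = [], prev, x, i;
  for (i = 0; i < sortedValues.length; ++i) {
    if (i === 0 || sortedValues[i] !== prev) {
      prev = sortedValues[i];
      x = !!f(prev);
    }
    results.push(x);
  }
  return results;
}

// Count how often the predicate actually runs.
var calls = 0;
var flags = evalPerRun(["ONT", "ONT", "PHX", "PHX", "PHX", "SAN"], function(v) {
  ++calls;
  return v === "PHX";
});
console.log(flags, calls); // six flags, but f ran only three times
```

This matters most when `f` is expensive, such as a point-in-polygon test, and the dimension has few distinct values relative to the number of records.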
As discussed in #151 an active fork is being developed in a new Crossfilter Organization. Please take further discussion there (if you haven't already) where it should be warmly welcomed by the new maintainers. Cheers!
I'm looking at the example on http://square.github.com/tesseract/ and thinking of different ways I'd like to be able to filter the data. Is there any possible way to filter on the flight cities? Let's say I want to just see data for flights from PHX to ONT. Or perhaps all flights that have a destination of SAN.
Of course we could filter the data that is returned from the server, but since I already have all the data loaded client side it would be nice to be able to do this type of filtering client side.
Any thoughts/ideas on this?