square / crossfilter

Fast n-dimensional filtering and grouping of records.
https://square.github.com/crossfilter/

Group issue #156

Closed: elmart closed this issue 9 years ago

elmart commented 9 years ago

Given an ordinal dimension over my data, I want to display a histogram to see how values are distributed. So, first, I use a group (with the value function and reduce function being the defaults, that is, identity and count):

          var firstGrouping = dimension.group();

With that, I have a list of groups representing distinct values and their count. But I don't want to depict all groups. Instead, I only want, say, the biggest 10, and everything else summed up in an extra group, named "other". I thought I could do this with a second group like this:

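          var grouping = dimension.group(function (d) {
            // n is the number of top groups to keep (e.g. 10)
            var groups = firstGrouping.top(n);
            for (var i = 0; i < groups.length; i++) {
              var group = groups[i];
              if (d == group.key) {
                return d;
              }
            }
            // Every value outside the top n falls into one extra group
            return '(other)';
          });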

But, for some reason, this second group seems not to work. Group '(other)' is not there, even if I'm returning '(other)' for some values.

Can anybody explain, please?

jasondavies commented 9 years ago

Seems to work fine for me, perhaps you should post a reproducible example somewhere? Personally I think it would be simpler to use firstGrouping.top(Infinity) and regroup small groups into "(other)" yourself.

elmart commented 9 years ago

> Seems to work fine for me

Mmm, strange. It definitely doesn't for me.

> Personally I think it would be simpler to use firstGrouping.top(Infinity) and regroup small groups into "(other)" yourself.

I considered that, but this is part of a generic widget which depicts the groups generated by whatever grouping function is passed in. I would rather not special-case the widget for my particular use.

> perhaps you should post a reproducible example somewhere?

I'll try to post something executable, but I need some time to do it. In the meantime, I have another question. From group() function doc:

> The groupValue function is optional; if not specified, it defaults to the identity
> function. Like the value function, groupValue must return a naturally-ordered
> value; furthermore, this order must be consistent with the dimension's value function!

I suspect my problem could be related to that last requirement (consistency between the groupValue function and the dimension's value function). Could you expand on what that would mean? I'm interpreting it as: a <= b implies g(a) <= g(b), where a = v(recordA) and b = v(recordB), v is the dimension's value function, and g is the group value function. That doesn't hold in my case, precisely because of the "(other)" group. Is that really needed? Why?

Thanks.

jasondavies commented 9 years ago

Ah, sorry about that, perhaps I didn’t test on a large enough sample to reproduce. You’re right: the group order must be consistent with the dimension’s value order. This is primarily to allow fast updating of groups internally. It’s a bit restrictive, but I think it’s best to treat Crossfilter as quite a low-level API rather than a flexible high-level API. In your case, this is why it would be better to regroup into “(other)” outside of Crossfilter.
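To make the consistency requirement concrete, here is a minimal sketch of a check for it, assuming the interpretation given earlier in the thread (a <= b implies g(a) <= g(b)). The helper `isOrderConsistent`, the sample values, and both grouping functions are made up for illustration and are not part of Crossfilter.

```javascript
// Hypothetical helper (not part of Crossfilter): returns true when the group
// value function g never decreases as the dimension values increase.
// Checking adjacent pairs is enough because sortedValues is already ordered.
function isOrderConsistent(sortedValues, g) {
  for (var i = 1; i < sortedValues.length; i++) {
    if (g(sortedValues[i - 1]) > g(sortedValues[i])) {
      return false;
    }
  }
  return true;
}

// Made-up dimension values, already in natural (ascending) order.
var values = ['apple', 'banana', 'cherry', 'damson'];

// Made-up set of "top" keys; every other value is sent to '(other)'.
var topKeys = { apple: true, cherry: true };
function otherGrouping(d) {
  return topKeys[d] ? d : '(other)';
}

// For contrast: grouping by first letter keeps the dimension's order.
function firstLetter(d) {
  return d.charAt(0);
}

// 'apple' <= 'banana', but 'apple' > '(other)', so this grouping breaks the rule.
var otherIsConsistent = isOrderConsistent(values, otherGrouping);

// 'a' <= 'b' <= 'c' <= 'd', so this grouping satisfies it.
var firstLetterIsConsistent = isOrderConsistent(values, firstLetter);
```

With these sample values the "(other)" grouping fails the check, because "(other)" sorts before every lowercase key yet is returned for values in the middle of the dimension's order.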

elmart commented 9 years ago

OK. I ended up doing as you advised (doing the second grouping outside of Crossfilter). Thanks.
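For anyone landing here later, the regrouping outside of Crossfilter might look roughly like the sketch below. It is not the code actually used in this thread; the helper name `regroupWithOther` and the sample counts are assumptions. It only relies on the array of {key, value} objects that a group returns.

```javascript
// Hypothetical helper: keeps the n largest groups and sums the remaining
// ones into a single extra entry. `groups` is an array of {key, value}
// objects, the shape returned by group.top(Infinity) or group.all().
function regroupWithOther(groups, n, otherKey) {
  // Copy before sorting so the caller's array is left untouched.
  var sorted = groups.slice().sort(function (a, b) {
    return b.value - a.value;
  });
  var result = sorted.slice(0, n);
  var rest = sorted.slice(n);
  if (rest.length > 0) {
    var otherTotal = 0;
    for (var i = 0; i < rest.length; i++) {
      otherTotal += rest[i].value;
    }
    result.push({ key: otherKey, value: otherTotal });
  }
  return result;
}

// Made-up counts standing in for firstGrouping.top(Infinity).
var sample = [
  { key: 'a', value: 5 },
  { key: 'b', value: 1 },
  { key: 'c', value: 9 },
  { key: 'd', value: 2 }
];

// Keeps 'c' (9) and 'a' (5); 'd' and 'b' are summed into '(other)' (3).
var regrouped = regroupWithOther(sample, 2, '(other)');
```

Against a real group this would be called as `regroupWithOther(firstGrouping.top(Infinity), 10, '(other)')`, re-running it whenever the filters change, since the result is a plain array and does not update itself.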