implement handling for other facet types

mozilla / elasticutils

[deprecated] A friendly chainable ElasticSearch interface for python

http://elasticutils.rtfd.org

BSD 3-Clause "New" or "Revised" License

implement handling for other facet types #73

Open willkg opened 12 years ago

willkg commented 12 years ago

Currently, .facet() only builds terms facets. ElasticSearch supports other facet types as well.

We want to use date histogram in Input, so we have a current need for implementing that. @rlr expressed a deep-seated yearning for histogram as well.

It'd be nice if ElasticUtils supported more facet types.

willkg commented 12 years ago

On IRC we discussed some alternatives:

First, add a type argument to .facet() that changes the facet type it's creating. Then we'd end up with code like this:

s.facet('fielda', 'fieldb')  # terms facet
s.facet('fielda', type='date_histogram', interval='days')  # date histogram facet

This is harder to document, and we know that at least "interval" is an argument used by multiple facet types with differently shaped values. Ew.

Second, we could have a Facet class with subclasses like DateHistogram where you provide the relevant bits:

s.facet(DateHistogram('fielda', interval='days'))

This is easy to document. We have F, so there's some precedent for this shape of things.
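A minimal sketch of how this second approach might look. The Facet base class, its build() method, and the constructor signatures are hypothetical and not existing ElasticUtils API; only the resulting dict shape follows the ElasticSearch facets request format.

```python
class Facet(object):
    """Hypothetical base class: each subclass knows its ES facet type."""

    facet_type = None  # e.g. 'terms', 'histogram', 'date_histogram'

    def __init__(self, field, **options):
        self.field = field
        self.options = options

    def build(self):
        """Return the ES body for this facet, keyed by facet type."""
        body = {'field': self.field}
        body.update(self.options)
        return {self.facet_type: body}


class DateHistogram(Facet):
    facet_type = 'date_histogram'

    def __init__(self, field, interval):
        # interval is mandatory for this facet type, so it is a named
        # argument here rather than a free-form option.
        super(DateHistogram, self).__init__(field, interval=interval)


facet = DateHistogram('fielda', interval='day')
print(facet.build())
```

Each subclass documents its own arguments, which is what makes this shape easier to document than a single type= switch.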

Third, we could have different methods:

s.facet_date_histogram('fielda', interval='days')

This is easy to document. We do different filter types as different methods, so there's precedent for this, too.

willkg commented 12 years ago

Oops--the bit about having different filter types as different methods is bogus. We only have one .filter.

willkg commented 12 years ago

If we look at doing multiple facet types, then the second approach looks better. Then you could do something like this:

s.facet(TermsFacet('fielda', 'fieldb'), Histogram('fielda', interval=100), DateHistogram('fielda', interval='days'))

I don't know how often that comes up in practice, but it nicely mirrors the resulting ES output:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10
            }
        },
        "histo1" : {
            "histogram" : {
                "field" : "field_name",
                "interval" : 100
            }
        },
        "histo2" : {
            "date_histogram" : {
                "field" : "field_name",
                "interval" : "day"
            }
        }
    }
}

Also, it's interesting to point out that the value of facets is a dict of facet names to facets.

Maybe we should make it like this instead?

s.facet(facetname1=DateHistogram('fielda', interval='days'))
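A rough sketch of how keyword-named facets could map onto the "facets" dict shown earlier. SketchS is a stand-in that only tracks facets, not the real S class, and both the build() method and the build_search() name are assumptions made for illustration.

```python
import json


class DateHistogram(object):
    """Hypothetical facet object; not part of ElasticUtils."""

    def __init__(self, field, interval):
        self.field = field
        self.interval = interval

    def build(self):
        return {'date_histogram': {'field': self.field,
                                   'interval': self.interval}}


class SketchS(object):
    """Stand-in for S that only tracks named facets."""

    def __init__(self, facets=None):
        self._facets = dict(facets or {})

    def facet(self, **named_facets):
        # Return a new object so calls can be chained without mutating self.
        merged = dict(self._facets)
        merged.update(named_facets)
        return SketchS(merged)

    def build_search(self):
        body = {'query': {'match_all': {}}}
        if self._facets:
            # The keyword names become the facet names in the request body.
            body['facets'] = dict(
                (name, facet.build()) for name, facet in self._facets.items())
        return body


s = SketchS().facet(histo2=DateHistogram('field_name', interval='day'))
print(json.dumps(s.build_search(), indent=4, sort_keys=True))
```

Because the keyword names are the facet names, the caller knows exactly which key to look up in the facet results.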

willkg commented 12 years ago

I like that last idea best so far. It's a little wordy, but it's closer to the ES api and it's explicit in important ways.

I'll try implementing that tomorrow morning (or late tonight--whichever has more free time) and see whether it tastes yucky or not.

willkg commented 11 years ago

The last comment is from 8 months ago. Since then, we reimplemented filters and queries in an extensible way. Facets should follow suit.

eire1130 commented 11 years ago

Continuing from #146

I agree it's not a great idea, if for no other reason than it gets away from the practices set forth in Django.

I'm not sure where you guys are at in this discussion, but I could see this going a couple of different ways:

1: Using something like Django's aggregation framework, so something like from elasticutils.facets import Term, Histogram, Statistical, etc

And then from there do something like, S().query(lol='cats').facets(Statistical("field", options))

Or, have an options meta class that is passed in when declaring a facet, so something like this:

options = Options(options)

S().query(lol='cats').facets(field__statistical=options)

I probably prefer the former to the latter; it keeps things closer to Django and is easier for new devs to pick up, I think.

The thing is, different facet types have different options available. Those options should get exposed via the API and enforced as well.
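One way the per-type options could be exposed and enforced, sketched under assumptions: the allowed_options and required_options attributes and the build() method are hypothetical, and the option sets below are an illustrative subset rather than the full list ElasticSearch accepts.

```python
class Facet(object):
    """Hypothetical base class that validates options per facet type."""

    facet_type = None
    allowed_options = frozenset()
    required_options = frozenset()

    def __init__(self, field, **options):
        name = type(self).__name__
        unknown = set(options) - self.allowed_options
        if unknown:
            raise TypeError('%s got unexpected options: %s'
                            % (name, ', '.join(sorted(unknown))))
        missing = self.required_options - set(options)
        if missing:
            raise TypeError('%s is missing required options: %s'
                            % (name, ', '.join(sorted(missing))))
        self.field = field
        self.options = options

    def build(self):
        body = {'field': self.field}
        body.update(self.options)
        return {self.facet_type: body}


class Term(Facet):
    facet_type = 'terms'
    allowed_options = frozenset(['size', 'order'])  # illustrative subset


class Histogram(Facet):
    facet_type = 'histogram'
    allowed_options = frozenset(['interval'])  # illustrative subset
    required_options = frozenset(['interval'])


print(Term('tag', size=10).build())

try:
    Term('tag', interval=100)  # interval is not a terms facet option
except TypeError as exc:
    print(exc)
```

Declaring the option sets on each class keeps the validation in one place and gives the documentation an obvious home.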

Thoughts?