mozilla / elasticutils

[deprecated] A friendly chainable ElasticSearch interface for python
http://elasticutils.rtfd.org
BSD 3-Clause "New" or "Revised" License
243 stars 76 forks source link

Added minimum_should_match support to terms query. #257

Open patrick91 opened 10 years ago

patrick91 commented 10 years ago

@willkg let me know what do you think about this

willkg commented 10 years ago

I'm hesitant to continue going down the "If there's one thing, then it means this... if there are two things then it means this..." path because that's not very flexible.

Alternative options are like this:

  1. send a dict as the value so we have more obvious name/value args rather than order-dependent args
  2. create a Value class or something like that which reduces the syntactic junk but keeps it readable
  3. have "shorthand and ElasticUtils builds it" or "Python dict of Elasticsearch stuff" options which mean we don't have to go implement our own parsing for all the possible values

Here are some examples:

Option 1: sending a dict

S().query(foo__terms={'minimum_should_match': 2, 'value': ['car', 'duck']})

Option 2: Some Value class

S().query(foo__terms=['car', 'duck'])
S().query(foo__terms=Value(minimum_should_match=2, value=['car', 'duck']))

Option 3: shorthand or "you're doing the ES stuff yourself"

# shorthand and simple case
S().query(foo__terms=['car', 'duck']) 

# advanced
S().query(foo__terms={'minimum_should_match': 2, 'value': ['car', 'duck']})

Option 2 and 3 are similarish, but Option 3 is more flexible since it handles keys that don't have to be Python identifiers.

Option 3 has the advantage that ElasticUtils could see it and then just pass it along. So it supports all Elasticsearch possibilities and we don't have to do anything extra.

There's another library called elasticsearch-dsl-py. That has some similar things with ElasticUtils, however, it only allows you to specify one single query/filter per .query()/.filter() call. Because of that, things are a little easier to deal with API-wise.

Anyhow, I think I'd rather go with Option 3. It solves some other problems, too. Then users get some good-enough defaults with simple shorthands and if they outgrow that, they can start using Elasticsearch stuff without us having to do work to support things.

What do others thing?

patrick91 commented 10 years ago

About the option two, I don't know if it is a good idea to have a TermsQuery (which extends Q) class dedicated to Terms query.

S().query(foo__terms=['car', 'duck'])
S().query(TermsQuery(values=['car', 'duck'], minimum_should_match=2))

The Value class looks too generic to me.

willkg commented 10 years ago

If we go down the "let's write a class for everything Elasticsearch does", that's a lot of classes and a lot of work to support things. That's why I was thinking a generic option is better.

patrick91 commented 10 years ago

So you'd like to have a Value class which basically gets converted to a JSON structure?

willkg commented 10 years ago

I think I prefer option 3, but I'm interested in seeing other options and discussing pros and cons.

patrick91 commented 10 years ago

Ok :) I'm happy to fix my PR once we got a nice way to deal with this