toptal / chewy

High-level Elasticsearch Ruby framework based on the official elasticsearch-ruby client
MIT License
1.89k stars 368 forks source link

Gradually building non-boolean queries with new DSL (e.g. function_score) #548

Open chrisandreae opened 7 years ago

chrisandreae commented 7 years ago

The documentation for the new query DSL states

A lot of query-level methods were not ported: everything that is related to boost and scoring. Use query manipulation to provide them.

Is it possible to use the new query manipulation primitives to add features that require potentially wrapping the existing query, such as function scoring? For example, in the legacy DSL I could build a query from the inside out:

q1 = FooIndex.query(match: { language: "en"}) # {:query=>{:match=>{:language=>"en"}}
q2 = q1.decay(:gauss, :date, scale: '365d') # {:query=>{:function_score=>{:functions=>[{:gauss=>{:story_date=>{:scale=>"365d"}}}], :query=>{:match=>{:language=>"en"}}}}

Whereas the naive expression in the new language simply combines individual clauses into a boolean query:

q1 = FooIndex.query { match language: "en" } # {:query=>{:match=>{:language=>"en"}}
q2 = q1.query { function_score { functions << {:gauss=>{:story_date=>{:scale=>"365d", :offset=>"31d", :decay=>0.5}}} } }
# {:query=>{:bool=>{:must=>[{:match=>{:language=>"en"}}, {:function_score=>{:functions=>[{:gauss=>{:story_date=>{:scale=>"365d"}}}]}}]}}}

I haven't yet been able to find anything to suggest that query components can be extracted and further manipulated once they're in Chewy::Search::Parameters::QueryStorage. What do you recommend in this situation?

pyromaniac commented 7 years ago

Hey, sorry, I missed this issue somehow. You are making a good point here. Probably it worth to reimplement this DSL. For now, I don't see any other way but to have a single .query {} with a block. I understand this is far from the functionality that was provided before. Now I see that it was actually convenient.

sandydoo commented 7 years ago

I recently had a great experience refactoring our custom bool query builder to use the new bool query DSL. Thank you @pyromaniac 👏 ! The only thing missing was a way to wrap the entire query in a function_score and add some functions to it.

Solution

I implemented a basic solution to support this here, sandydoo/chewy, and wanted to share my thoughts. The idea is that you pass in your functions using function_score and they get pushed to a storage array. When the query is rendered, everything gets wrapped in a function_score, queries and filters are properly rendered inside and the stored array is inserted as functions.

SampleIndex.query(match: {...}).function_score(gauss: {...})
{
  body:
    query: {
      function_score: {
        query: {...},
        functions: [
          {
            gauss: {...}
          }
        ]
      }
    }
  }
}

Why not redo the old DSL?

Now I realize that the naming choice function_score is questionable/debatable, but hear me out.

My reservations with reimplementing the old DSL is that the syntax for it is very different from the rest of the DSL. Using the OP's example, Index.decay(:gauss, :date, scale: '365d'), this seems like an unnecessary level of abstraction. Anyone new to chewy would then have to learn this DSL as well and then spend more time understanding what it's going to generate. In my opinion, Chewy excels at helping compose complex queries out of valid, individual, smaller queries. In keeping with the style of the rest of the DSL, I think it helps to be explicit and just pass gauss: { date: { scale: ... } } or use the elastic ruby DSL. Having a single way of adding functions means there is less chewy-specific syntax to remember and you can write your queries/functions in the same overall structure that Elastic expects them.

I also don't feel like there is that much value in having methods like decay. These are all just functions. I wouldn't bother adding special methods for each type of function, much in the same way I wouldn't bother adding methods for every single filter/query type 😄. In the end, this results in a simpler DSL that is easier to maintain and is less susceptible to upstream changes in Elastic's query format.

You would still have to support the top level stuff, like score_mode, but that's easy to do and there aren't that many of them. I already did score_mode for example, since I needed it.

Would love to hear some thoughts on this, both positive and negative. It should be easy to PR with some added tests.

pyromaniac commented 7 years ago

Great thoughts, thanks! I was going to do something similar. Having a top-level function_score which will wrap the built query and some satellite methods on a lower level like function_score.score_mode(). Thanks for your input, I really appreciate it.

Roriz commented 6 years ago

Hi guys, Has any update?

Thanks for all ✨

pyromaniac commented 6 years ago

Not yet, I didn't start working on this yet unfortunately :( You can take over this initiative if you want.

dleve123 commented 6 years ago

@pyromaniac

Out of curiosity, Is this issue still a priority for you at all

In the absence of a fix like the one @sandydoo mentions above, what would be the best solution path to implement the functionality of the decay method on Elasticsearch 5?

pyromaniac commented 6 years ago

Well, I still don't have time to work on Chewy but I plan to have it sooner or later and then everything will be addressed.

dleve123 commented 6 years ago

@pyromaniac gotcha and I'm happy to help contribute here! In the meantime, is there a straightforward way to drop down a layer of abstraction to elasticsearch-ruby from Chewy to manually specify this function_score?

pyromaniac commented 6 years ago

I was thinking about it at least, we can pass a block there but still will not get chainable. It requires a deep investigation I believe.

cyucelen commented 1 year ago

Any updates to this? I need a way to nest an already built query into function_score query