jurismarches / luqum

A lucene query parser generating ElasticSearch queries and more !

Question: treating untagged words or phrases as "full text search" across multiple (or all) fields #39

Open seandavi opened 5 years ago

seandavi commented 5 years ago

Luqum is working great for me and my test users, but one thing the test users miss is the behavior of query_string, which does a full-text search across all fields when no field is specified (e.g., "London"). I see the ability to specify a default field, but this results in a simple match query. I guess I am looking to convert these to multi-match with all available text fields? Any suggestions?
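
To make the difference concrete, here is a minimal sketch of the two Elasticsearch query bodies involved, written as plain Python dicts. The helper names and the `title` / `abstract` field names are only illustrative assumptions, not part of luqum.

```python
def match_query(field, text):
    """Single-field query, the kind a single default field produces."""
    return {"match": {field: text}}


def multi_match_query(fields, text):
    """One query that searches the same text across several fields."""
    return {"multi_match": {"query": text, "fields": list(fields)}}


# Hypothetical field names, used only for illustration.
single = match_query("title", "London")
across = multi_match_query(["title", "abstract"], "London")
```

The first body only looks at one field, while the second is closer to what query_string does for a bare term.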

alexgarel commented 5 years ago

Hi @seandavi

You're right, this is not a supported scenario, but it is an interesting one.

Two solutions:

If you need help in some way, just ask!

seandavi commented 5 years ago

For the time being, I'm going the cheap route and specifying _all as the default field for the match query. Users seem happy with the basic query_string behavior, which appears to pretty much use the _all approach.

If I have a little time, I may play with the multi-match approach. If I get into trouble, I'll let you know.

As usual, thanks for taking the time to answer and clarify.

seandavi commented 5 years ago

I know it has been a while on this one. I noticed a per-field version of multi_match was recently implemented. I'd like to revisit the idea of multi_match on a set of default fields for bare words. I like your idea of converting to multi_match when default_field is a list. Could you give me some hints on where to focus if I want to implement this? No urgency, but I thought I would ask.

seandavi commented 5 years ago

Just leaving a note here that to do this right would involve bare Word() and Phrase(), the latter requiring a different multi_match type.
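
As a rough sketch of that distinction: Elasticsearch's multi_match accepts a `type` parameter, and `"phrase"` runs a match_phrase on each field. The helper below is hypothetical (not luqum code), and it assumes a phrase is simply text wrapped in double quotes.

```python
def bare_text_query(text, fields):
    """Build a multi_match body for a bare word or a quoted phrase."""
    # Assumption: a phrase is any text wrapped in double quotes.
    is_phrase = len(text) >= 2 and text.startswith('"') and text.endswith('"')
    body = {
        "query": text.strip('"') if is_phrase else text,
        "fields": list(fields),
    }
    if is_phrase:
        # "phrase" makes multi_match run match_phrase on every field.
        body["type"] = "phrase"
    return {"multi_match": body}
```

A bare word keeps the default multi_match type, while a quoted phrase is sent with `"type": "phrase"` and without its quotes.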

seandavi commented 5 years ago

After a little playing with luqum.utils.LuceneTreeTransformer, this seems to do what I need. Note that multi_match is roughly translated to a bunch of OR queries across single-field match. The same is true of multi_match with phrases, except that match_phrase

import luqum.utils
from luqum.tree import Group, OrOperation, Range, SearchField


class BareTextTransformer(luqum.utils.LuceneTreeTransformer):
    """Convert bare Words or Phrases to full text search

    In cases where a query string has bare text (no field
    association), we want to construct a DSL query that includes
    all fields in an OR configuration to perform the full
    text search against all fields.
    This class can walk the tree and convert bare Word and Phrase
    nodes into the required set of SearchField objects. Note
    that this is entirely equivalent to `multi_match` in terms
    of performance, etc.
    Call `visit(tree)` and use the tree it returns.
    """
    def __init__(self, fields=("title", "abstract")):
        """Create a new BareTextTransformer

        Parameters
        ----------
        fields: list of str
            This is the list of fields that will be used to
            create the composite SearchField objects that
            will be OR'ed together to simulate full text
            search.
        """
        super().__init__()
        # Copy to a list so the default is never shared or mutated.
        self.fields = list(fields)

    def _expand(self, node, parent):
        # `parent` is the list of ancestors. Text that already sits
        # under a SearchField or a Range is not bare, so keep it.
        if parent and isinstance(parent[-1], (SearchField, Range)):
            return node
        search_list = [SearchField(f, node) for f in self.fields]
        return Group(OrOperation(*search_list))

    def visit_word(self, node, parent):
        return self._expand(node, parent)

    def visit_phrase(self, node, parent):
        return self._expand(node, parent)

And, to use:

from luqum.parser import parser

tree = parser.parse(q)
transformer = BareTextTransformer()
# tree below now has an expanded Group(OrOperation(...)) for each
# bare Word or Phrase, with one SearchField per entry in `fields`
tree = transformer.visit(tree)
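
For reference, the OR-across-fields shape that this expansion corresponds to on the Elasticsearch side can be sketched as a `bool` query with one `should` clause per field. This is a hand-written illustration with a hypothetical helper name, not output captured from luqum, and scoring may differ from multi_match (its default `best_fields` type takes the best field score, while `bool`/`should` adds the scores of matching clauses).

```python
def or_across_fields(text, fields, phrase=False):
    """OR the same text across fields using single-field queries."""
    # Words use `match`; phrases need `match_phrase` instead.
    clause = "match_phrase" if phrase else "match"
    return {"bool": {"should": [{clause: {f: text}} for f in fields]}}


# Hypothetical field names, used only for illustration.
word_query = or_across_fields("London", ["title", "abstract"])
phrase_query = or_across_fields("New York", ["title"], phrase=True)
```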

thpica commented 3 years ago

Using a multi_match for the * field seems to work for me.

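from luqum.elasticsearch import ElasticsearchQueryBuilder

# schema_analyzer is assumed to be a luqum.elasticsearch.SchemaAnalyzer
# built from the index schema.
es_query_builder = ElasticsearchQueryBuilder(
    **schema_analyzer.query_builder_options(),
    field_options={"*": {"match_type": "multi_match"}},
)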