Closed huntfx closed 5 years ago
Hello @Peter92.
I'm not sure of what you are trying to do. Should your request become an elasticsearch request, or do you just want to get the parsing tree ?
However changing operator precedence seems the right thing to do in your case.
To change OR to AND for implicit operator, you can use the default_operator
parameter in ElasticsearchQueryBuilder
.
If come back with some code, and the error or incorrect behaviour you get, I may be able to help.
Ah thanks, I'm just using the tree as I'm combining this with SQLAlchemy
, I figured it'd be easier to recursively iterate over the children and build up the query. I'm just wanting the tree to build itself a little differently, so that an OR operator will only apply to the two closest values.
Quick example again similar to above, but with the tree:
>>> parser.parse('a OR b c d') # the same as ((a OR b) AND c AND d)
#output: OrOperation(Word('a'), AndOperation(Word('b'), Word('c'), Word('d')))
#wanted: AndOperation(OrOperation(Word('a'), Word('b')), Word('c'), Word('d'))
Am I able to apply your suggestion just on the tree, or is that only for the elastic search thing?
I found a minor bug btw, parser.parse("'word'")
evaluates as Word("word'")
.
No, the implicit operation resolves to UnknownOperation
.
In [6]: parser.parse('a OR b c d')
Out[6]: OrOperation(Word('a'), UnknownOperation(Word('b'), Word('c'), Word('d')))
So you can either :
utils.py
although implementation of TreeTransformer is not the best)..And thanks for the small bug report !
It was a slightly incomplete example as I was calling UnknownOperationResolver(AndOperator)(tree)
, ignoring that though haha, the tree result I would like then is this:
UnknownOperation(OrOperation(Word('a'), Word('b')), Word('c'), Word('d')))
By default, a OR b c d
is being parsed as a OR (b c d)
, but I would like it parsed as (a OR b) c d
.
Actually I think I just got it working. Moving the ('left', 'OR_OP',)
line to the bottom of luqum.parser.precedence
seems to have done what I wanted, from very light testing it doesn't seem to have broken anything.
I know it's not a common use case, but perhaps it'd be quite nice if there was some option to modify the order without editing the source code.
You can do this in your own file:
from luqum.parser import *
precedence = list(precedence)
or_op = precedence.pop(0)
precedence.append(or_op)
precedence = tuple(precedence)
myparser = yacc.yacc()
tree = myparser.parse('a OR b c d')
print(tree, "is", repr(tree))
This outputs:
a OR b c d is UnknownOperation(OrOperation(Word('a'), Word('b')), Word('c'), Word('d'))
NB: you can't do it in REPL because yacc.yacc() needs to load the file.
PS: indeed, putting or_op in second position does not work if you use implicit operator, but would work if operations where AND. this is because, it as to be considered after the "TERM" operation, as "b c d" only contains TERM.
Got it working perfectly with SQLAlchemy
now, thanks a bunch for coding this.
I just have a final quick question so I'll ask here instead of opening a new issue.
Would it be possible modify the parsing to allow empty search terms, so that "description: name:"
for example will return something like UnknownOperation(SearchField('description', None), SearchField('name': None))
?
I tried editing p_field_search
but it doesn't get that far before throwing an error.
In current grammar, foo:bar:(test) is valid and interpreted as SearchField('spam', SearchField('foo', Word('bar')))
(this permits complex nested or object search).
So to support what you want, you would have to rewrite a good part of the grammar.
"TERM COLUMN unary_expression" rule should not stand any-more, as you want to put restrictions on the right part (it can't be a search). So now you have to differentiate unary_expression
and a new search_expression
.
Also because space are not considered, luqum can't differentiate "name: a" from "name:a". While I imagine you want the first to be "name:None OR a" and the second "name:a".
In Lucene, to make a match on every object with a non empty "name", it is written "name:*".
If you just want to tweak the product, I would suggest using a regexp like so:
marker = "*" # use something you know want be used by user
empty_search = re.compile(r'(?<expr>{term_re}:)\s+'.format(term_re=TERM_RE))
expression = empty_search.sub("\g<expr>" + marker, expression)
Then when you meet marker
you can use None
.
Ah thanks for the suggestions, I didn't realise it'd be quite a big tweak.
I couldn't get the gammar edits to work (I'm running a bit blind with ply and don't really understand what counts as valid syntax for it), however based your regex idea, I've got it working by just checking the string for any occurance of :
that isn't contained in speech marks, and adding in a null marker.
Other than for nested search fields which I hadn't considered before, It's now fully hooked up to the database and works perfectly, so thanks again :)
Short version is I would like to parse
a b OR c d
asa AND (b OR c) AND d
.Long version: I'm attempting to simplify the syntax a little before I integrate this into my code, as it's supposed to be an easy to use search with optional advanced features. I gave up coding it myself and found this library which seems nice.
I'd like it to work like Google where things are "AND" by default, but by providing "OR", it'll compare the two closest values. I've set
UnknownOperationResolver
to AND and tried changing the order ofparser.precedence
, but not had any luck.Here's an example of a query I'd like to use:
Working syntax: (either page 1 or page 2, has either "a" or both "b" and "c", title is not sometitle)
Wanted syntax:
For the record, we're still stuck on Python 2.7 for another year or two, so as I've already had to fix the
yield from
lines, I'm not against tweaking other bits of the code if needed.