jurismarches / luqum

A lucene query parser generating ElasticSearch queries and more !

Remove "type": "phrase" from `match` queries #19

Closed davidlmorton closed 6 years ago

davidlmorton commented 6 years ago

I don't expect you'd like to take this PR as-is. I would like you to consider making the choice between match and match_phrase queries configurable somehow. For my use-cases the match_phrase query isn't appropriate, so I just removed it. I think match_phrase should be the default (as it is the only choice now), with the ability to configure behavior (ideally per field/subfield).

Also, "type": "phrase" is deprecated by newer versions of ES. I get a warning like this when I try to execute a query built with "type": "phrase" in it:

#! Deprecation: Deprecated field [type] used, replaced by [match_phrase and match_phrase_prefix query]

alexgarel commented 6 years ago

Hello, thanks for the PR. Right now I have very little time, but I will look into this ASAP.

NidzaKornjaca commented 6 years ago

Any updates on this?

alexgarel commented 6 years ago

hello @NidzaKornjaca, I'll try to look into this. If you have some insight on this, I would be glad to know.

NidzaKornjaca commented 6 years ago

If I figure something out I'll let you know. I might be able to spend some time on this next week.

On 6 Mar 2018 3:07 pm, "Alex Garel" notifications@github.com wrote:

hello @NidzaKornjaca https://github.com/nidzakornjaca, I'll try to look into this. If you have some insight on this, I would be glad to know.


alexgarel commented 6 years ago

@davidmorton, could you please provide your use case? I'd like to understand the best way to make this configurable.

Should we provide a dict mapping field names to the match query we want, with a default one? Something like match_type={"message": "match_phrase", "author": "match_prefix", "__default__": "match"}?
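A minimal sketch of how such a mapping could drive the choice of query, assuming a plain dict lookup with a `__default__` fallback. The helper `build_field_query` is hypothetical and is not luqum's API. The final fallback to `match_phrase` follows the suggestion in the PR description that it remain the default.

```python
# Hypothetical helper, for illustration only: not part of luqum's API.
DEFAULT_KEY = "__default__"


def build_field_query(field, text, match_type):
    """Build an Elasticsearch clause for `field`, picking the query type per field.

    `match_type` maps field names to a query name ("match", "match_phrase", ...).
    Fields not listed use the "__default__" entry, or "match_phrase" if absent.
    """
    query_type = match_type.get(field, match_type.get(DEFAULT_KEY, "match_phrase"))
    return {query_type: {field: {"query": text}}}


match_type = {"message": "match_phrase", "__default__": "match"}

# "message" is configured explicitly, "author" falls back to the default.
print(build_field_query("message", "quick brown fox", match_type))
print(build_field_query("author", "john smith", match_type))
```

A per-field dict keeps the configuration declarative, so a builder could accept it once at construction time rather than on every call.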

OK. Also, fixing the deprecation warning is easy.
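For illustration, the deprecation fix amounts to moving the `type` value out of the `match` body and into the query name. The sketch below assumes clauses shaped like `{"match": {field: {"query": ..., "type": "phrase"}}}`. The function `upgrade_match_clause` is a hypothetical name, not the change that was actually made in luqum.

```python
# Hypothetical helper, for illustration only: not part of luqum's API.
_REPLACEMENTS = {
    "phrase": "match_phrase",
    "phrase_prefix": "match_phrase_prefix",
}


def upgrade_match_clause(clause):
    """Rewrite a `match` clause carrying a deprecated "type" into its dedicated query.

    Clauses that are not `match`, or that carry no recognised "type", are returned unchanged.
    """
    if set(clause) != {"match"} or len(clause["match"]) != 1:
        return clause
    (field, params), = clause["match"].items()
    if not isinstance(params, dict) or params.get("type") not in _REPLACEMENTS:
        return clause
    params = dict(params)  # copy so the caller's dict is not mutated
    query_name = _REPLACEMENTS[params.pop("type")]
    return {query_name: {field: params}}


old = {"match": {"title": {"query": "foo bar", "type": "phrase"}}}
print(upgrade_match_clause(old))
```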

DavidMorton commented 6 years ago

@davidlmorton I believe this was directed to you. :smile:

davidlmorton commented 6 years ago

Sure. My use case was building an application to find demographic records in a database that refer to the same person but are stored as separate records. The problem is difficult because the person may have moved (changing address) or married (changing name), and many siblings (especially twins) share many fields yet are not the same person. I needed match instead of match_phrase because the order in which tokens occur in the fields was unimportant to us. Also, we wanted partial matches to be returned in our results. Take names, for instance: some people go by their middle name, so first and middle names often get swapped around. It may be possible to achieve this by specifying a slop parameter, but I haven't much experience with it.

Hope that helps!
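To make the difference concrete, here is a sketch of the three query bodies being compared, written as plain Python dicts. The helper name `name_queries` and the field name `full_name` are assumptions for illustration. As far as the Elasticsearch documentation describes it, `match` defaults to OR between terms (so partial matches are returned), while `match_phrase` requires every term; `slop` relaxes term positions, with swapped adjacent terms needing a slop of 2, but it does not make partial matches possible.

```python
# Hypothetical helper, for illustration only: field name and function name are assumptions.
def name_queries(field, text, slop=2):
    """Return candidate Elasticsearch query bodies for searching `text` in `field`."""
    return {
        # Any term may match, in any order: partial matches are returned.
        "match": {"match": {field: {"query": text}}},
        # All terms must appear, adjacent and in the given order.
        "match_phrase": {"match_phrase": {field: {"query": text}}},
        # All terms must still appear, but positions may shift by up to `slop` moves.
        "match_phrase_slop": {"match_phrase": {field: {"query": text, "slop": slop}}},
    }


queries = name_queries("full_name", "James Robert")
for label, body in queries.items():
    print(label, body)
```

For a record-linkage use case like the one described, the plain `match` form seems the closest fit, since it tolerates both reordered and missing name parts.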

alexgarel commented 6 years ago

Hello,

With commit 7794367936920987f12d5a78c5f63401d849d7be and release 0.7.1

@davidlmorton I hope this works well with your use case.