healthonnet / hon-lucene-synonyms

Solr query parser plugin that performs proper query-time synonym expansion.
http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr

When this query parser finds synonyms, it needs the longest match. #25

Open jhsuh opened 11 years ago

jhsuh commented 11 years ago

I added the synonyms for dog as shown below. When I search for "dog", I want to match "dog" or "man's best friend" or "dog(inc)", and that works perfectly. When I search for "dog(inc)", I want to match "dog(inc)" or "dog" or "man's best friend" too. But this query parser finds synonyms for both "dog(inc)" and "dog" (maybe it uses the shortest match), so it searches ("dog" AND "inc") OR ("dog's synonyms" AND "inc").

Hmm... I think the synonym_edismax query parser should use the longest match in the search query.

# tokenizer
 query : StandardTokenizer
 synonym : StandardTokenizer
 dog(inc) -> dog inc
# synonyms.txt
 dog, man's best friend, dog(inc)
# search phrase 
 search query : dog ==> OK
http://127.0.0.1:8983/solr/select?qf=Title_t&q=dog&defType=synonym_edismax&synonyms=true&debugQuery=true&q.op=AND&synonyms.constructPhrases=true&synonyms.originalBoost=1.1&synonyms.synonymBoost=0.9
 ==> +((Title_t:dog)^1.1 (((+(Title_t:dog)) (+(Title_t:dog(inc))) (+(Title_t:"man's best friend")))^0.9))

 search query : dog(inc) ==> finds dog's synonyms and ANDs each of them with "inc".
http://127.0.0.1:8983/solr/select?qf=Title_t&q=dog(inc)&defType=synonym_edismax&synonyms=true&debugQuery=true&q.op=AND&synonyms.constructPhrases=true&synonyms.originalBoost=1.1&synonyms.synonymBoost=0.9
 ==> +((((Title_t:dog) (Title_t:inc))~2^1.1) (((+(((Title_t:dog) (Title_t:inc))~2)) (+(((Title_t:"dog inc") (Title_t:inc))~2)) (+(((Title_t:"man's best friend") (Title_t:inc))~2)))^0.9))
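To make the requested behaviour concrete, here is a minimal Python sketch of longest-match synonym lookup over an already tokenized query. It is only an illustration and not the plugin's code: `SYNONYM_GROUPS`, `build_index` and `longest_matches` are hypothetical names, and the token tuples assume the tokenization shown above (dog(inc) -> dog inc).

```python
# Hypothetical sketch of longest-match lookup; not taken from the plugin.
# Each synonym entry is stored as a tuple of tokens (assumed tokenization).
SYNONYM_GROUPS = [
    [("dog",), ("man's", "best", "friend"), ("dog", "inc")],
]


def build_index(groups):
    """Map every synonym entry to the whole group it belongs to."""
    index = {}
    for group in groups:
        for entry in group:
            index[entry] = group
    return index


def longest_matches(tokens, index):
    """Scan left to right, taking the longest synonym entry at each position."""
    max_len = max((len(entry) for entry in index), default=0)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in index:
                matches.append((candidate, index[candidate]))
                i += n  # consume the matched tokens so "inc" is not left over
                break
        else:
            i += 1  # no synonym entry starts at this token
    return matches


INDEX = build_index(SYNONYM_GROUPS)
print(longest_matches(["dog", "inc"], INDEX))
```

With this approach the query tokens "dog inc" are consumed as one entry, so no separate "dog's synonym AND inc" clause would be produced.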

OkkeKlein commented 11 years ago

I think this is because of the analyzer. Another example of not wanting the synonyms analyzed like a normal query.
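A rough way to see the analyzer effect is to approximate the tokenization in Python. This regex is only a stand-in and an assumption on my part: the real StandardTokenizer follows Unicode word-break rules and will differ on other inputs. `approx_tokens` is a hypothetical helper.

```python
import re

# Approximation only: runs of ASCII letters/digits, keeping inner apostrophes.
# The real StandardTokenizer uses Unicode word-break rules.
TOKEN = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def approx_tokens(text):
    """Split text roughly the way the thread describes (dog(inc) -> dog inc)."""
    return TOKEN.findall(text)


for entry in ["dog", "man's best friend", "dog(inc)"]:
    print(entry, "->", approx_tokens(entry))
```

Under this approximation "dog(inc)" and "dog inc" produce the same tokens, which is why the synonym entry and the plain word "dog" both match at the first token.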

jhsuh commented 11 years ago

But if I define the synonyms as "dog, man's best friend, dog inc" and search for "dog inc", the query parser still adds the unexpected clause "dog's synonym AND inc". So I don't think this is caused only by the analyzer. Thank you for your comment~ ^^

http://127.0.0.1:8983/solr/select?qf=Title_t&q=dog%20inc&defType=synonym_edismax&synonyms=true&debugQuery=true&synonyms.constructPhrases=true&synonyms.originalBoost=1.1&synonyms.synonymBoost=0.9&q.op=AND
 ==> +((((Title_t:dog) (Title_t:inc))~2^1.1) (((+(((Title_t:"dog inc") (Title_t:inc))~2)) (+(((Title_t:dog) (Title_t:inc))~2)) (+(((Title_t:"man's best friend") (Title_t:inc))~2)) (+(Title_t:"dog inc")) (+(Title_t:dog)) (+(Title_t:"man's best friend")))^0.9))
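For anyone reproducing this, the debug requests in this thread differ only in `q`, so they can be generated with a small helper. This is a sketch using Python's standard library; the host, field name and parameter values are copied from the URLs above, and `debug_url` is a hypothetical name.

```python
from urllib.parse import quote, urlencode

BASE = "http://127.0.0.1:8983/solr/select"


def debug_url(query):
    """Build the debug request used in this thread for a given query string."""
    params = {
        "qf": "Title_t",
        "q": query,
        "defType": "synonym_edismax",
        "synonyms": "true",
        "debugQuery": "true",
        "q.op": "AND",
        "synonyms.constructPhrases": "true",
        "synonyms.originalBoost": "1.1",
        "synonyms.synonymBoost": "0.9",
    }
    # quote_via=quote encodes spaces as %20 rather than "+"
    return BASE + "?" + urlencode(params, quote_via=quote)


for q in ["dog", "dog(inc)", "dog inc"]:
    print(debug_url(q))
```

Note that `quote` also percent-encodes the parentheses in "dog(inc)", so that URL will not be character-for-character identical to the one pasted above.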

OkkeKlein commented 11 years ago

It looks to me like the second query is parsed differently.

jhsuh commented 11 years ago

@OkkeKlein Yes, but I think the "dog's synonym AND inc" clauses are not needed, that is, this part of the parsed query:
+(((Title_t:"dog inc") (Title_t:inc))~2)) (+(((Title_t:dog) (Title_t:inc))~2)) (+(((Title_t:"man's best friend") (Title_t:inc))~2))

Also, I can't always use WhitespaceTokenizer or KeywordTokenizer for the query and the synonyms.

nolanlawson commented 11 years ago

Sorry, but I'm struggling to understand the issue here. Could you write a unit test to demonstrate what's not functioning here? Just make a branch and modify the examples/example_synonym_file.txt and add a test under test/. Thanks in advance!