Open aswad1 opened 5 days ago
[Triage]
Coming from https://github.com/opensearch-project/OpenSearch/issues/16263: the proposed fix to add synonym_analyzer for the synonym_graph filter (PR https://github.com/opensearch-project/OpenSearch/pull/16488) should solve this bug as well.
Take the attached .aff and .dic files and put them under the config/hunspell/en_US folder.
Please note there is an issue in the attached .aff file: the SFX Z rule declares 8 entries in its header but actually contains 14 rules, so the header below has to be updated to SFX Z Y 14.
SFX Z Y 8
SFX Z 0 rs e
SFX Z y iers [^aeiou]y
SFX Z 0 ers [aeiou]y
SFX Z 0 ers [^ey]
SFX Z 0 ners [aiu]n
SFX Z 0 ers [^e]an
SFX Z e ners [aiu]ne
SFX Z 0 rly e
SFX Z y ierly [^aeiou]y
SFX Z 0 erly [aeiou]y
SFX Z 0 erly [^ey]
SFX Z 0 nerly [aiu]n
SFX Z 0 erly [^e]an
SFX Z e nerly [aiu]ne
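The header/rule-count mismatch described above can be caught mechanically. The following is a minimal sketch, not part of OpenSearch or Hunspell: the function name `affix_count_mismatches` is hypothetical, and it assumes the first SFX/PFX line seen for a given flag is the header (`KIND FLAG CROSS_PRODUCT COUNT`) and every later line with the same kind and flag is one rule.

```python
# Sketch: compare the rule count declared in each SFX/PFX header of a
# Hunspell .aff file with the number of rule lines that follow it.
# Assumption: the first line seen for a (kind, flag) pair is the header.

SAMPLE_AFF = """\
SFX Z Y 8
SFX Z 0 rs e
SFX Z y iers [^aeiou]y
SFX Z 0 ers [aeiou]y
SFX Z 0 ers [^ey]
SFX Z 0 ners [aiu]n
SFX Z 0 ers [^e]an
SFX Z e ners [aiu]ne
SFX Z 0 rly e
SFX Z y ierly [^aeiou]y
SFX Z 0 erly [aeiou]y
SFX Z 0 erly [^ey]
SFX Z 0 nerly [aiu]n
SFX Z 0 erly [^e]an
SFX Z e nerly [aiu]ne
"""


def affix_count_mismatches(aff_text):
    """Return (kind, flag, declared, actual) for every block whose counts differ."""
    declared = {}  # (kind, flag) -> count taken from the header line
    actual = {}    # (kind, flag) -> number of rule lines actually seen
    for raw in aff_text.splitlines():
        fields = raw.split()
        if len(fields) < 4 or fields[0] not in ("SFX", "PFX"):
            continue  # comments, blank lines, and other directives
        key = (fields[0], fields[1])
        if key not in declared:
            declared[key] = int(fields[3])  # header: fourth field is the count
            actual[key] = 0
        else:
            actual[key] += 1
    return [
        (kind, flag, declared[(kind, flag)], actual[(kind, flag)])
        for (kind, flag) in declared
        if declared[(kind, flag)] != actual[(kind, flag)]
    ]


print(affix_count_mismatches(SAMPLE_AFF))  # [('SFX', 'Z', 8, 14)]
```

To check the real file, read the copy under config/hunspell/en_US into a string and pass it to the function; an empty list means every header matches its rule count.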
Once assembled (https://github.com/prudhvigodithi/OpenSearch/tree/bug-fix-synonym), start the node with ./gradlew run -PnumNodes=1
Testing the query with synonym_analyzer:
curl -X PUT "localhost:9200/test-index5" \
  -H "Content-Type: application/json" \
  -d '{
  "settings": {
    "analysis": {
      "filter": {
        "custom_synonym_graph-replacement_filter": {
          "type": "synonym_graph",
          "synonyms": [
            "stationary, stationery, stationaries, stationeries"
          ],
          "synonym_analyzer": "standard"
        },
        "custom_hunspell_stemmer": {
          "type": "hunspell",
          "locale": "en_US"
        }
      },
      "analyzer": {
        "test_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "custom_hunspell_stemmer",
            "custom_synonym_graph-replacement_filter"
          ]
        }
      }
    }
  }
}'
curl -X POST "localhost:9200/test-index5/_analyze" \
  -H "Content-Type: application/json" \
  -d '{
  "analyzer": "test_analyzer",
  "text": "stationary"
}'
{
  "tokens": [
    {
      "token": "stationery",
      "start_offset": 0,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "stationaries",
      "start_offset": 0,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "stationeries",
      "start_offset": 0,
      "end_offset": 10,
      "type": "SYNONYM",
      "position": 0
    },
    {
      "token": "stationary",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    }
  ]
}
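To turn the manual comparison into a repeatable check, the response can be tested for the expected synonyms. This is a minimal sketch: the helper name `missing_synonyms` is hypothetical, and `SAMPLE_RESPONSE` is an abbreviated copy of the `_analyze` output shown above rather than a live call to the cluster.

```python
# Sketch: report which expected synonyms are absent from an _analyze response.
# SAMPLE_RESPONSE mirrors the tokens from the output above (offsets omitted).

SAMPLE_RESPONSE = {
    "tokens": [
        {"token": "stationery", "type": "SYNONYM", "position": 0},
        {"token": "stationaries", "type": "SYNONYM", "position": 0},
        {"token": "stationeries", "type": "SYNONYM", "position": 0},
        {"token": "stationary", "type": "word", "position": 0},
    ]
}

EXPECTED = ["stationary", "stationery", "stationaries", "stationeries"]


def missing_synonyms(response, expected):
    """Return the expected tokens that do not appear in the response, sorted."""
    seen = {entry["token"] for entry in response.get("tokens", [])}
    return sorted(set(expected) - seen)


print(missing_synonyms(SAMPLE_RESPONSE, EXPECTED))  # []
```

An empty list means all four synonyms were emitted. Run against a response that contains only the singular forms, the same call would list the missing plurals.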
Thank you @msfroh @getsaurabh02 @nupurjaiswal @dblock @aswad1
Describe the bug
When using a synonym filter after hunspell, I don't see the expected plural synonyms in the output. In the configuration below, I have added synonyms:
While testing, I don't see stationaries and stationeries in the output.
Here is the detailed analysis from OpenSearch:
The hunspell rules and dictionary files are attached: en-US.aff.txt and en-US.dic.txt
Related component
Other
To Reproduce
N/A
Expected behavior
The Solr analysis screenshot has the synonym graph filter (SGF) highlighted. All the synonyms are displayed under SGF.