Open jvence opened 3 years ago
Here's my take on it:
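index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                # Drop common English stop words ("the", "and", ...)
                "english_stop": {
                    "type": "stop",
                    "stopwords": "_english_"
                },
                # Reduce English words to their stems
                "english_stemmer": {
                    "type": "stemmer",
                    "language": "english"
                }
            },
            "analyzer": {
                # Custom analyzer: tokenize, lowercase, remove stop words, then stem
                "rebuilt_english": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "english_stop",
                        "english_stemmer"
                    ]
                }
            }
        }
    },
    "mappings": {
        # The "_doc" type level applies to Elasticsearch 6.x; in 7.x and later,
        # "properties" goes directly under "mappings".
        "_doc": {
            "properties": {
                "content": {
                    "type": "text",
                    "analyzer": "rebuilt_english"
                }
            }
        }
    }
}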
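I think it would be a good idea to update data_utils.py to include a stemming filter by default when creating Elasticsearch indices. This should noticeably improve the results returned by ES, because a query would then also match inflected forms of a word (for example, "running" and "runs" both reduce to "run").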
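To make this easier to wire in, the settings above could be wrapped in a small helper. This is only a rough sketch: `build_stemmed_index_settings` and its `field` and `doc_type` parameters are hypothetical names, not anything that exists in data_utils.py today, and I'm assuming the text field is called `content` as in my example.

```python
from typing import Optional


def build_stemmed_index_settings(field: str = "content",
                                 doc_type: Optional[str] = "_doc") -> dict:
    """Build an index body that analyzes `field` with English stop-word
    removal and stemming. (Hypothetical helper, not part of data_utils.py.)"""
    analysis = {
        "filter": {
            "english_stop": {"type": "stop", "stopwords": "_english_"},
            "english_stemmer": {"type": "stemmer", "language": "english"},
        },
        "analyzer": {
            "rebuilt_english": {
                "tokenizer": "standard",
                "filter": ["lowercase", "english_stop", "english_stemmer"],
            }
        },
    }
    properties = {field: {"type": "text", "analyzer": "rebuilt_english"}}
    # Elasticsearch 6.x nests properties under a mapping type such as "_doc";
    # pass doc_type=None to get the typeless layout used by 7.x and later.
    if doc_type:
        mappings = {doc_type: {"properties": properties}}
    else:
        mappings = {"properties": properties}
    return {"settings": {"analysis": analysis}, "mappings": mappings}


if __name__ == "__main__":
    import json

    # Print the body that would be sent when creating the index.
    print(json.dumps(build_stemmed_index_settings(), indent=2))
```

The returned dict would be sent as the request body when the index is created. An existing index would need to be recreated and reindexed for the new analyzer to apply to documents that are already stored.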