0.1.x
work with wagtail>=2.0,<2.2
0.2.x
work with wagtail>=2.2
pip install wagtail-whoosh
After installing this package, add wagtail_whoosh
to INSTALLED_APPS. And then config WAGTAILSEARCH_BACKENDS
import os
ROOT_DIR = os.path.abspath(os.path.dirname(__name__))
WAGTAILSEARCH_BACKENDS = {
'default': {
'BACKEND': 'wagtail_whoosh.backend',
'PATH': os.path.join(ROOT_DIR, 'search_index')
'LANGUAGE': 'fr',
},
}
Set ./manage.py update_index
as cron job
If you want to search hello world
, you might need to use hello
in previous versions. Now you can use hel
and the backend would return the result.
# you need to define the search field in this way
index.SearchField('title', partial_match=True)
# or this way
index.AutocompleteField('title')
# Search just the title field
>>> EventPage.objects.search("Event", fields=["title"])
[<EventPage: Event 1>, <EventPage: Event 2>]
results = Page1.objects.search(query).annotate_score("_score").results()
result += Page2.objects.search(query).annotate_score("_score").results()
return sorted(results, key=lambda r: r._score)
Whoosh includes pure-Python implementations of the Snowball stemmers and stop word lists for various languages adapted from NLTK.
So you can use the built-in language support by setting like 'LANGUAGE': 'fr'
, the language support list is below.
('ar', 'da', 'nl', 'en', 'fi', 'fr', 'de', 'hu', 'it', 'no', 'pt', 'ro', 'ru', 'es', 'sv', 'tr')
If you want more control or want to do customization, you can use ANALYZER
instead of LANGUAGE
here.
An analyzer is a function or callable class (a class with a call method) that takes a unicode string and returns a generator of tokens
You can set ANALYZER
using an object reference or dotted module path.
NOTE: If ANALYZER is set, your LANGUAGE would be ignored
from whoosh.analysis import LanguageAnalyzer
analyzer_swedish = LanguageAnalyzer('sv')
WAGTAILSEARCH_BACKENDS = {
'default': {
'BACKEND': 'wagtail_whoosh.backend',
'PATH': str(ROOT_DIR('search_index')),
'ANALYZER': analyzer_swedish,
},
}
In most cases, you can modify NGRAM_LENGTH
to make the index
operation faster.
The default minimum length for NGRAM words is 2, and the maximum is 8. For indexes with lots of partial match fields, or languages other than English, this could be too large. It can be customised using the NGRAM_LENGTH
option:
WAGTAILSEARCH_BACKENDS = {
'default': {
'BACKEND': 'wagtail_whoosh.backend',
'PATH': str(ROOT_DIR('search_index')),
'NGRAM_LENGTH': (2, 4),
},
}
By default the Whoosh indexer uses 1 processor and 128MB of memory max. This can be changed using the PROCS
and MEMORY
options:
Please only change them if you find memory and cpu limits, in some cases, changing them would not speed up the index
WAGTAILSEARCH_BACKENDS = {
'default': {
'BACKEND': 'wagtail_whoosh.backend',
'PATH': str(ROOT_DIR('search_index')),
'PROCS': 4,
'MEMORY': 2048,
},
}
note: memory is calculated per processor, so the above configuration can use up to 8GB of memory.
facet
is not supported.