Closed marcospassos closed 1 year ago
@marcospassos I've published typesense/docsearch-scraper:0.5.0, which adds support for setting custom token_separators and symbols_to_index.
You should now be able to do something like this in the scraper config:
{
  "index_name": "typesense_docs",
  "start_urls": [
    {
      "url": "https://typesense.org/docs/(?P<version>.*?)/",
      "variables": {
        "version": [
          "0.21.0"
        ]
      }
    }
  ],
  "selectors": {
    "default": {
      "lvl0": ".content__default h1",
      "lvl1": ".content__default h2",
      "lvl2": ".content__default h3",
      "lvl3": ".content__default h4",
      "lvl4": ".content__default h5",
      "text": ".content__default p, .content__default ul li, .content__default table tbody tr"
    }
  },
  "custom_settings": {
    "token_separators": ["_"], // <=====
    "symbols_to_index": ["*"] // <=====
  }
}
I've also changed the default token separators to ['_', '-'].
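To make the effect of these two settings concrete, here is a rough Python sketch of how they change tokenization. It is a simplified model for illustration only, not Typesense's actual tokenizer: the tokenize function is hypothetical, and it assumes that whitespace always splits tokens, that token_separators adds extra split characters, and that special characters are dropped unless listed in symbols_to_index.

```python
import re


def tokenize(text, token_separators=(), symbols_to_index=()):
    """Simplified, hypothetical model of the two settings (not Typesense's code)."""
    # Whitespace always separates tokens; configured separators are added to it.
    extra = "".join(re.escape(c) for c in token_separators)
    pattern = r"[\s" + extra + r"]+"
    tokens = []
    for raw in re.split(pattern, text):
        # Drop special characters unless they are explicitly listed for indexing.
        kept = "".join(ch for ch in raw if ch.isalnum() or ch in symbols_to_index)
        if kept:
            tokens.append(kept.lower())
    return tokens


# No custom settings: "_" and "*" are simply dropped.
print(tokenize("token_separators and *args"))
# With the settings from the config above: "_" splits, "*" is kept.
print(tokenize("token_separators and *args", ["_"], ["*"]))
```

Under this model, a search for "separators" only matches the second variant, because the underscore has been treated as a word boundary.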
Could you give it a shot and let me know?
Description
Algolia allows you to pass custom settings through the custom_settings option in docsearch.config.json: https://github.com/algolia/docsearch-configs/blob/master/configs/docusaurus-2.json#L29-L30
Actual Behavior
To specify a custom configuration like token_separators or symbols_to_index, I'd have to fork the scraper and the GH action to make it work.
Expected Behavior
I expected Typesense to provide the same flexibility as Algolia, allowing any collection configuration to be passed.
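As a sketch of the requested behavior, the pass-through could look roughly like this in Python. The build_collection_schema helper and the base schema are hypothetical and only illustrate the idea of merging custom_settings into the collection definition; this is not the scraper's actual code.

```python
def build_collection_schema(base_schema, config):
    """Hypothetical helper: overlay custom_settings onto a collection schema."""
    schema = dict(base_schema)  # shallow copy so the base schema is not mutated
    schema.update(config.get("custom_settings", {}))
    return schema


# Assumed minimal base schema and scraper config, for illustration only.
base = {"name": "typesense_docs", "fields": [{"name": "content", "type": "string"}]}
config = {
    "index_name": "typesense_docs",
    "custom_settings": {"token_separators": ["_"], "symbols_to_index": ["*"]},
}

schema = build_collection_schema(base, config)
print(schema["token_separators"], schema["symbols_to_index"])
```

A plain dictionary merge like this would let any collection-level setting through without the scraper needing to know about each one.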