Open jbms opened 1 year ago
Re: search backend -- Has anyone looked into using lunr.py to actually build a Lunr index? https://github.com/yeraydiazdiaz/lunr.py
The thing that I don't especially like about lunr is that the "search index" that must be downloaded by clients contains the entire text content of the website, which I think would be problematic for large sites. In contrast, the Sphinx search index contains only:
I expect that to be significantly smaller, though I haven't done any benchmarks.
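Since no benchmarks exist yet, here is a rough way to compare the two payload shapes on a given set of pages. It is only a sketch: the field names (`location`, `docnames`, `titles`, `terms`) and the tokenizer are assumptions made for illustration, not the exact layout that Lunr or Sphinx emits.

```python
import json
import re


def tokenize(text):
    """Very rough tokenizer (assumption): lowercase alphanumeric runs, deduplicated."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def full_text_index(docs):
    """Payload that ships every document body verbatim to the client."""
    return [
        {"location": name, "title": doc["title"], "text": doc["text"]}
        for name, doc in docs.items()
    ]


def terms_only_index(docs):
    """Payload that ships only titles and a term -> document-number mapping."""
    docnames = list(docs)
    terms = {}
    for number, name in enumerate(docnames):
        for term in tokenize(docs[name]["text"]):
            terms.setdefault(term, []).append(number)
    return {
        "docnames": docnames,
        "titles": [docs[name]["title"] for name in docnames],
        "terms": terms,
    }


def payload_size(index):
    """Size in bytes of the compact JSON encoding of an index."""
    return len(json.dumps(index, separators=(",", ":")).encode("utf-8"))
```

Calling `payload_size(full_text_index(docs))` and `payload_size(terms_only_index(docs))` on the real pages of a large site would show whether the expected difference holds in practice. Compression by the web server would also need to be taken into account.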
Well, I just tried to merge updates from upstream, but
Needless to say, I failed to merge updates from upstream. It would be embarrassing and unproductive to push my local attempt into a branch.
With the lack of regular attention, this project has become a mess for merging updates from upstream. I can do maintenance, but
I think I could take care of the merging (especially for search) but do you think you could take care of implementing the new features that require separate sphinx/python integration, like page icons, etc.?
The new features being implemented wouldn't need to block the merge but it would be nice not to lose track of them.
Python is my abode. Need something done in Python (or CSS)? I can (and will) help out there. It's just the JS part of this project that I can't do on my own. There is also a bunch of tweaking to the build script upstream.
> The new features being implemented wouldn't need to block the merge but it would be nice not to lose track of them.
My thinking exactly. My instinct tells me to start a repo "project" (kinda like GitHub scrum) to group the issues that track the new features. Since I can't control that here, I guess we could just use issue labels instead.
mkdocs-material v9 has been released with the "rich search" functionality, which basically amounts to including a limited set of HTML tags in the search snippets, rather than using just the text content. Since we have a separate search backend we can't just get this functionality for free on the backend side, but it would be interesting to look at how easily we can implement it.
Currently, to extract snippets, the client-side JavaScript code downloads the full HTML of each candidate result page, splits it into sections, and extracts the text to find matches of the search terms. We could probably modify this to retain some HTML tags when generating snippets, in order to provide a "rich search" display similar to the one in mkdocs-material.
Alternatively, we could preprocess each document at build time and output a stripped version of the document in JSONP format. That might make the client-side work more efficient, though it is unclear whether it would significantly reduce the amount of data that must be fetched per candidate result.
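To make the build-time option concrete, here is a minimal sketch using only the Python standard library. The tag allow-list, the skipped tags, the payload fields, and the `loadSearchDocument` callback name are all hypothetical. The tags that mkdocs-material actually keeps, and the way the theme's JavaScript would consume the file, may well differ.

```python
import json
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list of tags to keep in snippets (assumption).
ALLOWED_TAGS = {"code", "pre", "em", "strong", "ul", "ol", "li", "p", "h1", "h2", "h3"}
# Elements whose entire content is dropped (assumption).
SKIPPED_TAGS = {"script", "style", "nav"}


class SnippetStripper(HTMLParser):
    """Keeps text and allow-listed tags (without attributes); drops everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in ALLOWED_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Character references were decoded by the parser, so re-escape the text.
            self.parts.append(escape(data, quote=False))


def strip_html(html):
    """Return the document reduced to text plus the allow-listed tags."""
    parser = SnippetStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def to_jsonp(docname, html, callback="loadSearchDocument"):
    """Wrap the stripped document in a JSONP call (callback name is hypothetical)."""
    payload = {"docname": docname, "content": strip_html(html)}
    return f"{callback}({json.dumps(payload)});"
```

Writing one such file per page at build time would let the client fetch a smaller payload than the full HTML and skip the tag filtering itself. How much smaller would have to be measured on real pages.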