squidfunk / mkdocs-material

Documentation that simply works
https://squidfunk.github.io/mkdocs-material/
MIT License
19.31k stars, 3.4k forks

Towards better documentation search #6307

Open squidfunk opened 8 months ago

squidfunk commented 8 months ago

Background

As you may have read in one of my recent comments, we're currently revising our search implementation. The current search is based on Lunr.js, which is also the search engine that MkDocs was using at the time Material for MkDocs started in 2016. In the beginning, we felt that this was a good fit, as Lunr.js allows searching in the browser without the need for an external service. This makes deploying documentation much simpler, since search is and should always be a central component of each and every good documentation site.

In the past years, we've invested hundreds of hours into making search better. With the help of our awesome sponsors, we were able to ship rich search previews, support for more sophisticated tokenizers, support for Chinese, as well as better highlighting. Additionally, we made search almost twice as fast. However, in order to progress, and solve the many open issues that are related to search, we decided to throw out Lunr.js. There are several reasons for that, the most important of which is that it has been unmaintained since 2020. Additionally, Lunr.js only allows ranking with BM25, which is a good basis, but almost all issues that are related to weird rankings are caused by the fact that BM25 is not ideal for stable typeahead search. It was meant for full-word retrieval and is almost impossible to tame for the many different use cases that we've seen in the wild. Again, we've invested a lot of time to improve the situation, but we've reached a point where this doesn't make sense anymore.
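
For readers unfamiliar with BM25, here is a minimal sketch of the standard Okapi BM25 score. It is not Lunr.js's actual code: the tiny corpus, the parameter values `k1=1.2` and `b=0.75`, and the function name `bm25_score` are assumptions for illustration. It shows that the score is built from whole-term statistics, so a partially typed word such as "sea" contributes nothing unless something expands it to full terms first.

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a tokenized query (Okapi BM25)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        # Number of documents that contain the exact term.
        df = sum(1 for d in corpus if term in d)
        if df == 0:
            continue  # exact-term matching: a partial word matches nothing
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        tf = doc.count(term)
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score

# Hypothetical three-page corpus, already tokenized.
corpus = [
    ["search", "index", "ranking"],
    ["search", "preview", "search", "highlighting"],
    ["offline", "documentation"],
]
scores = [bm25_score(["search"], doc, corpus) for doc in corpus]
partial = bm25_score(["sea"], corpus[0], corpus)
```

In this sketch the page that repeats "search" scores highest, the page without it scores zero, and the partial query "sea" scores zero everywhere.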

This is the reason why we're currently releasing so few new features: we're putting our entire energy into finishing the new search implementation. We're already almost on par with Lunr.js' functionality, but now have an entirely modular architecture, which will allow us to swap out everything. Yes, I mean everything: the ranking algorithm, wildcard matching, the inverted index implementation, yada, yada, yada. Solving the documentation search problem is a personal affair for me. I really hate that there's not yet a solution that works reliably, can run anywhere, and is modular so it can be easily customized.

This is what we're building.

As you may already suspect, this is a pretty big project, which is why it is taking so long. We feel it is the perfect moment to venture into this problem, because we have gathered a lot of use cases that we can now balance and optimize for. However, please understand that this takes time, so I kindly ask you to be a little more patient. Development on this project is after all 99% done by me, @squidfunk, and we're rewriting something that millions of users are using each and every day. That needs care.

Where we're currently at

First of all: search will be a separate, new project! This means you will be able to use the same engine in your other projects as well. Additionally, here's a non-exhaustive list of things we're planning to ship in the first version:

Here's a list of ideas, partially based on open change requests, which we will implement after the first version is out and has reached a stable state. We believe all of those features will be great additions:


This list is far from complete. We have so many more ideas, which we'll share when the time has come. We'll keep this issue updated, so feel free to subscribe or check back from time to time. We hope to push out the first candidate before the end of this year! Thank you for your patience and for your trust in Material for MkDocs.

strausmann commented 8 months ago

Great ideas and great features for search. The most important thing for us is that search continues to work completely offline and without a web server. We use MkDocs for documentation that has to work offline, on a plane or on a ship.

squidfunk commented 8 months ago

@strausmann that is our priority. It will definitely work offline (our prototype already does), but we'll also add interesting new features like search federation (merging search indexes with other MkDocs sites) for which you obviously need to be online. All of those are optional and will degrade gracefully when offline, of course.

strausmann commented 8 months ago

Search federation is of course one of the most interesting features for us too. The documentation also runs on a web server, where several MkDocs instances run side by side for different topics. If the search in one MkDocs instance also returned the contents of the other instances, that would be brilliant. Of course, these instances run on a web server in a closed environment.
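
To make the idea concrete, here is a minimal sketch of merging per-site indexes into one federated index. How the new engine will actually federate is not described in this thread; the site names, paths, and the functions `build_index` and `merge_indexes` are hypothetical.

```python
def build_index(site, pages):
    """Map each lowercase word to the set of (site, path) pairs containing it."""
    index = {}
    for path, text in pages.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add((site, path))
    return index

def merge_indexes(*indexes):
    """Union several per-site indexes into one federated index."""
    merged = {}
    for index in indexes:
        for word, postings in index.items():
            merged.setdefault(word, set()).update(postings)
    return merged

# Two hypothetical MkDocs instances running side by side.
networking = build_index("networking", {"/vpn/": "VPN setup guide"})
storage = build_index("storage", {"/backup/": "Backup setup and restore"})

federated = merge_indexes(networking, storage)
hits = sorted(federated.get("setup", set()))
```

A query for "setup" against the merged index returns pages from both instances, each labeled with the site it came from, while each site's own index stays usable when the others are unreachable.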

squidfunk commented 8 months ago

Thanks for sharing your setup – that sounds like a perfect test case once we have a prototype. If you like, you can subscribe to #5230 and give it a try once we have the first version out ☺️

strausmann commented 8 months ago

We'd be very happy to test it. We are excited.

squidfunk commented 8 months ago

1st search preview is ready in #6321 – We encourage you to try it on your project and give feedback in #6321 ☺️

[!NOTE] It's still the same UI/UX, as we're currently focusing on internals. However, this PR fundamentally changes the search results, so we'd be interested to learn if you feel that it works better or worse in your documentation project. We'll be continuing to work on the internals and other parts mentioned in the OP while awaiting your feedback ☺️

squidfunk commented 8 months ago

2nd search preview is ready in https://github.com/squidfunk/mkdocs-material/pull/6372 – We encourage you to try it on your project and give feedback in https://github.com/squidfunk/mkdocs-material/pull/6372 ☺️

[!NOTE]

It's still the same UI/UX, as we're currently focusing on internals. However, this PR fundamentally changes the search results, so we'd be interested to learn if you feel that it works better or worse in your documentation project. We'll be continuing to work on the internals and other parts mentioned in the OP while awaiting your feedback ☺️

AutonomousCat commented 7 months ago

I started using a Python library that uses MkDocs and this theme, and the search experience has been a bit more stressful than what I'm used to with Sphinx, so I'm glad to see that a search overhaul has already started.

My number 1 feature request would be a dedicated search page, and an option for the search bar to take you to it. I feel like the small window approach cannot fit all projects, for example API wrappers, where the results are large, and rightly so. It simply takes too much effort to go through all the "<#> more on this page" entries and scroll through them, only to possibly pass what you're looking for multiple times because of the small area.

That's really my main issue with search.

ctalr-jb commented 7 months ago

I'm loving the direction of this new search implementation. With the previews so far, I'm seeing a massive improvement in performance on larger sets of docs. Aside from the return of previous features, I'm definitely interested in seeing the "document metadata" and "federated search" ideas come to fruition for my own use cases.

Aruelius commented 7 months ago

I have been using mkdocs-material for over a year. It's good, but the support for Chinese search is not perfect. Both jieba and Lunr.js have very limited support for Chinese, and I know you have also been working on improving Chinese search, thank you very much!

In fact, when I was preparing to write the 2.0 version of my project's documentation, I thought that I would need to switch to a different framework, such as Nextra, VitePress, or dumi, because those frameworks natively support Chinese search.

But then I saw this issue and got excited. It gave me hope, and I'll wait for the better search to be released.

Merry Christmas and Happy New Year!

squidfunk commented 7 months ago

@Aruelius could you share some links to SSGs that support better Chinese search than Material for MkDocs? We're very interested in improving support, and checking some existing solutions is always a good idea. As we're writing everything from scratch, now is the best time to investigate. Please don't only share links to the SSGs, but to resources that explain how search works in those SSGs, i.e., documentation pages, blog posts, repositories. Thank you!

Also please understand that "Chinese search is not perfect" is very hard for me to turn into actionable items. I don't speak Chinese. I'm essentially trying to improve search for a language I don't understand. I will need support from Chinese speaking users. Let's create a better search experience together.

squidfunk commented 7 months ago

FWIW, a quick search surfaced that VitePress supports two search providers:

We're aiming to build one of the most powerful search solutions that are Open Source (not like Algolia) and can run in the browser or on the Edge, but in-browser search just cannot compete with a hosted solution. That being said, if you would like to work together on this, I'd be happy to know exactly what you expect from Chinese search, what doesn't work correctly, and maybe if you found any Open Source solutions for this problem, because IMHO, to-date, Material for MkDocs is one of the very, very few SSGs that support Chinese search at all without a third-party service.

Nonetheless, we need to improve it!

Aruelius commented 7 months ago

> @Aruelius could you share some links to SSGs that support better Chinese search than Material for MkDocs? We're very interested in improving support, and checking some existing solutions is always a good idea. As we're writing everything from scratch, now is the best time to investigate. Please don't only share links to the SSGs, but to resources that explain how search works in those SSGs, i.e., documentation pages, blog posts, repositories. Thank you!
>
> Also please understand that "Chinese search is not perfect" is very hard for me to turn into actionable items. I don't speak Chinese. I'm essentially trying to improve search for a language I don't understand. I will need support from Chinese speaking users. Let's create a better search experience together.

I'm happy to contribute in any way I can.

This is FlexSearch (https://github.com/nextapps-de/flexsearch), which is what Nextra uses. Maybe it can help you.


squidfunk commented 7 months ago

... and FlexSearch is better? Could you please share concrete examples (ideally with minimal reproductions) that allow me to thoroughly understand what you find to work better in FlexSearch than in our implementation? Also note that FlexSearch has been unmaintained for 2 years.

Aruelius commented 7 months ago

Currently, as far as I understand, all jieba does is text segmentation, so we can only search for the segmented phrases. What I want is to be able to search for any word.

For example, a text like:

支持中文搜索

jieba will split this text into two words:

["支持", "中文搜索"]

So, when I search for 搜索 (a Chinese word that means "search"), it returns nothing, which is what happens on this page: https://squidfunk.github.io/mkdocs-material/blog/2022/05/05/chinese-search-support/. The word 中文搜索, however, can be found.

This can be resolved by setting cut_all to True, which returns:

["支持", "中文", "中文搜索", "搜索"]

However, if I want to search for 文搜, which is not a word in Chinese, it will not be segmented by jieba, and no results will be returned.

Here is a site made with dumi: https://d.umijs.org/guide. You can try searching for any Chinese word on the site, and it will return the correct results.
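
The three cases described in this comment can be reproduced without jieba by hard-coding the token lists quoted above. This is only a sketch: the helper names are made up for illustration, and prefix matching over tokens is an assumption about how a typeahead lookup behaves.

```python
text = "支持中文搜索"

# Token lists copied from the comment above; jieba itself is not called here.
default_tokens = ["支持", "中文搜索"]
cut_all_tokens = ["支持", "中文", "中文搜索", "搜索"]

def prefix_match(query, tokens):
    """Typeahead lookup: the query must be a prefix of an indexed token."""
    return any(token.startswith(query) for token in tokens)

def infix_match(query, text):
    """Substring lookup, similar to a browser's find-in-page."""
    return query in text

# For each query: (default segmentation, cut_all segmentation, substring).
results = {
    query: (
        prefix_match(query, default_tokens),
        prefix_match(query, cut_all_tokens),
        infix_match(query, text),
    )
    for query in ["中文搜索", "搜索", "文搜"]
}
```

Here 中文搜索 matches in all three modes, 搜索 matches only with the cut_all tokens or substring lookup, and 文搜 matches only with substring lookup.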

Aruelius commented 7 months ago

> ... and FlexSearch is better? Could you please share concrete examples (ideally with minimal reproductions) that allow me to thoroughly understand what you find to work better in FlexSearch than in our implementation? Also note that FlexSearch has been unmaintained for 2 years.

Nope, I don't think FlexSearch is better, but I built a documentation website using Nextra and the search module works fine.

squidfunk commented 7 months ago

Thanks for the clarification! Would it help if you could do infix search? In essence, if you could not only search for prefixes, but for any character contained in 支持中文搜索? Or are there cases where this does not make sense?

Aruelius commented 7 months ago

Yes, if the keyword is in the text, it should be found. In English, I think nobody would search for rch to find search, but this does happen in Chinese. We don't need to enter a complete word to get correct results.

More like Chrome's search.


squidfunk commented 7 months ago

Thanks for sharing! It's safe to say that we will account for this use case as well ☺️

Aruelius commented 7 months ago

> Thanks for sharing! It's safe to say that we will account for this use case as well ☺️

Thank you~

do-me commented 6 months ago

Just linking https://github.com/squidfunk/mkdocs-material/discussions/5483 for some ideas on how to implement semantic search without the need for a vector DB and model server, if loading 10-50 MB of resources is not a problem. If it is, stick to the "proper" setup with the respective powerful infrastructure.
I'm thinking of creating an mkdocs plugin but could use some helping hands in case (comment on linked discussion) :)

squidfunk commented 6 months ago

Thanks! Definitely interesting, but likely not possible in the browser alongside documentation that is shipped to users. 38MB download (as mentioned in the linked issue) is a no-no, but we have alternative ideas to explore ☺️

> I'm thinking of creating an mkdocs plugin but could use some helping hands in case (comment on linked discussion) :)

If you want to go ahead, sure! We'll investigate this topic next year. Unfortunately, I have too much to do right now to help you, but once we tackle this, I'll post here, so everybody who is subscribed will be notified.

do-me commented 6 months ago

Agreed, both variants (client-only vs. server-client) have their tradeoffs (size/speed vs. cost/overhead). With the current hype and the recent hardware and software developments, I could well imagine something like an on-device inference server (with default pre-trained models) that could easily be hooked up to the browser or apps system-wide. Once such a system is within reach, we could perhaps reevaluate whether only the index file would need to be downloaded (similar to the normal Lunr search at the moment).

> we have alternative ideas to explore ☺️

Excited for any kind of development here! :)

syeda-git commented 6 months ago

@squidfunk is there a tentative date range in which this new search feature will be available?

squidfunk commented 6 months ago

Please be assured that we are working hard on making the new search available as fast as possible, but it is a pretty big fish to fry – I'm essentially writing a search engine from scratch. You can help us finish it faster by sponsoring the project, because with more sponsorships, I can delegate more work to other individuals helping out on issues, discussions, questions, etc., and focus on pushing it forward.

Sadly, only a small fraction of the companies that use Material for MkDocs and actually make or save money off our work support it financially. A lot of companies only free-ride. This makes our work more tedious.

lucaong commented 6 months ago

> Local search, implemented using minisearch, which does not support Chinese

@squidfunk MiniSearch does support Chinese, although one has to provide a custom tokenizer for it, as explained for example here: https://github.com/lucaong/minisearch/issues/201#issuecomment-1890921800

squidfunk commented 6 months ago

Thanks! Note that Intl.Segmenter is not supported in all browsers. Also, according to https://github.com/squidfunk/mkdocs-material/issues/6307#issuecomment-1865922955, segmenting is not enough to provide a good experience. Infix search seems to be necessary, but we need to investigate.

lucaong commented 6 months ago

Understood. I am not personally knowledgeable about supporting full-text search for the Chinese language, but I tried to make MiniSearch as configurable as possible. I would definitely be interested in understanding whether there is any gap there that cannot be solved with configuration, as well as in a working MiniSearch configuration for Chinese to suggest to users.

Regarding infix search, that one is in fact a common request from users needing to support Chinese, and it can be done with MiniSearch (although the index will necessarily get larger). Here is a comment explaining how.
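
One common way to get infix search out of a prefix-capable index is to index every suffix of each term, which also illustrates why the index grows. The following is a generic sketch of that technique under stated assumptions, not MiniSearch's API and not necessarily what the linked comment describes; all function names are hypothetical, and the linear scan stands in for a real prefix structure such as a trie.

```python
def suffixes(term):
    """All non-empty suffixes of a term, e.g. 'abc' -> ['abc', 'bc', 'c']."""
    return [term[i:] for i in range(len(term))]

def build_suffix_index(terms):
    """Map every suffix to the set of original terms it came from."""
    index = {}
    for term in terms:
        for suffix in suffixes(term):
            index.setdefault(suffix, set()).add(term)
    return index

def infix_search(index, query):
    """A query is an infix of a term iff it is a prefix of one of its suffixes."""
    found = set()
    for suffix, terms in index.items():
        if suffix.startswith(query):
            found |= terms
    return found

terms = ["支持", "中文搜索"]
index = build_suffix_index(terms)
matches = infix_search(index, "文搜")
```

With the two segmented terms from earlier in the thread, the suffix index holds six entries instead of two, and the non-word query 文搜 now finds 中文搜索.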

squidfunk commented 6 months ago

Yeah, I'm having my troubles understanding Chinese as well 😅 Thanks for explaining how to implement infix search with MiniSearch. However, as you can see from the OP, we're actively working on a new search engine. The reason is that Chinese search is not the only thing we need to support: we need a solution that is as modular and flexible as possible, and with 65 supported languages and more than 40k installations, we have a lot of use cases to cater to. We have something close to the prototype stage, so the decision on whether to use an existing solution like MiniSearch has already been made. Thank you for your understanding.

Lexachoc commented 5 months ago

I am new to Material for MkDocs. The built-in search works well until I have a Markdown page with symbols and LaTeX equations, as below:

| Symbol | Description |
| --- | --- |
| Ain | absorbance, $A_{in}=-\ln[I/I_0]=-\ln\tau_{in}$ |
| $A_{in}$ | absorbance, $A_{in}=-\ln[I/I_0]=-\ln\tau_{in}$ |

Using the browser's Ctrl+F function, searching for Ain finds the symbol in the first row but not in the second row. With the built-in search, neither row can be found by entering Ain. That's not intuitive for me.

So it would be very useful if the search bar had the ability to search for the sub (sup) string, like the built-in Ctrl+F in the browser, or even better, to search for LaTeX.

I would expect to enter Ain and get the result of the preview with rendered symbols (equations) instead of the Latex syntax.
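
One possible way to make such symbols findable is to reduce the LaTeX markup to plain text before indexing. The sketch below is deliberately naive and only handles inline math delimiters plus subscripts and superscripts; the function name `latex_to_plain` is hypothetical, and nothing in this thread says the search engine works this way.

```python
import re

def latex_to_plain(source):
    """Reduce simple LaTeX markup to searchable text (deliberately naive)."""
    text = source.replace("$", "")                   # drop inline math delimiters
    text = re.sub(r"[_^]\{([^{}]*)\}", r"\1", text)  # A_{in} -> Ain
    text = re.sub(r"[_^](\w)", r"\1", text)          # I_0 -> I0
    return text

plain = latex_to_plain(r"$A_{in}$ absorbance")
```

Indexing the reduced text next to the original source would let a query for Ain match the second table row, while the preview could still render the original equation.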

NFanoe commented 2 months ago

Have you considered some kind of faceted search? When we search for something, we get a ton of API stuff first. It would be great to be able to filter that away, or filter it in, based on maybe a metadata tag or even just a path.

squidfunk commented 2 months ago

Yes, filters (faceted search) will definitely be supported ☺️
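
As an illustration of the request above, filtering by a metadata tag or by path could look roughly like the following. The result shape, the field names `path` and `tags`, and the function `filter_results` are assumptions; the thread does not describe the planned filter API.

```python
# Hypothetical search results, each carrying a path and metadata tags.
results = [
    {"path": "/api/client/", "tags": ["api"], "title": "Client"},
    {"path": "/guides/setup/", "tags": ["guide"], "title": "Setup"},
    {"path": "/api/server/", "tags": ["api"], "title": "Server"},
]

def filter_results(results, include_tags=None, exclude_path_prefix=None):
    """Keep results that carry a wanted tag and are outside an excluded path."""
    kept = []
    for result in results:
        if include_tags and not set(include_tags) & set(result["tags"]):
            continue  # none of the wanted tags is present
        if exclude_path_prefix and result["path"].startswith(exclude_path_prefix):
            continue  # result lives under the excluded path
        kept.append(result)
    return kept

without_api = filter_results(results, exclude_path_prefix="/api/")
only_api = filter_results(results, include_tags=["api"])
```

Filtering "away" corresponds to the path exclusion, and filtering "in" corresponds to the tag inclusion.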

NFanoe commented 1 week ago

I missed both previews. Any news of a new preview date (or a release)? 🥇

squidfunk commented 1 week ago

Yes, this year. Sorry for the silence – we're working very hard on another huge topic right now that has to predate the new search functionality, and we'll be resuming adding the finishing touches immediately after that. There will be a huge announcement later this year. We'll announce this here as well 🤟