rust-lang / mdBook

Create book from markdown files. Like Gitbook but implemented in Rust
https://rust-lang.github.io/mdBook/
Mozilla Public License 2.0

Support CJK (multiple language) search #2052

Open HillLiu opened 1 year ago

HillLiu commented 1 year ago

Problem

For anyone who needs CJK search, there is a quick hack.

Proposed Solution

You just need to add two extra JS files.

Add them to additional-js in book.toml.

Try it directly with Docker:

curl https://raw.githubusercontent.com/allfunc/docker-mdbook/main/bin/preview.sh | bash -s -- start

Notes

The trick is to replace Elasticlunr.js with fzf.

CoralPink commented 1 year ago

This is great! It now responds to Japanese!

(Sorry, I used it without permission...😓)

wc7086 commented 1 year ago

https://github.com/ajitid/fzf-for-js/issues/112

duskmoon314 commented 10 months ago

I just created a modified fork in this way, and it worked well. The modified version and the modified commit.

If we want to replace all of Elasticlunr's functionality, it would be better to adjust the makeTeaser function to take fzf's score into account. But I still need to dig into it.

CoralPink commented 10 months ago

This topic is interesting to me as well, and I would like to see it develop!

I have always wanted to know how to reflect search scores.

duskmoon314 commented 10 months ago

> I have always wanted to know how to reflect search scores.

Fzf calculates the score according to how well the result matches the query and where the match occurs (if my understanding is correct).


I just took a look at makeTeaser. It uses a sliding window to extract the most valuable part of the text containing the result, showing the user where the match occurs.

But it depends on two assumptions:

  1. Sentences end with ". " (dot + whitespace) (src)
  2. Words are separated by " " (whitespace) (src)

As far as I know, neither assumption holds for Chinese or Japanese. makeTeaser can still be used with Chinese, but it produces an overly long context with incorrectly emphasized phrases.

To my understanding, we must figure out how makeTeaser should work in different languages to push this issue further.
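
As a starting point for that discussion, here is a minimal sketch of a teaser builder that does not rely on either assumption. It is not mdBook's makeTeaser: the function name `makeCjkTeaser`, the `radius` parameter, and the set of sentence terminators are all hypothetical choices for illustration. It finds the first literal occurrence of the term, takes a window of characters around it, and trims the window at the nearest sentence terminator on each side.

```javascript
// Hypothetical helper, NOT part of mdBook or fzf.
// Assumed terminator set: ASCII and full-width CJK sentence endings.
const SENTENCE_END = /[.!?。！？]/;

function makeCjkTeaser(body, term, radius = 10) {
    // Work on code points so characters outside the BMP are not split.
    const chars = Array.from(body);
    const needle = Array.from(term);
    const fallback = { before: chars.slice(0, radius * 2).join(""), match: "", after: "" };
    if (needle.length === 0) return fallback;

    // Find the first literal occurrence of the term.
    let hit = -1;
    for (let i = 0; i + needle.length <= chars.length; i++) {
        if (needle.every((c, j) => chars[i + j] === c)) { hit = i; break; }
    }
    if (hit === -1) return fallback;

    // Character-based window around the match (no whitespace splitting).
    let start = Math.max(0, hit - radius);
    let end = Math.min(chars.length, hit + needle.length + radius);

    // Trim the window to the enclosing sentence where a terminator is in range.
    for (let k = hit - 1; k >= start; k--) {
        if (SENTENCE_END.test(chars[k])) { start = k + 1; break; }
    }
    for (let k = hit + needle.length; k < end; k++) {
        if (SENTENCE_END.test(chars[k])) { end = k + 1; break; }
    }

    // Return raw pieces; the caller is expected to HTML-escape and emphasize them.
    return {
        before: chars.slice(start, hit).join(""),
        match: needle.join(""),
        after: chars.slice(hit + needle.length, end).join(""),
    };
}
```

This only handles exact substring hits, so a fuzzy fzf match would need its match positions mapped onto the window instead. Treating "." as a terminator will also cut at decimal points, which a real implementation would want to handle.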

CoralPink commented 10 months ago

I see! To be honest, I was using this code without really understanding it😅

By the way, this is a bit off topic, but .......

https://rust-lang.github.io/mdBook/format/configuration/renderers.html?highlight=score#outputhtmlsearch

I think this is an elasticlunr option that can be changed by the user through book.toml, but is the score boost applied the same way in fzf?

duskmoon314 commented 10 months ago

> I think this is an elasticlunr option that can be changed by the user through book.toml, but is the score boost applied the same way in fzf?

No. There are not many options for fzf's searching function. I think the way to calculate the score is taken from the original Go implementation, and the API doesn't provide a way to control it. API doc

> To be honest, I was using this code without really understanding it

My understanding:

function fzfLoad(index) {
    // The argument `index` is generated by crate elasticlunr-rs
    // It contains all pages with their title, breadcrumbs, contents, and some metadata

    // Extract docs from index
    // `docs` is an obj: { [id: number]: { title, breadcrumbs, body, ... } }
    const docs = index.documentStore.docs;

    // Init fzf
    // The first argument is the list to search. (I have tried using Object.entries, but I failed to get it to work)
    // The second argument is the options object. `selector` tells fzf which string to search for each item.
    const fzf = new Fzf(Object.keys(docs), {
        selector: (id) => {

            // These lines concatenate title, breadcrumbs and body to let fzf search them at once.
            const doc = docs[id];
            return `${doc.title} ${doc.breadcrumbs} ${doc.body}`;
        }
    });

    // To be compatible with the original elasticlunr's usage,
    // this obj is needed to provide a `search` method.
    return {

        // `search` takes two arguments, the `term` to search and `options` for elasticlunr
        search: (term, _options) => {

            // We use fzf to search the term
            const entries = fzf.find(term);

            // Then we form the result
            // `doc` and `ref` are used to make the teaser and the link
            const res = entries.map((entry) => {
                // Each entry also carries a `score`, which is not used here
                const { item } = entry;
                return {
                    doc: docs[item],
                    ref: item,
                }
            });
            return res;
        }
    }
}

CoralPink commented 10 months ago

> No. There are not many options for fzf's searching function. I think the way to calculate the score is taken from the original Go implementation, and the API doesn't provide a way to control it. API doc

Ugh, I knew it! If I wanted to use the score boost, I would have to change the order myself 😱

... On the other hand, you mean that we might be able to do it if we really wanted to. (I don't know if it's worth the hassle 😮)

Thanks for chatting with us! I'm getting a little curious about search logic!
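
For what "changing the order myself" might look like, here is a rough sketch that re-ranks results after the search. It assumes each result keeps the `score` from the fzf entry (the `search` wrapper above drops it, so it would need to be passed through) and that a higher score is better. The `rerank` function, the `BOOSTS` weights, and the multiplicative formula are hypothetical and are not how elasticlunr applies its boosts.

```javascript
// Hypothetical per-field weights, loosely analogous to the boost options in book.toml.
const BOOSTS = { title: 2, breadcrumbs: 1, body: 1 };

// Re-rank results of the assumed shape { ref, doc, score } by boosting
// documents whose fields literally contain the search term.
function rerank(results, term, boosts = BOOSTS) {
    const needle = term.toLowerCase();
    return results
        .map((r) => {
            let bonus = 0;
            for (const field of Object.keys(boosts)) {
                const value = String(r.doc[field] ?? "").toLowerCase();
                if (needle && value.includes(needle)) bonus += boosts[field];
            }
            // Assumed formula: scale the original score by the accumulated bonus.
            return { ...r, boosted: r.score * (1 + bonus) };
        })
        .sort((a, b) => b.boosted - a.boosted);
}
```

Because it uses a plain substring test, this only rewards exact hits in a field. Fuzzy matches keep their original fzf score.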

madjxatw commented 9 months ago

It works, but with one small flaw: the matched words are not bolded/highlighted.
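
One possible way to get highlighting back is sketched below. As far as I understand, each fzf result entry also carries a `positions` set holding the indices of the matched characters, but please check that against the fzf API doc. The sketch assumes those indices refer to UTF-16 code units of the string returned by `selector` (title, breadcrumbs, and body joined), so the same string has to be passed in here. `highlightByPositions` and `escapeHtml` are hypothetical helpers, not part of mdBook or fzf.

```javascript
// Hypothetical helper: escape the characters that are unsafe in HTML text.
function escapeHtml(ch) {
    switch (ch) {
        case "&": return "&amp;";
        case "<": return "&lt;";
        case ">": return "&gt;";
        case '"': return "&quot;";
        default: return ch;
    }
}

// Wrap consecutive matched indices in <mark>...</mark>.
// `positions` is assumed to be a Set of indices into `text`.
function highlightByPositions(text, positions) {
    let out = "";
    let open = false;
    for (let i = 0; i < text.length; i++) {
        const hit = positions.has(i);
        if (hit && !open) { out += "<mark>"; open = true; }
        if (!hit && open) { out += "</mark>"; open = false; }
        out += escapeHtml(text[i]);
    }
    if (open) out += "</mark>";
    return out;
}
```

For example, `highlightByPositions("mdBook 検索", new Set([0, 1, 7, 8]))` marks "md" and "検索". Wiring this into the teaser would still require mapping the positions onto whatever excerpt makeTeaser selects.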