vuejs / vitepress

Vite & Vue powered static site generator.
https://vitepress.dev
MIT License

Better default minisearch tokenizer for Chinese documents #4049

Open leyen-me opened 3 months ago

leyen-me commented 3 months ago

Describe the bug

Search cannot find the content.

Reproduction

Chinese content

Expected behavior

(screenshot)

System Info

no

Additional context

No response


brc-dd commented 3 months ago

Have you tried running vitepress build and then vitepress preview?

leyen-me commented 3 months ago

> Have you tried running vitepress build and then running vitepress preview?

I have tried building it. You can check my production site: https://web.leyen.me

brc-dd commented 3 months ago

The approach in https://github.com/lucaong/minisearch/issues/201#issuecomment-2227591121 kind of works, but I guess it still needs improvement.

Some other people are also doing this: https://github.com/search?q=vitepress+segmenter+language:JavaScript+OR+language:TypeScript+NOT+is:fork&type=code

Not sure, but the bm25 parameters might help too: https://github.com/search?q=vitepress+searchOptions+bm25+language:JavaScript+OR+language:TypeScript+NOT+is:fork&type=code (I haven't checked how they work yet.)
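The thread does not say which bm25 values help. As a hedged illustration of what the three knobs control, here is the textbook BM25+ term score with parameters named after MiniSearch's bm25 search option (k, b, d). The defaults { k: 1.2, b: 0.7, d: 0.5 } are what we believe MiniSearch documents. This is not MiniSearch's source, and bm25PlusTermScore is our own name.

```typescript
// Parameter names mirror MiniSearch's `bm25` search option. The defaults are
// an assumption based on MiniSearch's docs; verify against your version.
interface BM25Params {
  k: number // term-frequency saturation point
  b: number // strength of field-length normalization (0 = none, 1 = full)
  d: number // BM25+ lower bound added for every matching term
}

const assumedDefaults: BM25Params = { k: 1.2, b: 0.7, d: 0.5 }

// Textbook BM25+ score of one query term in one document field.
function bm25PlusTermScore(
  termFreq: number, // occurrences of the term in this field
  docsWithTerm: number, // documents whose field contains the term
  totalDocs: number, // documents in the index
  fieldLength: number, // tokens in this field
  avgFieldLength: number, // average tokens per field across the index
  { k, b, d }: BM25Params,
): number {
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5))
  // Fields longer than average get lengthNorm > 1, which lowers the score.
  const lengthNorm = 1 - b + b * (fieldLength / avgFieldLength)
  return idf * (d + (termFreq * (k + 1)) / (termFreq + k * lengthNorm))
}

// A section 3x the average length that mentions the term once:
const longDefault = bm25PlusTermScore(1, 5, 100, 300, 100, assumedDefaults)
const longLowB = bm25PlusTermScore(1, 5, 100, 300, 100, { ...assumedDefaults, b: 0.3 })
// longLowB > longDefault: a lower b penalizes long sections less.
```

One plausible link to this issue: the tokenizer determines fieldLength, so changing how Chinese text is split also changes how strongly b penalizes long sections.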

brc-dd commented 3 months ago

I'm keeping this open. There should be some defaults here instead of needing Chinese users to manually configure it.
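For context on why a default matters: MiniSearch's default tokenizer splits on whitespace and punctuation, so a run of Chinese characters between punctuation marks becomes one long token, and a query for a word in the middle of that run does not match. A default along the lines of the segmenter-based approaches linked earlier might look like the sketch below. It is not the code from the linked minisearch comment; it assumes a runtime with Intl.Segmenter, tokenizeZh is a hypothetical name, and the VitePress wiring in the trailing comment is an assumed shape taken from the documented themeConfig.search.options.miniSearch settings.

```typescript
// Sketch of a Chinese-aware tokenizer for MiniSearch (assumption: the runtime
// provides Intl.Segmenter, e.g. Node 16+ or a current browser).
function tokenizeZh(text: string): string[] {
  // Built inside the function so it has no outer dependencies, a precaution
  // in case the config function is serialized and re-evaluated elsewhere.
  const segmenter = new Intl.Segmenter('zh', { granularity: 'word' })
  const tokens: string[] = []
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // Keep words (Latin words, numbers, Han dictionary words); drop
    // whitespace and punctuation segments such as ' ', '!' and '。'.
    if (isWordLike) tokens.push(segment)
  }
  return tokens
}

// Assumed wiring (verify against your VitePress version):
//
//   search: {
//     provider: 'local',
//     options: {
//       miniSearch: {
//         options: { tokenize: tokenizeZh },
//         searchOptions: { tokenize: tokenizeZh },
//       },
//     },
//   },
```

The sketch leaves case alone because MiniSearch's default processTerm already lowercases terms. How well Chinese text is split depends on the ICU word-break data shipped with the runtime.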

niansi-z commented 3 months ago

I tried it out and found that the problem also occurs when a page has no title, and not only with Chinese content. (demo)

brc-dd commented 3 months ago

With the current logic, titles are required: there should be one h1 per page. There is an open PR to make search more robust in handling such content; we'll see.
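Until that PR lands, a pre-build check could flag pages without exactly one h1. The helper below is hypothetical (countH1 and the sample page paths are our own, not part of VitePress). It counts level-1 "#" headings in a Markdown string while skipping fenced code blocks. It does not handle setext headings ("Title" followed by "===") or front matter, so treat it as a rough lint rather than a mirror of VitePress's indexing logic.

```typescript
// Hypothetical lint helper, not a VitePress API: counts ATX level-1 headings
// ("# Title") in Markdown, ignoring lines inside ``` or ~~~ fenced blocks.
// Simplification: any fence line of the same character closes an open fence.
function countH1(markdown: string): number {
  let fenceChar: string | null = null // '`' or '~' while inside a fence
  let count = 0
  for (const line of markdown.split(/\r?\n/)) {
    const fence = /^ {0,3}(`{3,}|~{3,})/.exec(line)
    if (fence) {
      const ch = fence[1][0]
      if (fenceChar === null) fenceChar = ch // opening fence
      else if (ch === fenceChar) fenceChar = null // closing fence
      continue
    }
    // Up to 3 spaces of indent, a single '#', then whitespace or end of line.
    if (fenceChar === null && /^ {0,3}#(\s|$)/.test(line)) count++
  }
  return count
}

// Usage with hypothetical pages: warn when a page lacks exactly one h1.
const pages: Record<string, string> = {
  'guide/intro.md': '# 介绍\n\n正文内容',
  'guide/no-title.md': '正文内容\n\n## 小节',
}
for (const [path, source] of Object.entries(pages)) {
  const n = countH1(source)
  if (n !== 1) console.warn(`${path}: expected 1 h1, found ${n}`)
}
```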