xiaoyifang / goldendict-ng

The Next Generation GoldenDict
https://xiaoyifang.github.io/goldendict-ng/

Questions about full text search #864

Open popyoung opened 1 year ago

popyoung commented 1 year ago

I tried full-text search today. I searched for take the road (without quotation marks), and the results had nothing to do with it. I can't tell whether I'm using it wrong (perhaps phrases containing spaces can't be searched?) or whether there is an actual problem, but the results are far from what I expected. It is certainly fast, though...

I'm using the latest build with Xapian.

xiaoyifang commented 1 year ago

It behaves much like a Google search. Without quotation marks, a document counts as a hit if it matches any one of the words.
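
Editor's note: the "any word counts as a hit" behaviour described above can be illustrated with a toy inverted index. This is a minimal sketch in plain Python, not goldendict-ng or Xapian code; the function names and sample documents are made up for illustration, and real engines also rank results and apply stemming, which is omitted here.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_any(index, query):
    """OR semantics: a document matches if it contains at least one query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return hits

# Hypothetical sample articles.
docs = {
    1: "take the road less travelled",
    2: "the cat sat on the mat",
    3: "a long and winding path",
}
index = build_index(docs)
print(sorted(search_any(index, "take the road")))  # [1, 2]
```

Document 2 matches only because it contains the, which is why an unquoted query made of common words returns so many seemingly unrelated articles.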

itkind commented 1 year ago

In the Mac version there is no option for choosing the word distance. It was there before, what happened?

xiaoyifang commented 1 year ago

It was removed. Full-text search now uses Xapian as its engine, and search engines usually don't offer this option. There should be few use cases for it.

popyoung commented 1 year ago

I also tried searching with double quotes and with single quotes, and the results were even stranger...

xiaoyifang commented 1 year ago

With double quotes, a matching article should contain all of the words in the query, but their order is not guaranteed. (Guaranteeing word order requires enabling the position feature, which would make the index files about 100% larger.)

Can you give a specific example?

popyoung commented 1 year ago

Word order being ignored probably explains it. After all, take, the, and road are all common words, so it is no surprise to find them anywhere. I checked several results and could find each of the three words somewhere in the article. But this means that even with double quotes, full-text search is not very usable for phrases.

xiaoyifang commented 1 year ago

> Word order being ignored probably explains it. After all, take, the, and road are all common words, so it is no surprise to find them anywhere. I checked several results and could find each of the three words somewhere in the article. But this means that even with double quotes, full-text search is not very usable for phrases.

Understood. I'll look into enabling the position feature later. Once it is enabled, searches will match the exact word order. It does have a fairly large impact on index file size, though, roughly a 100% increase. Alternatively, we could offer an option and let users choose.
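
Editor's note: the difference between a quoted search without position information (all words present, any order) and one with it (exact phrase) can be sketched as follows. This is a toy model in plain Python under the assumption that the index stores a list of word offsets per term and document; it is not how Xapian lays out its data, and the names are hypothetical.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [word offsets]}; the offset lists are the extra data."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search_all(index, query):
    """AND semantics, no positions needed: every term present, in any order."""
    terms = query.lower().split()
    if not terms:
        return set()
    return set.intersection(*(set(index.get(t, {})) for t in terms))

def search_phrase(index, query):
    """Exact phrase: the terms must occupy consecutive offsets, in order."""
    terms = query.lower().split()
    hits = set()
    for doc_id in search_all(index, query):
        starts = index[terms[0]][doc_id]
        if any(all(s + i in index[t][doc_id] for i, t in enumerate(terms))
               for s in starts):
            hits.add(doc_id)
    return hits

# Hypothetical sample articles.
docs = {
    1: "we take the road to town",
    2: "the road crews take a break",
}
index = build_positional_index(docs)
print(sorted(search_all(index, "take the road")))     # [1, 2]
print(sorted(search_phrase(index, "take the road")))  # [1]
```

Storing one offset per word occurrence, rather than just one entry per term and document, is the kind of extra data behind the index growth mentioned above.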

shenlebantongying commented 1 year ago

This page needs some updates: https://xiaoyifang.github.io/goldendict-ng/ui_fulltextsearch/

popyoung commented 1 year ago

Yes, the index files are indeed not small. It could be enabled by default, and users who care about disk space could turn it off.

xiaoyifang commented 1 year ago

> This page needs some updates: https://xiaoyifang.github.io/goldendict-ng/ui_fulltextsearch/

https://github.com/xiaoyifang/goldendict-ng/pull/869

shenlebantongying commented 1 year ago

Maybe we should rename Whole word to Default or just Xapian.

If we enable the positional index, the feature wanted by @itkind appears to be jesus NEAR/10 christ (the two words are within a distance of 10) or jesus ADJ/10 christ (the same as NEAR, but the order is required)?
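
Editor's note: the NEAR and ADJ semantics described above can be sketched over word offsets. This is plain Python for illustration only; it assumes "distance" means the absolute difference between word offsets, which may not match the exact window definition Xapian uses, and the helper names are made up.

```python
def positions(text, term):
    """Return the word offsets at which term occurs in text (case-insensitive)."""
    return [i for i, w in enumerate(text.lower().split()) if w == term.lower()]

def near(text, a, b, window):
    """True if some occurrences of a and b are at most window words apart, in either order."""
    return any(abs(pa - pb) <= window
               for pa in positions(text, a) for pb in positions(text, b))

def adj(text, a, b, window):
    """Like near(), but a must come before b."""
    return any(0 < pb - pa <= window
               for pa in positions(text, a) for pb in positions(text, b))

text = "milk was poured out for the hungry cat"
print(near(text, "cat", "milk", 10))  # True
print(adj(text, "cat", "milk", 10))   # False
print(adj(text, "milk", "cat", 10))   # True
```

Both checks need the offset of every occurrence, which is why neither operator can work on an index built without position information.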

xiaoyifang commented 1 year ago

The NEAR feature requires position information, which is disabled right now.

itkind commented 1 year ago

@xiaoyifang thank you for implementing this feature. Could you also implement highlighting of the searched phrase in the main window? E.g. I search for cat NEAR milk and get some results, but then I have to search in the main window again, only this time the NEAR parameter doesn't do anything, so there is a lot of reading involved to find where the words are near each other.

xiaoyifang commented 1 year ago

> the NEAR parameter doesn't do anything, so there is a lot of reading involved to find where the words are near each other.

Highlighting in the result page is effectively the same as Ctrl+F. When searching within the page (Ctrl+F), NEAR cannot be matched as a word.

Implementing this fully may involve too much work.

Maybe for this case we could highlight only the first word, which here is cat.
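
Editor's note: one way to read the "highlight only the first word" idea is to drop operator tokens from the query and hand the remaining plain words to the in-page search. The sketch below is hypothetical and is not taken from goldendict-ng; the operator list is an assumption covering the uppercase keywords mentioned in this thread plus the common boolean ones.

```python
import re

# Operator tokens assumed for this sketch, optionally followed by a /N window.
OPERATOR = re.compile(r"^(AND|OR|NOT|XOR|NEAR|ADJ)(/\d+)?$")

def highlight_terms(query):
    """Return the plain words of a query, skipping operator tokens such as NEAR/10."""
    words = []
    for tok in query.split():
        if OPERATOR.match(tok):
            continue
        word = tok.strip('"')  # drop phrase quotes around a word
        if word:
            words.append(word)
    return words

def first_highlight_term(query):
    """Return the first plain word, or None if the query has none."""
    terms = highlight_terms(query)
    return terms[0] if terms else None

print(highlight_terms("cat NEAR milk"))         # ['cat', 'milk']
print(first_highlight_term("cat NEAR/5 milk"))  # cat
```

Highlighting every returned word, rather than only the first, would show both halves of a NEAR match, although it would still not tell the reader which occurrences are actually close to each other.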

itkind commented 1 year ago

> > the NEAR parameter doesn't do anything, so there is a lot of reading involved to find where the words are near each other.
>
> Highlighting in the result page is effectively the same as Ctrl+F. When searching within the page (Ctrl+F), NEAR cannot be matched as a word.
>
> Implementing this fully may involve too much work.
>
> Maybe for this case we could highlight only the first word, which here is cat.

Are you referring to the full-text search? So there is no way that the results of the full-text search can be displayed?

xiaoyifang commented 1 year ago

> So there is no way that the results of the full-text search can be displayed?

Results can be displayed, but the search terms are difficult to highlight in the result page.

itkind commented 1 year ago

> > So there is no way that the results of the full-text search can be displayed?
>
> Results can be displayed, but the search terms are difficult to highlight in the result page.

I see. In that case I will use the latest ARM Mac version, since it worked well there. I hope ARM support will return.

xiaoyifang commented 1 year ago

The ARM version does not support highlighting syntax such as a NEAR b either.

itkind commented 1 year ago

> The ARM version does not support highlighting syntax such as a NEAR b either.

[Image on 2023-07-19 03 32 56 PM]