xiaoyifang / goldendict-ng

The Next Generation GoldenDict
https://xiaoyifang.github.io/goldendict-ng/

Chinese character conversion in full-text search #1382

Open pcdi opened 8 months ago

pcdi commented 8 months ago

Is your feature request related to a problem? Please describe.

When using the full-text search, searching in simplified Chinese characters does not find any results written in traditional characters, and vice versa. Because some of my dictionaries use simplified characters and others use traditional characters, I have to run every full-text search twice, once in simplified and once in traditional characters, to make sure I am actually searching all dictionaries.

This could also be extended to Japanese kanji, so that Japanese dictionaries are included in these full-text searches.

Example: Dictionary A uses simplified characters (SC) and Dictionary B uses traditional characters (TC). A full-text search for "词典" only yields entries from Dictionary A; a full-text search for "詞典" only yields entries from Dictionary B.

Use case: You have several personal-name dictionaries in SC, TC, and Japanese. You come across a courtesy name (表字), but the headwords are generally the personal name (本名). A headword search for 徐贻孙/徐貽孫 will not find this person, because the entry is listed under 徐维则/徐維則. If you run a full-text search for the courtesy name Yisun instead, you have to run it twice (or even three times, if the Japanese kanji differ from both the SC and the TC forms) to find all entries that contain 贻孙 or 貽孫.

Describe the solution you'd like

GD already provides conversions for the regular search: SC to TC (TW), SC to TC (HK), and TC to SC. It would be helpful if these conversions could also be selectively applied to the full-text search.

Describe alternatives you've considered

Alternatively, the conversions could always be enabled in full-text search by default. However, being able to search without conversion is still useful for narrowing a search down. So, in my opinion, a per-search option to turn conversion on or off would be the most sensible.
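A minimal sketch of the requested behaviour, written in Python for illustration: the query is expanded into its SC to TC and TC to SC variants, an article matches if it contains any variant, and a `convert_chinese` flag provides the per-search on/off switch. The tiny character table and the names `query_variants` and `full_text_search` are hypothetical and are not GoldenDict-ng code. GoldenDict's Chinese conversion support is built on OpenCC, whose phrase-based dictionaries also handle one-to-many mappings (e.g. 发 → 發 or 髮) that a per-character table like this one cannot.

```python
# Hypothetical toy table covering only the characters used in this issue.
# A real implementation would use OpenCC's s2t/t2s conversions instead.
SC_TO_TC = {"词": "詞", "贻": "貽", "孙": "孫", "维": "維", "则": "則"}
TC_TO_SC = {tc: sc for sc, tc in SC_TO_TC.items()}


def convert(text: str, table: dict) -> str:
    """Map each character through the table; unknown characters stay unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


def query_variants(query: str, convert_chinese: bool = True) -> list:
    """Return the query followed by its SC->TC and TC->SC forms, without duplicates."""
    variants = [query]
    if convert_chinese:
        for table in (SC_TO_TC, TC_TO_SC):
            converted = convert(query, table)
            if converted not in variants:
                variants.append(converted)
    return variants


def full_text_search(articles: dict, query: str, convert_chinese: bool = True) -> list:
    """Return the names of articles whose text contains any variant of the query."""
    variants = query_variants(query, convert_chinese)
    return [name for name, text in articles.items()
            if any(v in text for v in variants)]


if __name__ == "__main__":
    articles = {"Dictionary A (SC)": "徐维则，字贻孙", "Dictionary B (TC)": "徐維則，字貽孫"}
    print(full_text_search(articles, "贻孙"))                         # both dictionaries
    print(full_text_search(articles, "贻孙", convert_chinese=False))  # SC dictionary only
```

With conversion enabled, a single search for 贻孙 finds the entry in both dictionaries; with it disabled, the search behaves like the current full-text search and only matches the exact characters typed.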

Additional context

This problem does not affect full-text searches for characters whose simplified and traditional forms are identical, e.g. a full-text search for "人".

xiaoyifang commented 8 months ago

[screenshot]

Enable this.

pcdi commented 8 months ago

I have already enabled these options. However, the conversion currently only works for the regular headword search in the GD main window, not for the full-text search (全文搜索) that can be opened with Cmd-Shift-F. My feature request is that the full-text search also makes use of these conversions.

pcdi commented 8 months ago

As a follow-up, here are some screenshots to illustrate my point. Full-text searches in TC and in SC yield different results, even though conversion is enabled in the settings, as can be seen below.

Full-text search in traditional characters:

[Screenshot: Full-text-search-TC]

Full-text search in simplified characters:

[Screenshot: Full-text-search-SC]

Conversion enabled in settings:

[Screenshot: Conversion-enabled-in-settings]

Screenshots taken with version:

Goldendict-ng 23.12.08-alpha.20240124.e3125452
Qt 6.7.0 Clang 14.0.3 (clang-1403.0.22.14.1) macos darwin 23.2.0 arm64-little_endian-lp64
Flags:USE_XAPIAN MAKE_ZIM_SUPPORT MAKE_CHINESE_CONVERSION_SUPPORT NO_TTS_SUPPORT no_ffmpeg_player