Open shenlebantongying opened 2 months ago
As someone who has made one of those tokenizers for GoldenDict, I think changing this would break the tokenizers' logic. You could instead add a list of suggested tokenizers to the goldendict-ng website, like gd-tools from tatsumoto and Hakurei.
There are various issues reported wanting to transform a word in a specific way.
https://github.com/xiaoyifang/goldendict-ng/issues/1350 https://github.com/xiaoyifang/goldendict-ng/issues/1466 https://github.com/xiaoyifang/goldendict-ng/issues/1478 (https://github.com/xiaoyifang/goldendict-ng/issues/1478#issuecomment-2089871152) https://github.com/Ajatt-Tools/gd-tools?tab=readme-ov-file#gd-mecab
GD has "Transliteration" for some languages, but we cannot extend it to every language.
Maybe we can add a new type of "Program" dictionary.
When a word or sentence is queried, it is sent to the program first, and the program's results are then added to the search candidate list, similar to the existing "Transliteration".
Example: Search "arisen" and the program will return "arise", then the result page will show both "arisen" and "arise".
GD already has Program type -> "Prefix match", but it only adds new entries to the search box's completion list.
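To make the proposal concrete, here is a minimal sketch of what such an external program might look like. It is only an illustration under assumptions: the script, the `candidates` function, and the tiny `IRREGULAR` table are hypothetical, and it assumes the query word is passed as a command-line argument (for example via the `%GDWORD%` placeholder in the program's command line) and that the program reports one candidate per line on stdout. A real tool would call a proper lemmatizer or tokenizer instead of a hard-coded table.

```python
import sys

# Hypothetical lookup table for illustration only; a real program would
# delegate to a lemmatizer or tokenizer for the target language.
IRREGULAR = {
    "arisen": "arise",
    "arose": "arise",
    "went": "go",
}


def candidates(query: str) -> list[str]:
    """Return the original query followed by any transformed forms, without duplicates."""
    word = query.strip()
    if not word:
        return []
    results = [word]
    lemma = IRREGULAR.get(word.lower())
    if lemma is not None and lemma not in results:
        results.append(lemma)
    return results


def main(args: list[str]) -> None:
    # Assumption: the query arrives as command-line arguments and each
    # candidate is printed on its own line for the caller to read.
    for candidate in candidates(" ".join(args)):
        print(candidate)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved under a hypothetical name such as `lemma.py`, running `python lemma.py arisen` prints `arisen` and `arise` on separate lines, which matches the example above where both forms end up as search candidates.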