openvanilla / McBopomofo

McBopomofo Input Method (小麥注音輸入法)
http://mcbopomofo.openvanilla.org/
MIT License
605 stars 76 forks

[Dictionary Issue Report] #416

Open mjhsieh opened 6 months ago

mjhsieh commented 6 months ago

Category

Description

tianjianjiang commented 6 months ago

Perhaps a short-term goal is to find some offbeat dictionaries, such as 偽基 (Uncyclopedia).

xatier commented 1 week ago

I use this script locally to manage multiple user-defined dictionaries; dict1, dict2, and dict3 come from different sources or themes.

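#!/usr/bin/env bash

# Abort on errors, unset variables, and failed pipeline stages.
set -euo pipefail

# Merged output file.
OUT="data.txt"

# Source dictionaries, concatenated in this order.
sources=(symbol.txt emoji.txt dict1.txt dict2.txt dict3.txt)

# Start the output with a header comment and a blank line.
echo '# user phrases file' >"$OUT"
echo "" >>"$OUT"
# Append each source, followed by a blank line as a separator.
for f in "${sources[@]}"; do
    cat "$f" >>"$OUT"
    echo "" >>"$OUT"
done
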
xatier commented 1 week ago

I run jieba.analyse.textrank and jieba.analyse.extract_tags (TF-IDF) to extract phrases from web pages, then extend the dictionaries above using some semi-manual scripts.

https://github.com/fxsjy/jieba?tab=readme-ov-file#%E5%9F%BA%E4%BA%8E-tf-idf-%E7%AE%97%E6%B3%95%E7%9A%84%E5%85%B3%E9%94%AE%E8%AF%8D%E6%8A%BD%E5%8F%96
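For anyone who wants to try something similar, here is a rough sketch of what such an extraction step could look like. It is not my actual script: the function names `extract_candidates` and `new_phrases`, the `top_k` and `min_len` defaults, and the idea of filtering against a set of already-known phrases are all assumptions made for illustration. Only `jieba.analyse.extract_tags` and `jieba.analyse.textrank` come from jieba itself. The output is just a list of candidate phrases; readings still have to be added by hand before anything goes into a dictionary file.

```python
from typing import Iterable, List, Set


def extract_candidates(text: str, top_k: int = 20) -> List[str]:
    """Return keywords from TF-IDF and TextRank, TF-IDF results first."""
    # Imported lazily so new_phrases() below can be used without jieba installed.
    import jieba.analyse

    tfidf = jieba.analyse.extract_tags(text, topK=top_k)
    textrank = jieba.analyse.textrank(text, topK=top_k)
    return list(tfidf) + list(textrank)


def new_phrases(
    candidates: Iterable[str], known: Set[str], min_len: int = 2
) -> List[str]:
    """Drop duplicates, already-known phrases, and items shorter than min_len.

    First-seen order is preserved so the higher-ranked keywords stay on top.
    """
    seen: Set[str] = set()
    result: List[str] = []
    for phrase in candidates:
        phrase = phrase.strip()
        if len(phrase) < min_len or phrase in known or phrase in seen:
            continue
        seen.add(phrase)
        result.append(phrase)
    return result
```

Calling `new_phrases(extract_candidates(page_text), known)` on the text of a saved web page gives a short list to review manually, where `page_text` and `known` are whatever you loaded from the page and from your existing dictionaries.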