Open mjhsieh opened 6 months ago
Maybe a short-term goal is to find some unusual dictionaries, such as 偽基 (Uncyclopedia) and the like.
I have this locally to manage multiple user-defined dictionaries, where dict1, dict2, and dict3 come from different sources/themes:
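#!/usr/bin/env bash
# Concatenate the phrase sources into a single user phrases file.
set -euo pipefail
OUT="data.txt"
# Sources are appended in the order listed here.
sources=(symbol.txt emoji.txt dict1.txt dict2.txt dict3.txt)
# Start the output fresh with a header comment and a blank line.
echo '# user phrases file' >"$OUT"
echo "" >>"$OUT"
for f in "${sources[@]}"; do
  cat "$f" >>"$OUT"
  # Blank line between sources; also covers files lacking a trailing newline.
  echo "" >>"$OUT"
done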
I run jieba.analyse.textrank and jieba.analyse.extract_tags (TF-IDF) to extract phrases from web pages, then extend the dictionaries above with some semi-manual scripts.
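For anyone who wants to try the same idea, here is a minimal sketch of the extraction step. It is not my actual script: the function names `extract_candidates` and `new_phrases`, the `top_k` and `min_len` defaults, and the choice to merge both result lists are all assumptions for illustration. It also assumes the page text has already been stripped of HTML.

```python
from typing import Iterable, List, Set


def extract_candidates(text: str, top_k: int = 20) -> List[str]:
    """Return TF-IDF and TextRank keywords merged, TF-IDF hits first."""
    # Imported lazily so new_phrases() below can be used without jieba.
    import jieba.analyse

    tfidf = jieba.analyse.extract_tags(text, topK=top_k)
    textrank = jieba.analyse.textrank(text, topK=top_k)

    seen: Set[str] = set()
    merged: List[str] = []
    for word in list(tfidf) + list(textrank):
        if word not in seen:
            seen.add(word)
            merged.append(word)
    return merged


def new_phrases(
    candidates: Iterable[str],
    known: Iterable[str],
    min_len: int = 2,  # hypothetical cutoff: skip single-character words
) -> List[str]:
    """Keep candidates that are long enough and not already known."""
    known_set = set(known)
    result: List[str] = []
    for word in candidates:
        word = word.strip()
        if len(word) >= min_len and word not in known_set:
            known_set.add(word)  # also drops repeats within candidates
            result.append(word)
    return result
```

Something like `new_phrases(extract_candidates(page_text), existing_words)` then gives a list to review by hand. The output is bare words only; any reading or frequency columns the target dictionary format expects still have to be added, which is where the semi-manual part comes in.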
Category
Description