+OPTIONS: H:3 num:2

目录使用 toc-org 生成, 每次保存自动更新

Emacs相关中文问题以及解决方案

本文尝试收集整理 Emacs 使用过程中的中文相关问题以及解决方案. 欢迎 Emacs 爱好者加 hick 微信 3715397 邀请进只聊 Emacs 话题的微信大群.

changelog :

2019-02-01 增加中文相关网站和聚集地 emacs-china
2018-02-25 Writing GNU Emacs Extensions 中文翻译
2016-09-09 增加清华和 Emacs-China 的 elpa 镜像
2016-01-01 安装包镜像/安装源中国大陆服务器
2015-12-27 文件编码问题整理

** 目录 :TOC_3_gh:

[[#emacs相关中文问题以及解决方案][Emacs相关中文问题以及解决方案]]
- [[#emacs-中文基础][Emacs 中文基础]]
  - [[#emacs-的各种字符集][Emacs 的各种字符集]]
  - [[#ms-windows-环境的-utf-8-配置][MS-Windows 环境的 UTF-8 配置]]
  - [[#中文断行][中文断行]]
  - [[#中文字体设置][中文字体设置]]
  - [[#日历设置][日历设置]]
- [[#中文输入][中文输入]]
  - [[#外部中文输入法][外部中文输入法]]
  - [[#内置的输入法][内置的输入法]]
  - [[#输入法相关-mode][输入法相关 mode]]
- [[#org-的中文问题][org 的中文问题]]
  - [[#table-中英文对齐等][table 中英文对齐等]]
  - [[#org-导出中文-html-的空格问题][org 导出中文 html 的空格问题]]
  - [[#org-以及-latex-等导出中文-pdf][org 以及 LaTex 等导出中文 pdf]]
- [[#中文用户聚集地][中文用户聚集地]]
  - [[#emacs-爱好者微信群][Emacs 爱好者微信群]]
  - [[#emacs-china][Emacs-China]]
  - [[#水木社区-emacs-版][水木社区 Emacs 版]]
  - [[#telegram-emacs_zh-][telegram emacs_zh ***]]
- [[#其他中文相关工具][其他中文相关工具]]
  - [[#writing-gnu-emacs-extensions-中文翻译][Writing GNU Emacs Extensions 中文翻译]]
  - [[#安装包镜像安装源中国大陆服务器][安装包镜像/安装源中国大陆服务器]]
  - [[#ace-pinyin][ace-pinyin]]
  - [[#cedict][cedict]]
  - [[#chinese-word-at-point][chinese-word-at-point]]
  - [[#chinese-fonts-setup][chinese-fonts-setup]]
  - [[#chinese-pyim-bigdict][chinese-pyim-bigdict]]
  - [[#chinese-in-spacemacs][Chinese in Spacemacs]]
  - [[#emacs-hz2py][emacs-hz2py]]
  - [[#find-by-pinyin-dired][find-by-pinyin-dired]]
  - [[#ido-pinyin][ido-pinyin]]
  - [[#magit-的中文问题][magit 的中文问题]]
  - [[#pangu-spacing][pangu-spacing]]
  - [[#pinyin-search][pinyin-search]]
  - [[#pyim-hanzi2pinyin][pyim-hanzi2pinyin]]
  - [[#tael-同音字处理][ta.el 同音字处理]]
  - [[#中文分词工具][中文分词工具]]
  - [[#欢迎补充][欢迎补充]]
- [[#参考和鸣谢][参考和鸣谢]]

** Emacs 中文基础

*** Emacs 的各种字符集

Emacs 中的文件编码问题跟其他编辑器不大一样, ~M-x describe-current-coding-system~ 能查看 Emacs 当前编码设置的情况, ~M-x describe-coding-system~ 能看到所有可选择编码, 除了常见的 utf-8 这样的, dos/unix 这样的换行符也作为了编码的一部分. 作为比较通用的方式, 一般建议没有特殊需要都使用 utf-8 编码, unix 还是 dos 看个人需要.

默认配置下, mode line 最左会显示当前文件编码以及换行符格式, 比如 U(Unix) 标示 UTF-8 编码, UNIX 格式换行符,鼠标点击可以查看更多详情.

[[http://www.emacswiki.org/emacs/Unicad][Unicad]] 是一个自动检测文件编码的 mode , 有看到反馈说有时候不能成功检测: Unicad is short for Universal Charset Auto Detector. It is an Emacs-Lisp port of Mozilla Universal Charset Detector.

不同版本的 Emacs 可能设置方法不一样, 推荐下面的设置:

+BEGIN_SRC Emacs lisp

(set-language-environment "UTF-8") (set-default-coding-systems 'utf-8) (set-buffer-file-coding-system 'utf-8-unix) (set-clipboard-coding-system 'utf-8-unix) (set-file-name-coding-system 'utf-8-unix) (set-keyboard-coding-system 'utf-8-unix) (set-next-selection-coding-system 'utf-8-unix) (set-selection-coding-system 'utf-8-unix) (set-terminal-coding-system 'utf-8-unix) (setq locale-coding-system 'utf-8) (prefer-coding-system 'utf-8)

+END_SRC

如果一个非 UTF-8 编码, 比如 GBK 编码的文件打开, 可能 Emacs 会乱码, 这时候 ~M-x revert-buffer-with-coding-system~ 选择 ~gbk~ 即可.

*** MS-Windows 环境的 UTF-8 配置

UTF-8 编码在 MS-Windows 环境下是「二等公民」, 特别是与命令行工具进行交互作时, 如果设置了使用 UTF-8 编码会导致 Windows 自带程序输出乱码. Emacs的默认设置已经尝试对此类情况进行了处理, 下面的代码在默认设置的基础上设置 cmdproxy.exe 使用GBK编码, 进一步减少设置 UTF-8 编码的副作用:

+BEGIN_SRC Emacs lisp

(when (eq system-type 'windows-nt) (set-default 'process-coding-system-alist '(("[pP][lL][iI][nN][kK]" gbk-dos . gbk-dos) ("[cC][mM][dD][pP][rR][oO][xX][yY]" gbk-dos . gbk-dos))))

+END_SRC

参考 [[https://github.com/hick/emacs-chinese/issues/4][issue#4]] jacobsun 的分享, 有可能需要如下设置, 否则可能出现乱码, 暂时未获知具体版本等相关信息, hick 个人的 windows 下设置的只有下面说的一行:

+BEGIN_SRC Emacs lisp

;; jacobsun 分享 (when (eq system-type 'windows-nt) (set-next-selection-coding-system 'utf-16-le) (set-selection-coding-system 'utf-16-le) (set-clipboard-coding-system 'utf-16-le)) ;; hick 个人的设置: (set-default-coding-systems 'utf-8)

+END_SRC

*** 中文断行

如果开启了 auto-fill 或者使用 refill-mode 等, Emacs 会根据 fill-column 的大小,在合适的位置插入换行符. 这时候再导出成 html 等格式, 可能导致被自动断行的中文之间多一个空格(中文和英文以及英文与英文之间的空格还是自然的), 一种建议是关掉或者不是用自动断行, 采用 truncate-lines 的方式让 Emacs 自动根据 window 大小处理成显示层的断行.

另外一种比较常见的做法是在导出的时候, 通过导出的过滤器机制, 导出之前去掉中文之间的换行的模式, 具体参考有 [[http://zwz.github.io/][zwz]] 和 [[http://emacs-china.org/blog/2015/04/20/org-mode-%E5%AF%BC%E5%87%BA-html-%E6%97%B6%E5%88%A0%E9%99%A4%E4%B8%AD%E6%96%87%E4%B8%8E%E4%B8%AD%E6%96%87%E4%B9%8B%E9%97%B4%E5%A4%9A%E4%BD%99%E7%9A%84%E7%A9%BA%E6%A0%BC/][tumashu]] 两种类似的实现:

+BEGIN_SRC Emacs lisp

;;; 下面一段是 Feng Shu 的 (defun eh-org-clean-space (text backend info) "在export为HTML时,删除中文之间不必要的空格" (when (org-export-derived-backend-p backend 'html) (let ((regexp "[[:multibyte:]]") (string text)) ;; org默认将一个换行符转换为空格,但中文不需要这个空格,删除. (setq string (replace-regexp-in-string (format "\(%s\) \n \(%s\)" regexp regexp) "\1\2" string)) ;; 删除粗体之前的空格 (setq string (replace-regexp-in-string (format "\(%s\) +\(<\)" regexp) "\1\2" string)) ;; 删除粗体之后的空格 (setq string (replace-regexp-in-string (format "\(>\) +\(%s\)" regexp) "\1\2" string)) string))) (add-to-list 'org-export-filter-paragraph-functions 'eh-org-clean-space)

;;; 下面一段是 zwz 的, 作者声明只适应 org-mode 8.0 以及以上版本 (defun clear-single-linebreak-in-cjk-string (string) "clear single line-break between cjk characters that is usually soft line-breaks" (let* ((regexp "\([\u4E00-\u9FA5]\)\n\([\u4E00-\u9FA5]\)") (start (string-match regexp string))) (while start (setq string (replace-match "\1\2" nil nil string) start (string-match regexp string start)))) string)

(defun ox-html-clear-single-linebreak-for-cjk (string backend info) (when (org-export-derived-backend-p backend 'html) (clear-single-linebreak-in-cjk-string string)))

(add-to-list 'org-export-filter-final-output-functions 'ox-html-clear-single-linebreak-for-cjk)

+END_SRC

*** 中文字体设置

为了保证显示效果, 一般使用中英文等宽字体(一个中文字显示宽度等于俩个英文字母显示宽度), 推荐字体:

文泉驿等宽微米黑优化版.ttf
雅黑mono.ttf
DroidSansFallback.ttf

以上字体都包含中文字体, 可以点 [[http://pan.baidu.com/s/1dDWUSNn][百度网盘]] 下载.

*** 日历设置

Emacs 中有日历, 而且可以称之为一个系统, 因为其中除了最常用的日历之外, 还有其他的近十种历法, 其中有日记、约会提醒、纪念日提示以及节假日提示等等. 其中的历法包括中国的农历、希伯来历、伊斯兰历、法国革命历、中美玛雅历等等,可以根据经纬度告知你的所在的每天日出日落的时间等等.

Emacs 自带 calc-china.el , 以下为设置中文里的 ‘celestial-stem’ (天干) 和 ‘terrestrial-branch’ (地支):

+BEGIN_SRC Emacs lisp

(setq chinese-calendar-celestial-stem ["甲" "乙" "丙" "丁" "戊" "己" "庚" "辛" "壬" "癸"] chinese-calendar-terrestrial-branch ["子" "丑" "寅" "卯" "辰" "巳" "午" "未" "申" "酉" "戌" "亥"])

+END_SRC

设置阳历节日和阴历节日(参考 [[http://www.linuxsir.org/bbs/thread232256.html][fog_proxy]] ):

+BEGIN_SRC Emacs lisp

;;; 补充用法: holiday-float m w n 浮动阳历节日, m 月的第 n 个星期 w%7 (setq general-holidays '((holiday-fixed 1 1 "元旦") (holiday-fixed 2 14 "情人节") (holiday-fixed 4 1 "愚人节") (holiday-fixed 12 25 "圣诞节") (holiday-fixed 10 1 "国庆节") (holiday-float 5 0 2 "母亲节") ;5月的第二个星期天 (holiday-float 6 0 3 "父亲节") )) (setq local-holidays '((holiday-chinese 1 15 "元宵节 (正月十五)") (holiday-chinese 5 5 "端午节 (五月初五)") (holiday-chinese 9 9 "重阳节 (九月初九)") (holiday-chinese 8 15 "中秋节 (八月十五)") ;; 生日 (birthday-fixed 9 28 "爸爸生日(1950)") (birthday-fixed 10 1 "妈妈生日(1953)") (holiday-chinese 5 29 "老婆生日") ;阴历生日

                   (holiday-lunar 1 1 "春节" 0)
                   ))

+END_SRC

另外一种中文阴历节日的 holiday-lunar 的写法参考自: [[http://xlambda.com/blog/2010/01/11/customize-calendar-in-emacs/][在emacs Calendar中定制中国农历节日]]

更强大的中文日历工具:

[[http://www.newsmth.net/bbsanc.php?path=%252Fgroups%252Fcomp.faq%252FEmacs%252Farchives%252Farchive2005%252FM.1121269541.D0][chinese-calendar.el calendar for chinese]]
[[https://github.com/xwl/cal-china-x/blob/master/cal-china-x.el][William Xu 写的中文版日历]]

** 中文输入

*** 外部中文输入法

个人用搜狗中文输入法的还可以

*** 内置的输入法

默认情况下 toggle-input-method 命令切换输入法.

*** 输入法相关 mode

[[https://github.com/danking/eim-py][eim-py: An Emacs Input Method extension for smart pinyin]]
[[https://github.com/gongzhitaao/chinese-wubi][Emacs 中使用五笔输入法: Chinese Wubi (五笔) input method for Emacs based on quail package.]]
[[https://github.com/tumashu/chinese-pyim][chinese-pyim]] chinese-pyim是从eim拼音输入法进化来的, 个人感觉比eim拼音输入法好用
[[https://github.com/cute-jumper/fcitx.el][Make fcitx better in Emacs.]]
[[https://github.com/tumashu/chinese-remote-input][chinese-remote-input]] 在emacs中, 通过智能手机输入法（比如：android语音输入法）远程输入中文.
[[https://github.com/E-Neo/scel2pyim][scel2pyim]] 一个个将搜狗输入法 scel 细胞词库转换为 chinese-pyim 文本词库的小工具.
[[https://github.com/district10/gat][Gat, Chinese Input Method, works in Emacs]]

** org 的中文问题

*** table 中英文对齐等

因为 Emacs 处理字体的方式的问题, 即使设置字体为等宽字体(一个中文相当于两个英文宽度), org 中的 table 出现中文经常都无法工整的对齐. 需要分别对中英文字体设置合适的大小. 处理该问题有现成的方案: https://github.com/tumashu/chinese-fonts-setup . 其中默认定义了各个系统平台常见的字体以及中英文字体搭配, 使得 org table 里的出现中文也能很好的对齐. 如果安装好以后显示的字体过大, 可以通过 ~cfs-increase-fontsize/cfs-decrease-fontsize~ 调整选择合适的大小.

更多参考资料:

[[http://baohaojun.github.io/perfect-emacs-chinese-font.html][狠狠地折腾了一把Emacs中文字体]] BY BAO HAOJUN
[[http://zhuoqiang.me/torture-emacs.html][折腾 Emacs]] BY zhuoqiang

*** org 导出中文 html 的空格问题

严格来说跟 org 没什么关系, 参见上文的 [[#中文断行][中文断行]]

*** org 以及 LaTex 等导出中文 pdf

导出中文也分直接转 LaTeX 再转 pdf 以及先转 html 再转 pdf 等各种方式, 中间方案的可以参考这个 [[http://blog.hickwu.com/posts/340][中文支持不错的pdf工具rst2pdf]]

arthur@微信群分享的 TeX 解决方案, 用 [[http://home.ustc.edu.cn/~zpj/doc/TeX/xetex-tutorial.pdf][XeTeX]] 或者 [[http://www.doc88.com/p-673855969907.html][xetex-tutorial]] .

** 中文用户聚集地

*** Emacs 爱好者微信群

分一个主群和一个闲聊群, 截止 2019-02-01 分别 400 多人和 200 多人.

微信加 3715397 注明 Emacs 可以邀请入群. 特别注意主群约定之聊 Emacs 相关话题, 无关话题转闲聊群.

*** Emacs-China

主要由子龙山人创建的 , [[https://emacs-china.org/][Emacs 讨论社区]] 创建以后比较活跃, 讨论比较有序.

*** 水木社区 Emacs 版

源自水木清华的 telnet BBS 的社区, 目前同时有 [[http://www.newsmth.net/nForum/#!board/Emacs][web 版]] 和 telnet 版

telegram emacs_zh

2020-03 左右上去看过也还比较活跃 [[https://t.me/emacs_zh][telegram emacs_zh]]

** 其他中文相关工具

*** Writing GNU Emacs Extensions 中文翻译

微信群 Emacsist 技术群 [[https://github.com/slegetank][slegetank]] 翻译的 [[https://github.com/slegetank/WGEECN][Writing GNU Emacs Extensions 中文版]]

*** 安装包镜像/安装源中国大陆服务器

由于大陆地区特殊的网络条件, 直连国外的 ELPA 服务器可能特别慢甚至有时候安装不成功, 有以下几个国内镜像推荐大家使用.

"清华大学 TUNA 协会原名清华大学学生网管会，注册名清华大学学生网络与开源软件协会" 搭建的镜像, 因为有清华的网络后盾, 比较推荐这个, 其中也包括 marmalade 和 org 等几个其他 Emacs 包的镜像:

https://mirrors.tuna.tsinghua.edu.cn/elpa/

国内的 Emacs 爱好者搭建的 [[https://emacs-china.org/][Emacs-China]] (作为 Emacs 专属社区也是一个很好的地方) 的镜像也跟上面的类似, 还包括简单的使用方法说明:

http://elpa.emacs-china.org/

[[https://github.com/aborn/][@aborn]] 有搭建的镜像, ELPA 的 EmacsWiki 上也有 [[http://www.emacswiki.org/emacs/ELPA_(%25E4%25B8%25AD%25E6%2596%2587)][相关说明]] :

+BEGIN_SRC Emacs lisp

(add-to-list 'package-archives '("aborn" . "http://elpa.popkit.org/packages/"))

+END_SRC

顺带提一句, 如果 ~M-x package-install~ 出现找不到包的 url , 可能是本地缓存的包地址已经升级变换, ~M-x package-refresh-contents~ 可能就可以了.

*** ace-pinyin

https://github.com/cute-jumper/ace-pinyin

Jump to Chinese characters using ace-jump-char-mode or avy-goto-char : input the first letter of the pinyin of the Chinese character, then use ace-jump-char-mode or avy-goto-char to jump to it.

*** cedict

https://github.com/danmey/cedict.el

Emacs interface to Chinese-English dictionary in CEDICT format.

*** chinese-word-at-point

https://github.com/xuchunyang/chinese-word-at-point.el

Get (most likely) Chinese word under the cursor in Emacs

中文分词跟英文可以时候完全不是一回事, 徐春阳同学弄的这个, 依赖外部分词的命令行: 可以用结巴分词或者 SCWS (简易中文分词系统).

*** chinese-fonts-setup

原地址 https://github.com/tumashu/chinese-fonts-setup 已经更新为 https://github.com/tumashu/cnfonts

emacs中文字体配置工具. 可以快速方便的的实现中文字体和英文字体等宽（也就是常说的中英文对齐）

*** chinese-pyim-bigdict

https://github.com/tumashu/chinese-pyim-bigdict