Closed RekiDunois closed 2 months ago
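I cannot reproduce this on my side. Locally I added debug output of locale.getpreferredencoding() to ruyi --version, and it reports UTF-8. My LANG is also en_US.UTF-8.
Could you share more information about your runtime environment? I suspect it is a minimal environment such as a container where locale-gen was never run, but I have not verified this myself.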
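For anyone who wants to collect the same kind of debug output, here is a minimal sketch. It is not the code that was added to ruyi --version; the helper name locale_debug_info is hypothetical and only standard-library calls are used.

```python
import locale
import os
import sys


def locale_debug_info() -> dict[str, str]:
    """Collect the values that influence Python's default text encoding."""
    info = {
        # The encoding that open() falls back to when encoding= is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }
    # The environment variables discussed in this thread.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        info[name] = os.environ.get(name, "<unset>")
    return info


if __name__ == "__main__":
    for key, value in locale_debug_info().items():
        print(f"{key}: {value}")
```

Running this both with the packaged binary's interpreter and with a plain python interpreter should make any difference between the two environments visible.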
This is an Arch virtual machine installed in VMware:
⋊> reki@RekiArch ⋊> ~ neofetch 09:37:53
-` reki@RekiArch
.o+` -------------
`ooo/ OS: Arch Linux x86_64
`+oooo: Host: VMware20,1 None
`+oooooo: Kernel: 6.10.3-arch1-1
-+oooooo+: Uptime: 3 mins
`/:-:++oooo+: Packages: 407 (pacman)
`/++++/+++++++: Shell: fish 3.7.1
`/++++++++++++++: Resolution: 1280x800
`/+++ooooooooooooo/` Terminal: /dev/pts/0
./ooosssso++osssssso+` CPU: AMD Ryzen 7 5800X (8) @ 4.200GHz
.oossssso-````/ossssss+` GPU: 00:0f.0 VMware SVGA II Adapter
-osssssso. :ssssssso. Memory: 306MiB / 7904MiB
:osssssss/ osssso+++.
/ossssssss/ +ssssooo/-
`/ossssso+/:- -:/+osssso+-
`+sso+:-` `.-/+oso:
`++:. `-/+/
.` `/
⋊> reki@RekiArch ⋊> ~ locale -a 09:37:55
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
C
C.utf8
POSIX
⋊> reki@RekiArch ⋊> ~ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="38;2;23;147;209"
HOME_URL="https://archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://gitlab.archlinux.org/groups/archlinux/-/issues"
PRIVACY_POLICY_URL="https://terms.archlinux.org/docs/privacy-policy/"
LOGO=archlinux-logo
However, it looks like no line in /etc/locale.gen is uncommented:
...
#zh_CN.GB18030 GB18030
#zh_CN.GBK GBK
#zh_CN.UTF-8 UTF-8
#zh_CN GB2312
#zh_HK.UTF-8 UTF-8
#zh_HK BIG5-HKSCS
#zh_SG.UTF-8 UTF-8
#zh_SG.GBK GBK
#zh_SG GB2312
#zh_TW.EUC-TW EUC-TW
#zh_TW.UTF-8 UTF-8
#zh_TW BIG5
#zu_ZA.UTF-8 UTF-8
#zu_ZA ISO-8859-1
...
I cannot reproduce this in a freshly pulled archlinux:latest container. In particular:
[root@0bdeeafdf888 /]# locale -a
C
C.utf8
POSIX
Note that there is no "Cannot set LC_* to default locale: No such file or directory" error.
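My current suspicion is that locale-gen has never been run in your environment. Could you run it and try again? (When /etc/locale.gen does not explicitly enable any locale, locale-gen generates all locales supported by glibc.)
If this is confirmed to be the cause of the unexpected behavior, the reasonable fix would be to detect the problem and prompt the user to resolve it themselves.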
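The "detect and prompt" idea above could look roughly like the sketch below. It assumes that the "Cannot set LC_CTYPE to default locale" message corresponds to a failing setlocale call with an empty locale name, which is how glibc behaves; the function names default_locale_is_usable and warn_if_locale_unusable are hypothetical and not part of ruyi.

```python
import locale
import os
import sys


def default_locale_is_usable() -> bool:
    """Return True if the locale requested by the environment can be activated."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting only
    try:
        # An empty string asks the C library to pick the locale from
        # LC_ALL / LC_CTYPE / LANG; locale.Error is raised if that fails.
        locale.setlocale(locale.LC_CTYPE, "")
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # leave global state untouched


def warn_if_locale_unusable() -> None:
    """Print a hint to stderr instead of trying to fix the locale for the user."""
    if default_locale_is_usable():
        return
    settings = ", ".join(
        f"{name}={os.environ.get(name, '<unset>')}"
        for name in ("LC_ALL", "LC_CTYPE", "LANG")
    )
    print(
        f"warning: the configured locale is not available ({settings}); "
        "check the output of locale -a and run locale-gen if needed",
        file=sys.stderr,
    )
```

The sketch only warns and never changes the locale, in line with leaving the actual fix to the user.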
I tried running locale-gen, and the problem still occurs:
⋊> reki@RekiArch ⋊> /u/l/locale sudo locale-gen 11:01:47
Generating locales...
zh_CN.UTF-8... done
Generation complete.
⋊> reki@RekiArch ⋊> /u/l/locale ls 11:01:54
drwxr-xr-x root root 4.0 KB Mon Aug 5 15:21:03 2024 C.utf8
.rw-r--r-- root root 3.1 MB Thu Sep 19 11:01:54 2024 locale-archive
⋊> reki@RekiArch ⋊> /u/l/locale ruyi news list 11:01:55
Traceback (most recent call last):
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/__main__.py", line 53, in <module>
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/cli/__init__.py", line 319, in main
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/ruyipkg/news_cli.py", line 42, in cli_news_list
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/ruyipkg/repo.py", line 462, in news_store
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/ruyipkg/repo.py", line 454, in ensure_news_cache
File "encodings/ascii.py", line 26, in decode
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 25: ordinal not in range(128)
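I experimented a bit: as long as LC_CTYPE is set to one of the UTF-8 capable entries listed by locale -a, the file is parsed correctly. Conversely, if it is set to anything else or left empty, the exception is raised. So in theory, after catching this error, checking the value of LC_CTYPE or reporting the output of locale -a would be enough to tell users whether they have hit the same problem.
Related documentation: https://docs.python.org/3/library/locale.html#locale.getpreferredencoding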
⋊> reki@RekiArch ⋊> ~ set -gx LC_CTYPE 11:19:40
⋊> reki@RekiArch ⋊> ~ ruyi news list 11:19:51
Traceback (most recent call last):
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/__main__.py", line 53, in <module>
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/cli/__init__.py", line 319, in main
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/ruyipkg/news_cli.py", line 42, in cli_news_list
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/ruyipkg/repo.py", line 462, in news_store
File "/home/reki/.cache/ruyi/progcache/0.16.0/x86_64/ruyi/ruyipkg/repo.py", line 454, in ensure_news_cache
File "encodings/ascii.py", line 26, in decode
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 25: ordinal not in range(128)
⋊> reki@RekiArch ⋊> ~ set -gx LC_CTYPE zh_CN.utf8 11:19:53
⋊> reki@RekiArch ⋊> ~ ruyi news list 11:19:57
News items:
No. ID Title
────────────────────────────────────────────────────────────────────────────────────────────
1 2024-01-14-ruyi-news RuyiSDK 支持展示新闻了
2 2024-01-15-new-board-images 新增板卡支持 (2024-01-15)
3 2024-01-29-new-board-images 新增板卡支持 (2024-01-29)
4 2024-01-29-ruyi-0.4 RuyiSDK 0.4 版本更新说明
5 2024-02-26-gnu-plct-rv64ilp32-elf RV64ILP32 裸机工具链与 profile 现已可用
6 2024-04-23-ruyi-0.9 RuyiSDK 0.9 版本更新说明
7 2024-05-14-ruyi-0.10 RuyiSDK 0.10 版本更新说明
8 2024-05-28-ruyi-0.11 RuyiSDK 0.11 版本更新说明
9 2024-06-11-ruyi-0.12 RuyiSDK 0.12 版本更新说明
10 2024-06-24-ruyi-0.13 RuyiSDK 0.13 版本更新说明
11 2024-07-08-box64-wps-office-poc 尝鲜:使用 Box64 在 RISC-V 系统上运行 WPS Office
12 2024-07-09-ruyi-0.14 RuyiSDK 0.14 版本更新说明
13 2024-07-23-ruyi-0.15 RuyiSDK 0.15 版本更新说明
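The suggestion of catching the error and reporting the locale settings can be sketched as follows. This is an illustration only, not ruyi's code: locale_hint and read_text_with_hint are hypothetical helpers, and the wording of the message is an assumption.

```python
import locale
import os


def locale_hint(exc: UnicodeDecodeError) -> str:
    """Build a user-facing hint for a decode error caused by the locale default."""
    settings = ", ".join(
        f"{name}={os.environ.get(name, '<unset>')}"
        for name in ("LC_ALL", "LC_CTYPE", "LANG")
    )
    return (
        f"could not decode input as {exc.encoding}; "
        f"Python's preferred encoding is {locale.getpreferredencoding(False)} "
        f"({settings}). Try setting LC_CTYPE to a UTF-8 locale listed by locale -a."
    )


def read_text_with_hint(path: str) -> str:
    """Read a text file with the locale default and explain decode failures."""
    try:
        with open(path) as f:  # deliberately no encoding=, to mirror the failing call
            return f.read()
    except UnicodeDecodeError as exc:
        raise RuntimeError(locale_hint(exc)) from exc
```

Chaining with "from exc" keeps the original traceback available while putting the actionable hint first.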
Cannot reproduce.
Inside an archlinux:latest container:
# pacman -Sy fish python && fish
[...]
root@3d18c58e7ac6 /# set -gx LC_CTYPE
root@3d18c58e7ac6 /# python -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
root@3d18c58e7ac6 /# locale
LANG=C.UTF-8
LC_CTYPE=
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES="C.UTF-8"
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
What does locale print in your environment?
I tried again, this time changing the environment variables in another VM that otherwise works normally, and got this result:
23:19 reki@HyperVArch ~ ./rw
$ set -gx LANG
23:19 reki@HyperVArch ~ ./rw
$ locale
LANG=
LC_CTYPE=
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES=C.UTF-8
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
23:19 reki@HyperVArch ~ ./rw
$ ruyi news list
Traceback (most recent call last):
File "/home/reki/.cache/ruyi/progcache/0.15.0/x86_64/__main__.py", line 53, in <module>
File "/home/reki/.cache/ruyi/progcache/0.15.0/x86_64/ruyi/cli/__init__.py", line 319, in main
File "/home/reki/.cache/ruyi/progcache/0.15.0/x86_64/ruyi/ruyipkg/news_cli.py", line 42, in cli_news_list
File "/home/reki/.cache/ruyi/progcache/0.15.0/x86_64/ruyi/ruyipkg/repo.py", line 462, in news_store
File "/home/reki/.cache/ruyi/progcache/0.15.0/x86_64/ruyi/ruyipkg/repo.py", line 454, in ensure_news_cache
File "encodings/ascii.py", line 26, in decode
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 22: ordinal not in range(128)
Changing it as follows restores normal behavior:
23:19 reki@HyperVArch ~ ./rw
$ set -gx LANG C.UTF-8 ↵ 1
23:20 reki@HyperVArch ~ ./rw
$ locale ↵ 1
LANG=C.UTF-8
LC_CTYPE=
LC_NUMERIC="C.UTF-8"
LC_TIME="C.UTF-8"
LC_COLLATE="C.UTF-8"
LC_MONETARY="C.UTF-8"
LC_MESSAGES=C.UTF-8
LC_PAPER="C.UTF-8"
LC_NAME="C.UTF-8"
LC_ADDRESS="C.UTF-8"
LC_TELEPHONE="C.UTF-8"
LC_MEASUREMENT="C.UTF-8"
LC_IDENTIFICATION="C.UTF-8"
LC_ALL=
23:20 reki@HyperVArch ~ ./rw
$ ruyi news list
News items:
No. ID Title
──────────────────────────────────────────────────────────────────────────────────────────────────
1 2024-01-14-ruyi-news RuyiSDK now supports displaying news
2 2024-01-15-new-board-images New board images available (2024-01-15)
3 2024-01-29-new-board-images New board images available (2024-01-29)
4 2024-01-29-ruyi-0.4 Release notes for RuyiSDK 0.4
5 2024-02-26-gnu-plct-rv64ilp32-elf RV64ILP32 bare-metal toolchain & profile now available
6 2024-04-23-ruyi-0.9 Release notes for RuyiSDK 0.9
7 2024-05-14-ruyi-0.10 Release notes for RuyiSDK 0.10
8 2024-05-28-ruyi-0.11 Release notes for RuyiSDK 0.11
9 2024-06-11-ruyi-0.12 Release notes for RuyiSDK 0.12
10 2024-06-24-ruyi-0.13 Release notes for RuyiSDK 0.13
11 2024-07-08-box64-wps-office-poc 尝鲜:使用 Box64 在 RISC-V 系统上运行 WPS Office
12 2024-07-09-ruyi-0.14 Release notes for RuyiSDK 0.14
13 2024-07-23-ruyi-0.15 Release notes for RuyiSDK 0.15
It seems the problem only appears when neither LANG nor LC_CTYPE is set to a locale that is both listed by locale -a and UTF-8.
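The observation above matches the POSIX lookup order for the character-type category: LC_ALL first, then LC_CTYPE, then LANG, with empty values treated as unset. A small sketch of that resolution follows; the function names effective_ctype and names_utf8 are hypothetical, and the sketch only resolves the locale name, it does not check whether that locale is actually installed.

```python
from collections.abc import Mapping


def effective_ctype(env: Mapping[str, str]) -> str:
    """Resolve the locale name used for LC_CTYPE from environment variables."""
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:  # an empty value counts as unset
            return value
    return "C"  # fallback when nothing is set (the glibc default)


def names_utf8(locale_name: str) -> bool:
    """Return True if the codeset part of a locale name is UTF-8."""
    codeset = locale_name.partition(".")[2].partition("@")[0]
    return codeset.lower().replace("-", "") == "utf8"


# The two situations from the transcripts above:
broken = {"LANG": "", "LC_CTYPE": "", "LC_MESSAGES": "C.UTF-8"}
working = {"LANG": "C.UTF-8", "LC_CTYPE": "", "LC_MESSAGES": "C.UTF-8"}
print(effective_ctype(broken), names_utf8(effective_ctype(broken)))
print(effective_ctype(working), names_utf8(effective_ctype(working)))
```

Note that LC_MESSAGES does not take part in the lookup, which is consistent with the failure occurring even though LC_MESSAGES was C.UTF-8.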
Will this PR still be merged, or does it need to be reworked to check these two variables and warn the user instead?
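> Will this PR still be merged, or does it need to be reworked to check these two variables and warn the user instead?
I submitted #185, which specifies utf-8 encoding for all files opened in text mode. Since the user's terminal and other parts of the external environment may genuinely not be UTF-8, it is probably not appropriate for the program to set a UTF-8 locale on the user's behalf during initialization (and doing so might not succeed anyway). Warning the user to fix their configuration when an unreasonable locale setup is detected is a separate matter and should be handled separately from the current issue.
We are still deciding how to accept external contributions (we asked a previous external contributor to add a DCO-style Signed-off-by line to their commit message, but we may not keep using DCO for external contributions in the future). Because the commit message of this PR carries no DCO sign-off, we will not merge the commit as is. If you want this PR merged, please at least add a Signed-off-by trailer to the commit message, following https://developercertificate.org/. I will then rebase #202 onto your commit and merge it.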
> If you want this PR merged, please at least add a Signed-off-by trailer to the commit message
Added.
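When running the ruyi news list command, no encoding is specified in the open() call. When the content being read contains non-ASCII (Unicode) characters, ruyi raises a UnicodeDecodeError. However, the problem does not always reproduce: it only occurs with the compiled/released binary, not in the development environment.
Analysis:
The open function uses the value of locale.getpreferredencoding() as its default encoding. Printing it shows that the value is ANSI_X3.4-1968 when the command is run from the command line, but utf-8 when the .py file is run directly with python. This is why no error occurs in the development environment.
The global environment variables LC_CTYPE and LC_ALL are both unset on the system, while LANG is set to en_US.UTF-8. With the first two unset, this may be what causes locale.getpreferredencoding() to return ANSI_X3.4-1968.
Explicitly passing utf-8 as the encoding to open() in ensure_news_cache() in repo.py avoids the problem.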
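The effect of the fix can be shown with a small self-contained sketch. It does not reproduce the actual code in repo.py; read_news is a hypothetical stand-in, and the ascii read simulates what happens when the locale default resolves to ANSI_X3.4-1968.

```python
import os
import tempfile


def read_news(path: str) -> str:
    """Read a text file as UTF-8 regardless of the process locale."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "news.md")
        # Write a title containing two CJK characters as UTF-8 bytes.
        with open(path, "wb") as f:
            f.write("RuyiSDK \u65b0\u95fb\n".encode("utf-8"))

        # Simulate the failing case: an ASCII default encoding.
        try:
            with open(path, encoding="ascii") as f:
                f.read()
        except UnicodeDecodeError as exc:
            print("ascii read failed:", exc)

        # The explicit encoding works in any locale; ascii() keeps the
        # printed form safe even if stdout itself is not UTF-8.
        print("utf-8 read ok:", ascii(read_news(path)))
```

Passing the encoding explicitly makes the result independent of LC_ALL, LC_CTYPE, and LANG, which is why the same file reads correctly in both the packaged binary and the development environment.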