yizt / crnn.pytorch

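CRNN implementation for horizontal and vertical Chinese text recognition, with pretrained horizontal and vertical recognition models trained on more than 30,000 Chinese characters. You are welcome to follow the project, try it out, and report issues.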
Apache License 2.0
244 stars 52 forks

I ran into a problem when using fontutils.py to take the union over my fonts #12

Closed daixiangzi closed 4 years ago

daixiangzi commented 4 years ago

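I then got a file similar to your word.txt, but when looking up the class indices with idx = [chars[c] for c in text], I found that digits raise a KeyError. I looked into it: the digits in the word.txt I saved are all windows-1252 encoded, while my system uses UTF-8 everywhere, which is why this happens. Have you run into this situation?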

daixiangzi commented 4 years ago

Here is my test code.

    import os
    import sys
    import chardet
    import codecs

    # Read the character file given on the command line as UTF-8
    f = codecs.open(sys.argv[1], mode='r', encoding='utf-8')
    lines = f.readlines()
    f.close()
    words = [l.strip() for l in lines]

    # Map each character to its line index
    dicts = {}
    for i, char in enumerate(words):
        print(char)
        dicts[char] = i
    print(dicts['4'])
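A small diagnostic sketch for the KeyError described above. ASCII digits encode to the same bytes in UTF-8 and Windows-1252, so a failed lookup on '4' may have another cause, for example a BOM, stray whitespace, or a full-width digit stored as the key. The helpers `describe_key` and `find_lookalikes` and the sample dictionary are hypothetical and are not part of crnn.pytorch.

```python
import unicodedata


def describe_key(key):
    """Return (repr, code points, Unicode names) for one dictionary key."""
    code_points = [f"U+{ord(ch):04X}" for ch in key]
    names = [unicodedata.name(ch, "<unnamed>") for ch in key]
    return repr(key), code_points, names


def find_lookalikes(chars, wanted):
    """Return keys that differ from `wanted` but match it after cleanup.

    Cleanup here means NFKC normalization plus stripping a BOM and whitespace.
    """
    return [
        k for k in chars
        if k != wanted
        and unicodedata.normalize("NFKC", k).strip("\ufeff \t\r\n") == wanted
    ]


# ASCII digits have identical bytes in both encodings.
assert "4".encode("utf-8") == "4".encode("cp1252") == b"4"

# Hypothetical dictionary: a full-width digit and a BOM-prefixed digit.
chars = {"\uff14": 0, "\ufeff4": 1, "a": 2}
for key in find_lookalikes(chars, "4"):
    print(describe_key(key))
```

Printing the code points of the suspicious keys shows whether the stored character is really U+0034 or only looks like it.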

yizt commented 4 years ago

Hello, please also use UTF-8 encoding when writing the word.txt file, then try again.
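The suggestion above could look roughly like the following sketch, which writes and reads the character file with an explicit UTF-8 encoding on both sides. It assumes word.txt holds one character per line, as the test code in this thread implies. The function names `write_charset` and `read_charset` and the temporary path are made up for illustration and are not the repository's API.

```python
import os
import tempfile


def write_charset(path, chars):
    """Write one character per line, always as UTF-8."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for ch in chars:
            f.write(ch + "\n")


def read_charset(path):
    """Read the file back as UTF-8 and map each character to its line index."""
    with open(path, "r", encoding="utf-8") as f:
        # Strip only the line ending so a space character stays a valid entry.
        words = [line.rstrip("\r\n") for line in f]
    return {ch: i for i, ch in enumerate(words)}


# Round trip through a temporary file (hypothetical sample characters).
path = os.path.join(tempfile.mkdtemp(), "word.txt")
write_charset(path, ["4", "中", "文"])
chars = read_charset(path)
idx = [chars[c] for c in "中4文"]
print(idx)
```

Passing `encoding="utf-8"` explicitly avoids depending on the platform default, which can differ between Windows and Linux.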


daixiangzi commented 4 years ago

OK, I found the cause. It was a bug in my own code.
