jiaeyan / Jiayan

Jiayan (甲言), the first NLP toolkit designed for Classical Chinese (also called Ancient Chinese, 古文, or 文言文), supports lexicon construction, tokenizing, POS tagging, sentence segmentation and punctuation.
MIT License
585 stars 71 forks

What corpus was the jiayan.klm model trained on, and can I improve the model further myself? #8

Closed ethanliuzhuo closed 4 years ago

random-yang commented 4 years ago

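From the code, it looks like there is a file called all.txt that contains a large amount of raw training text, but the author has not provided it.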

jiaeyan commented 4 years ago

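Sorry for the late reply. Jiayan's language model was trained on this data. If you want to train your own language model, please use this project to train it; if you want to train a new tagging model, see the three functions train_sentencizer, train_punctuator, and train_postagger in examples.py. Hope this helps!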
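For readers who want to try training a language model, here is a minimal sketch. It assumes the project referred to above is KenLM (suggested by the .klm extension and the kenlm module in the tracebacks in this thread, but not confirmed here) and that the model is trained on space-separated characters. The helper names, file names, and n-gram order below are hypothetical and chosen only for illustration.

```python
def to_char_tokens(line):
    """Separate every non-whitespace character with a single space.

    Assumption: a character-level model is wanted, so each character
    is treated as one token.
    """
    return " ".join(ch for ch in line if not ch.isspace())


def kenlm_commands(corpus_path, arpa_path, binary_path, order=3):
    """Build the two KenLM command lines as argument lists.

    lmplz estimates an ARPA model from tokenized text, and build_binary
    converts the ARPA file to KenLM's binary format. Both are KenLM
    command-line tools and must be installed separately.
    """
    estimate = ["lmplz", "-o", str(order), "--text", corpus_path, "--arpa", arpa_path]
    binarize = ["build_binary", arpa_path, binary_path]
    return estimate, binarize


if __name__ == "__main__":
    print(to_char_tokens("學而時習之"))
    for cmd in kenlm_commands("all.spaced.txt", "my_model.arpa", "my_model.klm"):
        # Pass each list to subprocess.run(cmd, check=True) once KenLM is installed.
        print(" ".join(cmd))
```

The commands are only printed here, so the sketch runs without KenLM installed.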

hannyduan commented 4 years ago

File "", line 6, in
  lm = load_lm('jiayan.klm')

File "D:\Users\hanny.duan.LCFC\AppData\Local\Continuum\anaconda3\lib\site-packages\jiayan\__init__.py", line 12, in load_lm
  return kenlm.LanguageModel(lm)

File "kenlm.pyx", line 122, in kenlm.Model.__init__

OSError: Cannot read model 'jiayan.klm' (util\file.cc:74 in util::OpenReadOrThrow threw ErrnoException because `-1 == (ret = _open(name, 0x8000 | 0x0000))'. No such file or directory while opening E:\学习资料\python\机器学习实战\MLiA_SourceCode\shiyanke\诗词\jiayan.klm)

Hi, I get this error when calling lm = load_lm('jiayan.klm') and I don't know how to fix it. Could you help me solve it? Thank you very much!

jiaeyan commented 4 years ago

Could you tell me your operating environment and Python version? So far it has been tested on macOS with Python 3 without any problems.

spiritXie commented 4 years ago

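On Windows 10 with Python 3.7 I also hit the problem above where the jiayan.klm file cannot be found. I hope the author can give some pointers. Thank you very much!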

fishandcrystle commented 4 years ago

OSError: Cannot read model 'jiayan.klm' (util/file.cc:76 in int util::OpenReadOrThrow(const char*) threw ErrnoException because `-1 == (ret = open(name, 00))'. No such file or directory while opening /content/jiayan.klm)

The file cannot be found when running on Google Colab either.

670619720 commented 4 years ago

File "", line 6, in
  lm = load_lm('jiayan.klm')

File "D:\Users\hanny.duan.LCFC\AppData\Local\Continuum\anaconda3\lib\site-packages\jiayan\__init__.py", line 12, in load_lm
  return kenlm.LanguageModel(lm)

File "kenlm.pyx", line 122, in kenlm.Model.__init__

OSError: Cannot read model 'jiayan.klm' (util\file.cc:74 in util::OpenReadOrThrow threw ErrnoException because `-1 == (ret = _open(name, 0x8000 | 0x0000))'. No such file or directory while opening E:\学习资料\python\机器学习实战\MLiA_SourceCode\shiyanke\诗词\jiayan.klm)

Hi, I get this error when calling lm = load_lm('jiayan.klm') and I don't know how to fix it. Could you help me solve it? Thank you very much!

jiaeyan commented 4 years ago

Instead of model = kenlm.LanguageModel('lm/test.arpa'), please try loading the model with model = kenlm.Model('lm/test.arpa') and see if it works. The kenlm.LanguageModel form may be outdated. Thanks!
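One way to avoid depending on either spelling is to look up whichever constructor the installed kenlm module exposes. This is only a sketch: pick_model_class is a hypothetical helper, not part of kenlm or Jiayan, and it simply tries the two names mentioned in this thread. Note that Python attribute names are case-sensitive, so kenlm.languagemodel (all lowercase) would not match either name.

```python
def pick_model_class(kenlm_module):
    """Return the first model constructor found on the given module.

    Tries the two names mentioned in this thread, in order, and raises
    an AttributeError listing what the module actually exposes if
    neither is present.
    """
    for name in ("Model", "LanguageModel"):
        cls = getattr(kenlm_module, name, None)
        if cls is not None:
            return cls
    available = [n for n in dir(kenlm_module) if not n.startswith("_")]
    raise AttributeError(
        f"No Model or LanguageModel found; module exposes: {available}"
    )
```

With kenlm installed, usage would look like model = pick_model_class(kenlm)('lm/test.arpa') after import kenlm.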

spiritXie commented 4 years ago

Hi, I changed kenlm.LanguageModel(lm) in load_lm, but the model still does not seem to load. Could it be that I did not find the right place to change?

jiaeyan commented 4 years ago

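Is it the same error message as before? I have been quite busy lately; I will look into this when I have time and try to resolve it as soon as possible. Thanks for helping with testing.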

jiaeyan commented 4 years ago

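@hannyduan @spiritXie @670619720 @fishandcrystle In actual use, please change the "jiayan.klm" path in lm = load_lm('jiayan.klm') to match the directory where you downloaded the model, for example "C:\username\download_directory\jiayan.klm". Judging from the error messages, the file path being loaded is incorrect. I will close this issue; if the problem persists, feel free to open a new one. Thanks!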
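The path check described above can be sketched as follows. The helper names resolve_model_path and load_jiayan_lm are hypothetical; only load_lm comes from Jiayan, as shown in the tracebacks earlier in this thread. The idea is to turn the path into an absolute one and fail with a readable message before kenlm is ever called.

```python
from pathlib import Path


def resolve_model_path(model_path):
    """Return an absolute path to the model file, or raise a clear error."""
    path = Path(model_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found: {path} (current directory: {Path.cwd()})"
        )
    return str(path)


def load_jiayan_lm(model_path):
    """Load the language model only after confirming the file exists."""
    # Imported lazily so the path check works even without jiayan installed.
    from jiayan import load_lm
    return load_lm(resolve_model_path(model_path))
```

A relative name such as 'jiayan.klm' is resolved against the current working directory, which matches the directories reported in the error messages above (for example /content on Google Colab).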

wanglvyuan commented 2 weeks ago

Hello, I have tried all of these methods, but it still does not seem to work; I still get the message "module "kenlm" has no attribute "languagemodel"". How can I resolve this?