kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

Segmentation fault when initializing kenlm from C++ #310

Open tongchangD opened 3 years ago

tongchangD commented 3 years ago

C++ code:

```cpp
lm::ngram::Config config;
model = new lm::ngram::Model(language_model_path);
```

Initializing the C++ version of the kenlm model fails: the same code occasionally succeeds, but most of the time it crashes with Segmentation fault (core dumped).

The segfault is probably caused by a pointer problem somewhere. Could someone please help me find it?
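As a starting point for debugging, here is a minimal sketch of a defensive load, assuming the stock KenLM C++ API from `lm/model.hh`; the model path is a placeholder. KenLM generally reports file problems (missing, truncated, or wrong-format files) as exceptions derived from `util::Exception`, so catching those helps separate a bad model file from genuine memory corruption:

```cpp
// Hedged sketch: defensive KenLM model loading (the path is a placeholder).
#include "lm/model.hh"
#include "util/exception.hh"

#include <iostream>
#include <memory>

int main() {
  try {
    lm::ngram::Config config;
    // Prefer a smart pointer over a raw `new` so the model cannot leak
    // or be deleted twice (a common source of later segfaults).
    std::unique_ptr<lm::ngram::Model> model(
        new lm::ngram::Model("zh.binary", config));
    std::cout << "Loaded model of order " << model->Order() << '\n';
  } catch (const util::Exception &e) {
    // A corrupt or missing file surfaces here instead of as a segfault.
    std::cerr << "Failed to load model: " << e.what() << '\n';
    return 1;
  }
  return 0;
}
```

If the crash still occurs intermittently with a known-good file, the problem is more likely elsewhere in the calling code (for example, use after delete or a dangling `model` pointer).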

kpu commented 3 years ago

Sorry, I don't speak Chinese, but I can use machine translation. I need more context.

nocoolsandwich commented 3 years ago

This is because Boost was not installed properly. Also, add the parameter -S 10%.
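For reference, `-S` limits sorting memory in KenLM's command-line tools rather than in the C++ API. A hedged usage sketch (file names are placeholders; check `--help` for the exact flags of your build):

```sh
# Cap sorting memory at 10% of RAM while estimating and binarizing.
bin/lmplz -o 3 -S 10% < corpus.tokenized.txt > zh.arpa
bin/build_binary -S 10% zh.arpa zh.binary
```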

tongchangD commented 3 years ago

OK, thanks. I feel the prospects in NLP are slim; after three years doing NLP, I have switched to CV, haha.

MaarufB commented 2 years ago

@tongchangD Hello bro,

> OK, thanks. I feel the prospects in NLP are slim; after three years doing NLP, I have switched to CV, haha.

Hello, is your vocab separated by spaces or not? Sorry for asking; I am going to try to build a Chinese language model even though I am not Chinese. This is just for research purposes.

tongchangD commented 2 years ago

> Hello, is your vocab separated by spaces or not?

Yes, you must separate sentences by spaces.
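To make "separated by spaces" concrete at query time, here is a minimal scoring sketch modeled on the KenLM README example, assuming a binary model built from space-tokenized text (`zh.binary` is a placeholder):

```cpp
// Hedged sketch: scoring one pre-tokenized (space-separated) sentence.
#include "lm/model.hh"

#include <iostream>
#include <sstream>
#include <string>

int main() {
  lm::ngram::Model model("zh.binary");  // placeholder path
  const lm::ngram::Vocabulary &vocab = model.GetVocabulary();

  lm::ngram::State state(model.BeginSentenceState()), out_state;
  std::istringstream tokens("这 是 一个 中国 样本");  // already segmented
  std::string word;
  double total = 0.0;
  while (tokens >> word) {
    // Each whitespace-delimited token is one vocabulary lookup.
    total += model.Score(state, vocab.Index(word), out_state);
    state = out_state;
  }
  total += model.Score(state, vocab.EndSentence(), out_state);
  std::cout << "log10 p = " << total << '\n';
  return 0;
}
```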

kpu commented 2 years ago

For all languages, the intended use is that you first run a third-party tokenizer. For Chinese it so happens that the tokenizer performs a segmentation task.

MaarufB commented 2 years ago

@tongchangD

> Yes, you must separate sentences by spaces.

Thank you for your reply. Is "这 是 一 个 中 国 样 本" ("this is a Chinese sample", with a space after every character) okay?

MaarufB commented 2 years ago

> For all languages, the intended use is that you first run a third-party tokenizer. For Chinese it so happens that the tokenizer performs a segmentation task.

Hi sir, where can I find that third-party tokenizer? Thanks for your work, by the way. I am new to NLP and am doing this for research purposes.

tongchangD commented 2 years ago

> Hi sir, where can I find that third-party tokenizer?

For example: jieba.

tongchangD commented 2 years ago

> Hi sir, where can I find that third-party tokenizer?

As for the third-party tokenizer, you can see it in this picture; you can find the code on GitHub. [image]

MaarufB commented 2 years ago

> As for the third-party tokenizer, you can see it in this picture; you can find the code on GitHub.

Thank you so much for your reply, sir. I'll try that and come back here to share my progress.