kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

Language model can not be used in multiprocessing #365

Open yjiangling opened 2 years ago

yjiangling commented 2 years ago

Hi all,

When I use multiprocessing like this:

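```python
import multiprocessing

import kenlm


class LM_Decode(object):

    def __init__(self, lm_path):
        self.lm_model = kenlm.LanguageModel(lm_path)

    def decode(self, sent):
        # full_scores() yields one (log10 prob, n-gram length, is_oov) tuple per token
        lm_prob = list(self.lm_model.full_scores(sent))
        return lm_prob[-1][0]


lm_decoder = LM_Decode("lm_path.bin")

pool = multiprocessing.Pool(processes=4)

sent_list = ['sent1', 'sent2', 'sent3', 'sent4']

# Sending the bound method to the workers pickles lm_decoder and its model
pred = pool.map(func=lm_decoder.decode, iterable=sent_list)
```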

It gives the following error:

    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 266, in map
      return self._map_async(func, iterable, mapstar, chunksize).get()
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 644, in get
      raise self._value
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
      put(task)
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/connection.py", line 206, in send
      self._send_bytes(_ForkingPickler.dumps(obj))
    File "/media/tclwh2/tanglei/anaconda3/envs/tf1_12/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
      cls(buf, protocol).dump(obj)
    File "kenlm.pyx", line 258, in kenlm.Model.__reduce__ (python/kenlm.cpp:3929)
    NameError: name '_kenlm' is not defined

What's wrong with it, and how can I fix it? Can anyone help? Thanks a lot!

kpu commented 2 years ago

The underlying C++ code is threadsafe (and for that matter the mmap can share memory). I don't know enough about python though.
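Since the C++ code is threadsafe, one option that avoids pickling altogether is to score from threads that share a single model object. The following is a minimal sketch, not a tested recipe for KenLM: `score_with_threads` and `last_score` are hypothetical names, `model` is assumed to be a `kenlm.LanguageModel` (or anything with a compatible `full_scores` method), and whether threads give any speedup depends on whether the Python binding releases the GIL during queries, which is not established here.

```python
from concurrent.futures import ThreadPoolExecutor


def score_with_threads(model, sent_list, max_workers=4):
    """Score sentences on a thread pool that shares one model object.

    `model` is assumed to expose full_scores(sentence), yielding one
    (log10 prob, n-gram length, is_oov) tuple per token.
    """
    def last_score(sent):
        # Same quantity as the decode() method in the question:
        # the log10 probability in the last tuple yielded for the sentence.
        return list(model.full_scores(sent))[-1][0]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as sent_list
        return list(executor.map(last_score, sent_list))
```

Nothing is sent between processes here, so the model is never pickled and the `NameError` raised during pickling cannot occur on this path.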

yjiangling commented 2 years ago

> The underlying C++ code is threadsafe (and for that matter the mmap can share memory). I don't know enough about python though.

Thanks a lot for the reply. I found that it runs normally on Python 3.7 and above but raises this error on Python 3.6 and below. Maybe the implementation of multiprocessing differs between Python 3.6 and Python 3.7.
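The traceback shows the failure happening while `pool.map` pickles the bound method `lm_decoder.decode`, which in turn pickles the model it holds. A workaround that should not depend on the Python version is to load the model once inside each worker, so that only the path and the sentences cross process boundaries. This is a hedged sketch: `init_worker`, `decode`, `decode_all`, and `load_kenlm` are hypothetical helper names, and `"lm_path.bin"` is the placeholder path from the question.

```python
import multiprocessing

# Per-process model handle; set by init_worker() in each worker process.
_model = None


def load_kenlm(lm_path):
    """Default loader: import kenlm lazily and open the model file."""
    import kenlm  # imported here so this module loads even without kenlm
    return kenlm.LanguageModel(lm_path)


def init_worker(lm_path, loader=load_kenlm):
    """Pool initializer: runs once in every worker and loads the model there."""
    global _model
    _model = loader(lm_path)


def decode(sent):
    """Return the log10 probability in the last tuple from full_scores()."""
    return list(_model.full_scores(sent))[-1][0]


def decode_all(sent_list, lm_path, processes=4, loader=load_kenlm):
    """Score sentences in parallel without ever pickling a model object."""
    with multiprocessing.Pool(processes=processes,
                              initializer=init_worker,
                              initargs=(lm_path, loader)) as pool:
        return pool.map(decode, sent_list)
```

A call such as `decode_all(['sent1', 'sent2', 'sent3', 'sent4'], "lm_path.bin")` would replace the `pool.map(func=lm_decoder.decode, ...)` line. On platforms that start workers with `spawn`, the call belongs under an `if __name__ == "__main__":` guard. Because the binary model file is memory-mapped, as noted above, loading it in several processes can share memory rather than duplicate it.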