konlpy / konlpy

Python package for Korean natural language processing.
http://konlpy.org

Using konlpy with PyTorch's data loader (jpype error) #155

Closed eagle705 closed 4 years ago

eagle705 commented 7 years ago

I'm using multiple worker processes in PyTorch's data loader,

and I get the error below.

The import itself works fine, though. (The OS is CentOS.)

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 43, in _worker_loop
    data_queue.put((idx, ExceptionWrapper(sys.exc_info())))
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 390, in put
    return send(obj)
  File "/usr/local/lib/python2.7/dist-packages/torch/multiprocessing/queue.py", line 17, in send
    ForkingPickler(buf, pickle.HIGHEST_PROTOCOL).dump(obj)
  File "/usr/lib/python2.7/pickle.py", line 224, in dump
    self.save(obj)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python2.7/pickle.py", line 419, in save_reduce
    save(state)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 748, in save_global
    (obj, module, name))
PicklingError: Can't pickle <class 'jpype._jexception.java.lang.ClassFormatErrorPyRaisable'>: it's not found as jpype._jexception.java.lang.ClassFormatErrorPyRaisable

eagle705 commented 7 years ago

Looking into it, PyTorch's data_loader uses multiprocessing, and the error seems to occur because jpype.attachThreadToJVM() is never called in that code. Hmm, I don't see any way to call it from inside there. It's a shame konlpy has this dependency.

blissray commented 7 years ago

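Typically you would build the Korean vectors for your documents with konlpy ahead of time, and in TF you would then use something like Recorder to produce batches. Is there a reason pytorch and konlpy have to run together?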
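
The "tokenize first, batch later" workflow described above could be sketched roughly as follows. This is only an illustration under assumptions: `pretokenize` and `load_tokens` are hypothetical helper names, the JSON Lines file layout is an arbitrary choice, and the `tokenize` callable is whatever konlpy tagger method you use (for example `Okt().morphs`, assuming the `Okt` tagger is available in your konlpy version).

```python
import json


def pretokenize(texts, out_path, tokenize):
    """Tokenize every text once and store one JSON list of tokens per line.

    `tokenize` is any callable mapping a string to a list of tokens,
    e.g. konlpy's Okt().morphs (assumption: run in a single process).
    """
    with open(out_path, "w", encoding="utf-8") as f:
        for text in texts:
            f.write(json.dumps(tokenize(text), ensure_ascii=False) + "\n")


def load_tokens(path):
    """Read the pre-tokenized file back; no JVM is needed at this point."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


# Hypothetical usage, run once before training:
#   from konlpy.tag import Okt
#   pretokenize(corpus, "tokens.jsonl", Okt().morphs)
# The training-time dataset then only calls load_tokens("tokens.jsonl").
```

With this split, the data loader workers only read plain lists of strings, so nothing JVM-related has to cross a process boundary.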

minhoryang commented 4 years ago

285

To use konlpy with the multiprocessing module, load the konlpy tagger at the point where each Process is initialized.
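
A minimal sketch of that idea with the standard library's `multiprocessing.Pool`, whose `initializer` argument runs once in each worker process. The names `init_worker`, `tokenize`, `tokenize_all`, and `default_tagger_factory` are made up for this example, and `Okt` is just one tagger choice (an assumption; substitute whichever tagger you use). The sketch also avoids creating a tagger in the parent process, on the assumption that a JVM started before the workers are created is not safely usable inside them.

```python
import multiprocessing as mp

_tagger = None  # one tagger per process, filled in by init_worker


def default_tagger_factory():
    # Imported lazily so the JVM is started inside the worker process.
    from konlpy.tag import Okt  # assumption: Okt exists in your konlpy version
    return Okt()


def init_worker(factory=default_tagger_factory):
    """Runs once when each worker process starts."""
    global _tagger
    _tagger = factory()


def tokenize(text):
    """Runs in a worker and uses that worker's own tagger."""
    return _tagger.morphs(text)


def tokenize_all(texts, processes=2, factory=default_tagger_factory):
    with mp.Pool(processes, initializer=init_worker, initargs=(factory,)) as pool:
        return pool.map(tokenize, texts)


# Hypothetical usage (keep it under the __main__ guard so it also works
# with the "spawn" start method):
#   if __name__ == "__main__":
#       print(tokenize_all(["아버지가 방에 들어가신다", "안녕하세요"]))
```

Only plain strings and lists of strings are sent between processes here, so no jpype object needs to be pickled.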

For pytorch's data_loader, I think you can do this by passing a function as the worker_init_fn argument, as described at https://pytorch.org/docs/stable/data.html#multi-process-data-loading.
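
Here is one way the `worker_init_fn` suggestion might look. Treat it as a sketch, not a verified fix for this issue: `KoreanTextDataset`, `make_tagger`, and `build_loader` are hypothetical names, and `Okt` is an assumed tagger choice. `worker_init_fn` and `torch.utils.data.get_worker_info()` are the PyTorch hooks described on the linked page; the latter gives access to the dataset copy that lives inside each worker.

```python
class KoreanTextDataset:
    """Map-style dataset; the tagger is attached later, inside each worker."""

    def __init__(self, texts):
        self.texts = list(texts)
        self.tagger = None  # deliberately not created in the parent process

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return self.tagger.morphs(self.texts[idx])


def make_tagger():
    from konlpy.tag import Okt  # assumption: Okt exists in your konlpy version
    return Okt()


def worker_init_fn(worker_id):
    """Called by DataLoader once in every worker process."""
    from torch.utils.data import get_worker_info
    get_worker_info().dataset.tagger = make_tagger()


def build_loader(texts, num_workers=2):
    from torch.utils.data import DataLoader
    # batch_size=None disables automatic batching, so each item is simply
    # the token list of one text (token lists differ in length).
    return DataLoader(
        KoreanTextDataset(texts),
        batch_size=None,
        num_workers=num_workers,
        worker_init_fn=worker_init_fn,
    )
```

Note that with `num_workers=0` PyTorch does not call `worker_init_fn`, so in that case the tagger would have to be set on the dataset directly.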

As an aside, I'm not convinced that calling konlpy at data-loading time is the right design. I hope it works out for you.