Describe the bug
Pyodide is a tool for running Python packages in the browser. pycantonese currently cannot run in Pyodide because it uses multi-threading when loading corpus data.
The following error is raised: "RuntimeError: can't start new thread". The full stack trace follows.
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/lib/python3.12/site-packages/pycantonese/parsing.py", line 170, in parse_text
_get_utterance(sent, segment_kwargs, pos_tag_kwargs, participant)
File "/lib/python3.12/site-packages/pycantonese/parsing.py", line 56, in _get_utterance
words, tags, jps = _parse_text(unparsed_sent, segment_kwargs, pos_tag_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pycantonese/parsing.py", line 27, in _parse_text
chars_jps = characters_to_jyutping(text, **(segment_kwargs or {}))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pycantonese/jyutping/characters.py", line 101, in characters_to_jyutping
words_to_jyutping, chars_to_jyutping = _get_words_characters_to_jyutping()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pycantonese/jyutping/characters.py", line 14, in _get_words_characters_to_jyutping
corpus = hkcancor()
^^^^^^^^^^
File "/lib/python3.12/site-packages/pycantonese/corpus.py", line 396, in hkcancor
reader = _HKCanCorReader.from_dir(data_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pylangacq/chat.py", line 187, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pylangacq/chat.py", line 1057, in from_dir
return cls.from_files(
^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pylangacq/chat.py", line 187, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/lib/python3.12/site-packages/pylangacq/chat.py", line 1005, in from_files
strs = list(executor.map(_open_file, paths))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python312.zip/concurrent/futures/_base.py", line 608, in map
fs = [self.submit(fn, *args) for args in zip(*iterables)]
^^^^^^^^^^^^^^^^^^^^^^
File "/lib/python312.zip/concurrent/futures/thread.py", line 179, in submit
self._adjust_thread_count()
File "/lib/python312.zip/concurrent/futures/thread.py", line 202, in _adjust_thread_count
t.start()
File "/lib/python312.zip/threading.py", line 992, in start
_start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Expected behavior
The sentence can be segmented without error:
['但願', '人', '長久', ',', '千', '裡', '共', '嬋娟']
System (please complete the following information):
Additional context
Allowing pycantonese to run in JavaScript/the browser would open up many opportunities (e.g. Cantonese-themed web apps, browser extensions).
The _HKCanCorReader.from_dir() function supports disabling multi-threading using parallel=False. Preliminary testing shows that pycantonese works in Pyodide with multi-threading disabled.
Could you add an environment variable that overrides this argument, so that pycantonese can load properly in Pyodide?
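To make the request concrete, here is a minimal sketch of how such an override could be resolved. The variable name PYCANTONESE_PARALLEL and the helper use_parallel are hypothetical and do not exist in pycantonese today. The sketch also assumes it would be acceptable to default to no threads when sys.platform is "emscripten", which is the value Pyodide reports.

```python
import os
import sys

# Hypothetical variable name; pycantonese does not define it today.
ENV_VAR = "PYCANTONESE_PARALLEL"

_FALSE_VALUES = {"0", "false", "no", "off"}
_TRUE_VALUES = {"1", "true", "yes", "on"}


def use_parallel(environ=None, platform=None):
    """Decide whether corpus loading should use threads.

    An explicit setting in the environment wins. Otherwise threads are
    used everywhere except on Emscripten (the platform Pyodide reports),
    where new threads cannot be started.
    """
    environ = os.environ if environ is None else environ
    platform = sys.platform if platform is None else platform

    raw = environ.get(ENV_VAR, "").strip().lower()
    if raw in _FALSE_VALUES:
        return False
    if raw in _TRUE_VALUES:
        return True
    # Unset or unrecognized value: fall back to a platform-based default.
    return platform != "emscripten"


# Possible call site inside hkcancor() (assumption, not the current code):
#     reader = _HKCanCorReader.from_dir(data_dir, parallel=use_parallel())
```

With something along these lines, setting PYCANTONESE_PARALLEL=0 would force single-threaded loading on any platform, and Pyodide users would not need to set anything.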