goru001 / inltk

Natural Language Toolkit for Indic Languages — aims to provide out-of-the-box support for various NLP tasks that an application developer might need.
https://inltk.readthedocs.io
MIT License

RuntimeError: Internal: could not parse ModelProto from /home/nlp/miniconda3/lib/python3.9/site-packages/inltk/models/hi/tokenizer.model #99

Open bezaisingh opened 8 months ago

bezaisingh commented 8 months ago

On Ubuntu 18 with Python 3.9, iNLTK installed without issue, but when setting the language to `hi`:

```python
from inltk.inltk import setup
setup('hi')
```

we see the error message below:


```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[9], line 2
      1 from inltk.inltk import setup
----> 2 setup('hi')

File ~/miniconda3/lib/python3.9/site-packages/inltk/inltk.py:33, in setup(language_code)
     31 loop = asyncio.get_event_loop()
     32 tasks = [asyncio.ensure_future(download(language_code))]
---> 33 learn = loop.run_until_complete(asyncio.gather(*tasks))[0]
     34 loop.close()

File ~/miniconda3/lib/python3.9/asyncio/base_events.py:623, in BaseEventLoop.run_until_complete(self, future)
    612 """Run until the Future is done.
    613
    614 If the argument is a coroutine, it is wrapped in a Task.
        (...)
    620 Return the Future's result, or raise its exception.
    621 """
    622 self._check_closed()
--> 623 self._check_running()
    625 new_task = not futures.isfuture(future)
    626 future = tasks.ensure_future(future, loop=self)

File ~/miniconda3/lib/python3.9/asyncio/base_events.py:583, in BaseEventLoop._check_running(self)
    581 def _check_running(self):
    582     if self.is_running():
--> 583         raise RuntimeError('This event loop is already running')
    584     if events._get_running_loop() is not None:
    585         raise RuntimeError(
    586             'Cannot run the event loop while another loop is running')

RuntimeError: This event loop is already running
```

Interleaved with the traceback, the following was printed:

```
Downloading Model. This might take time, depending on your internet connection. Please be patient.
We'll only do this for the first time.
Done!
```
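A note on this first error: it is characteristic of calling `setup()` inside Jupyter/IPython, where an asyncio event loop is already running; `inltk.setup()` calls `loop.run_until_complete()` on that same loop, which asyncio forbids. A minimal stdlib-only sketch reproducing the condition (no iNLTK needed; the `demo` coroutine is illustrative, not part of iNLTK):

```python
import asyncio

async def demo():
    # Inside a coroutine, the event loop is already running --
    # just like inside a Jupyter cell.
    loop = asyncio.get_running_loop()
    coro = asyncio.sleep(0)
    try:
        # Re-entering the running loop, as inltk's setup() does
        # when invoked from a notebook, raises immediately.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        return str(exc)
    finally:
        coro.close()  # avoid a "coroutine was never awaited" warning

loop = asyncio.new_event_loop()
print(loop.run_until_complete(demo()))  # -> This event loop is already running
loop.close()
```

In notebooks, a commonly suggested workaround (not verified here) is to call `nest_asyncio.apply()` before `setup()`, or to run `setup('hi')` once from a plain Python script instead. Either way, this error can interrupt the model download mid-way even though `Done!` is printed.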


Since we saw the `Done!` message, we ignored the error and moved on to the next step:

```python
from inltk.inltk import tokenize

text = 'गीक्स फॉर गीक्स एक बेहतरीन टेक्नोलॉजी लर्निंग प्लेटफॉर्म है।'
tokenize(text, 'hi')
```

This raises the error below:


```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[14], line 4
      1 from inltk.inltk import tokenize
      3 text = 'गीक्स फॉर गीक्स एक बेहतरीन टेक्नोलॉजी लर्निंग प्लेटफॉर्म है।'
----> 4 tokenize(text ,'hi')

File ~/miniconda3/lib/python3.9/site-packages/inltk/inltk.py:62, in tokenize(input, language_code)
     60 def tokenize(input: str, language_code: str):
     61     check_input_language(language_code)
---> 62     tok = LanguageTokenizer(language_code)
     63     output = tok.tokenizer(input)
     64     return output

File ~/miniconda3/lib/python3.9/site-packages/inltk/tokenizer.py:14, in LanguageTokenizer.__init__(self, lang)
     12 def __init__(self, lang: str):
     13     self.lang = lang
---> 14     self.base = EnglishTokenizer(lang) if lang == LanguageCodes.english else IndicTokenizer(lang)

File ~/miniconda3/lib/python3.9/site-packages/inltk/tokenizer.py:63, in IndicTokenizer.__init__(self, lang)
     61 self.sp = spm.SentencePieceProcessor()
     62 model_path = path/f'models/{lang}/tokenizer.model'
---> 63 self.sp.Load(str(model_path))

File ~/miniconda3/lib/python3.9/site-packages/sentencepiece/__init__.py:961, in SentencePieceProcessor.Load(self, model_file, model_proto)
    959 if model_proto:
    960     return self.LoadFromSerializedProto(model_proto)
--> 961 return self.LoadFromFile(model_file)

File ~/miniconda3/lib/python3.9/site-packages/sentencepiece/__init__.py:316, in SentencePieceProcessor.LoadFromFile(self, arg)
    315 def LoadFromFile(self, arg):
--> 316     return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)

RuntimeError: Internal: could not parse ModelProto from /home/nlp/miniconda3/lib/python3.9/site-packages/inltk/models/hi/tokenizer.model
```


Can anyone kindly suggest a way to resolve this issue?
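For context: "could not parse ModelProto" generally means the file on disk is not a valid SentencePiece model — typically a truncated download, or an HTML error page saved under the model's name (plausible here, since the earlier event-loop error may have interrupted the download despite the `Done!` message). Below is a hedged sketch for inspecting the file and deleting it so that `setup('hi')` fetches a fresh copy. The install path and the 1 KB size threshold are assumptions — adjust them to your environment:

```python
from pathlib import Path

def looks_corrupted(model_path: Path) -> bool:
    """Heuristic: a SentencePiece model is a binary protobuf, so a
    missing/tiny file or one starting with HTML markup indicates a
    failed download rather than a real model."""
    if not model_path.exists() or model_path.stat().st_size < 1024:
        return True
    head = model_path.read_bytes()[:64].lstrip()
    return head.startswith(b"<")  # an HTML error page saved as the model

# Hypothetical install path -- substitute your own site-packages location.
model = Path("~/miniconda3/lib/python3.9/site-packages/"
             "inltk/models/hi/tokenizer.model").expanduser()
if model.exists() and looks_corrupted(model):
    model.unlink()  # force setup('hi') to download a fresh copy
    print("Deleted corrupted tokenizer.model; re-run setup('hi').")
```

If the check flags the file, delete it (or the whole `models/hi` directory) and re-run `setup('hi')` from a plain Python script rather than a notebook, so the download is not interrupted by the event-loop error.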

Cecilia-zwq commented 7 months ago

Have you solved it? I'm having the same issue. In this post, https://github.com/chatchat-space/Langchain-Chatchat/issues/3103, I saw someone hit the same error; they downloaded the model again and it worked.

bezaisingh commented 7 months ago

No, I couldn't resolve the issue. I tried to contact the developer, Gaurav Arora, but unfortunately got no reply. Thanks for sharing the post — I'll try it and see if it works.