trungtv / vivi_spacy

A Vietnamese model for spaCy.io

Error when running in python 3.6 #4

Closed nguyen-tam closed 5 years ago

nguyen-tam commented 5 years ago

Hi,

I got this error:

  File "tokenizer.pyx", line 390, in spacy.tokenizer.Tokenizer.from_disk
  File "tokenizer.pyx", line 436, in spacy.tokenizer.Tokenizer.from_bytes
  File "/usr/lib/python3.6/re.py", line 233, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.6/re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.6/sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.6/sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "/usr/lib/python3.6/sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "/usr/lib/python3.6/sre_parse.py", line 527, in _parse
    code1 = _class_escape(source, this)
  File "/usr/lib/python3.6/sre_parse.py", line 336, in _class_escape
    raise source.error('bad escape %s' % escape, len(escape))
sre_constants.error: bad escape \p at position 275

I'm using Python 3.6. Please help.
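
Note on the error: the traceback ends in `_class_escape`, which means a tokenizer pattern stored with the model contains `\p` inside a character class. Python's built-in `re` module does not support `\p{...}` Unicode property escapes (the third-party `regex` package does), and from Python 3.6 on an unknown escape made of a backslash and an ASCII letter is an error. The snippet below is only a minimal sketch of that behaviour. The patterns are illustrative assumptions, not the actual pattern from the vivi_spacy model, and `compiles` is a hypothetical helper name.

```python
import re


def compiles(pattern):
    """Return True if the built-in re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# \p{L} ("any Unicode letter") is not understood by the stdlib re module.
print(compiles(r"[\p{L}]+"))   # False on Python 3.6+

# A rough stdlib stand-in for "any letter": word characters that are
# neither digits nor the underscore.
print(compiles(r"[^\W\d_]+"))  # True
```

A model whose serialized tokenizer rules rely on `\p{...}` will therefore fail to load wherever those rules are compiled with `re`, which matches the `from_bytes` frame in the traceback.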

daniellam258 commented 5 years ago

Same error here. I'm running on Colab.

nguyen-tam commented 5 years ago

@lhdung258 I published a new Python 3.6-compatible model and a tutorial here: https://github.com/nguyen-tam/vietnamese_spacy_model. Please check it out.

daniellam258 commented 5 years ago

@nguyen-tam Thanks so much. Have you tested your model yet? How does it compare to the model from vivi_spacy? I have a tokenization problem where emoticon characters get separated by spaces, e.g. ":)" becomes ": )". I'm working on sentiment analysis, so this is really important.

nguyen-tam commented 5 years ago

I can't run vivi_spacy, so I can't compare. Do you have time to work on it with me? If so, please create an issue in my repo.

daniellam258 commented 5 years ago

Sorry for the late reply! I've been busy these days. I tried several other NLP toolkits, e.g. underthesea and VnCoreNLP, and they all suffer from the tokenization problem above. So I found another way to handle this, which means I don't use vivi_spacy anymore.

rain1024 commented 5 years ago

@lhdung258: I am the lead developer of underthesea. We will fix this issue in a future release, so please check our next versions.

daniellam258 commented 5 years ago

@rain1024 Wow! That's great, I'm looking forward to it.

vncorenlp commented 5 years ago

@lhdung258 It should be straightforward to handle those cases with VnCoreNLP / RDRsegmenter by adding the emoticon tokens to the exception list in Tokenizer.java, e.g. VN_exception.add(":)"); VN_exception.add(":))"); VN_exception.add(":((");
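
The same exception-list idea can be sketched in plain Python for anyone who wants to pre-tokenize before handing text to another toolkit. This is only an illustration under stated assumptions: `EMOTICONS` and `tokenize` are hypothetical names, the list holds just the three emoticons mentioned above, and the splitting rule is a simple regex, not the segmentation logic of VnCoreNLP, underthesea, or spaCy.

```python
import re

# Hypothetical exception list: tokens that must never be split.
EMOTICONS = [":)", ":))", ":(("]

# Try longer emoticons first so ":))" is not cut short as ":)" plus ")".
_protected = "|".join(
    re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True)
)

# Order matters: protected tokens, then runs of word characters,
# then any single remaining non-space symbol.
_TOKEN_RE = re.compile(rf"{_protected}|\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens while keeping listed emoticons intact."""
    return _TOKEN_RE.findall(text)
```

For example, `tokenize("hay :)) buon :((")` returns `["hay", ":))", "buon", ":(("]`, whereas an emoticon that is not in the list, such as ":(", would still be split into ":" and "(".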

trungtv commented 5 years ago

Dear all, vivi_spacy is outdated. Let's move to https://github.com/trungtv/vi_spacy