Open stickjitb opened 6 years ago
Hello @stickjitb! Unfortunately, NLPIR (the library we use to segment text) uses / as the separator between each token it segments and that token's part-of-speech tag. Here is a typical example of what NLPIR returns for 'hi there':
hi/o there/rzs
You'll notice that it uses spaces between tokens, and it uses a / to separate each token from its part of speech.
In your example, this is returned:
[ / ]/xm
That breaks the format that the NLPIR project has decided to use for their token separation.
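To make the breakage concrete, here is a minimal sketch of a naive parser for that format. The function name naive_parse is hypothetical and this is not PyNLPIR's actual code; it only assumes the "spaces between tokens, / before the part of speech" convention described above.

```python
def naive_parse(raw):
    """Split NLPIR-style output on spaces, then split each chunk on its last '/'.

    Hypothetical illustration only; not taken from PyNLPIR.
    """
    chunks = raw.split(" ")
    return [chunk.rsplit("/", 1) for chunk in chunks]


# Well-formed output parses into [token, tag] pairs.
print(naive_parse("hi/o there/rzs"))

# The problem case falls apart: the single token '[ / ]' is cut into three
# chunks, one of which carries an empty part-of-speech code.
print(naive_parse("[ / ]/xm"))
```

The second call yields a chunk with no tag at all and a chunk whose tag is an empty string, so a parser built this way has no reliable way to recover the original token.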
There really isn't anything we can do on the PyNLPIR side about this. You might try talking to the NLPIR team on their GitHub project or website: https://github.com/NLPIR-team/NLPIR http://ictclas.nlpir.org/
We could get fancy in how we process the tokens from NLPIR by using look-ahead assertions in a regular expression (like only splitting on / if it has [a-z] immediately following it), but this doesn't seem like a common enough problem or a typical enough text sample to make that worthwhile.
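For reference, the look-ahead idea could be sketched roughly as follows. This is not something PyNLPIR does; split_tagged and the pattern are hypothetical, and the sketch assumes that part-of-speech codes consist only of lowercase letters and digits and that each tagged token is followed by spaces or the end of the string.

```python
import re

# Hypothetical pattern: a token (as short as possible), then a '/' that is
# only accepted when a lowercase letter follows it (the look-ahead), then the
# part-of-speech code, then spaces or the end of the string.
_TAGGED = re.compile(r"(.+?)/(?=[a-z])([a-z0-9]+)(?: +|$)")


def split_tagged(raw):
    """Return (token, pos_code) pairs from NLPIR-style output."""
    return [(m.group(1), m.group(2)) for m in _TAGGED.finditer(raw)]


print(split_tagged("hi/o there/rzs"))
print(split_tagged("[ / ]/xm"))
```

Even with the look-ahead the format stays ambiguous: if the original text itself contains a / followed by lowercase letters and then a space, this sketch would treat that as a token boundary.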
@tsroten OK, I get it. If that is the case, I don't think the NLPIR team will have a solution either, because some pattern always has to serve as the separator. Anyway, I've reported this issue to them.
As you said, the pattern '[ / ]' is not typical enough, so I will probably just take some ad hoc measures when I need to deal with text containing it, unless a better solution is proposed.
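One possible ad hoc measure is a small wrapper that catches the error and falls back to returning the input as a single untagged token. This is only a sketch: safe_segment is a hypothetical helper, and the segmenting function is passed in as a parameter so the wrapper makes no assumptions about PyNLPIR beyond the UnboundLocalError reported in this issue.

```python
def safe_segment(text, segment):
    """Call segment(text); on UnboundLocalError, return text as one untagged token.

    Hypothetical workaround: `segment` is any callable that returns a list of
    (token, part_of_speech) pairs.
    """
    try:
        return segment(text)
    except UnboundLocalError:
        # Fallback: keep the text, but admit that no tag is available.
        return [(text, None)]
```

In practice the second argument would be pynlpir.segment (after calling pynlpir.open()). The fallback loses the segmentation for the whole input, so it is best applied to short strings.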
When I try to segment a string containing the pattern '[ / ]', an UnboundLocalError is raised.
No errors occur when segmenting '[/]' or '/' or '[ * ]', where * stands for a symbol other than /.
Here is the output:
  File "NER_train.py", line 8, in <module>
    s = pynlpir.segment('[ / ]')
  File "D:\Python\lib\site-packages\pynlpir\__init__.py", line 248, in segment
    pos_name = _get_pos_name(token[1], pos_names, pos_english)
  File "D:\Python\lib\site-packages\pynlpir\__init__.py", line 193, in _get_pos_name
    pos_name = pos_map.get_pos_name(code, name, english)
  File "D:\Python\lib\site-packages\pynlpir\pos_map.py", line 190, in get_pos_name
    return _get_pos_name(code, name, english)
  File "D:\Python\lib\site-packages\pynlpir\pos_map.py", line 151, in _get_pos_name
    pos = (pos_entry[1 if english else 0], )
UnboundLocalError: local variable 'pos_entry' referenced before assignment
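For readers unfamiliar with this error, the sketch below shows one way Python can raise it: a local variable that is only assigned inside a loop is read after the loop ran zero times. This is a guess at the general mechanism, not PyNLPIR's actual lookup code; POS_MAP and lookup are hypothetical stand-ins.

```python
# Hypothetical one-entry table: code -> (Chinese name, English name).
POS_MAP = {"n": ("名词", "noun")}


def lookup(code, english=True):
    """Look up a part-of-speech name by trying ever longer prefixes of code."""
    for i in range(1, len(code) + 1):
        try:
            pos_entry = POS_MAP[code[:i]]
        except KeyError:
            break
    # If the loop body never assigned pos_entry (for example because code is
    # an empty string), this line raises UnboundLocalError.
    return pos_entry[1 if english else 0]


print(lookup("n"))

try:
    lookup("")
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```

An empty part-of-speech code would be enough to trigger this, which would be consistent with '[ / ]' being split into pieces that have no valid tag.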