statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

cnlp_annotate crashes when the string is empty #61

Closed: bnicenboim closed this issue 4 years ago

bnicenboim commented 4 years ago

> cnlp_annotate("")
Error in py_call_impl(callable, dots$args, dots$keywords) : 
  AssertionError: 

Detailed traceback: 
  File "/home/bruno/anaconda3/lib/python3.7/site-packages/cleannlp/corenlp.py", line 43, in parseDocument
    doc = self.nlp(text)
  File "/home/bruno/anaconda3/lib/python3.7/site-packages/stanfordnlp/pipeline/core.py", line 176, in __call__
    self.process(doc)
  File "/home/bruno/anaconda3/lib/python3.7/site-packages/stanfordnlp/pipeline/core.py", line 170, in process
    self.processors[processor_name].process(doc)
  File "/home/bruno/anaconda3/lib/python3.7/site-packages/stanfordnlp/pipeline/tokenize_processor.py", line 75, in process
    doc.conll_file = conll.CoNLLFile(input_str=conll_output_string.getvalue())
  File "/home/bruno/anaconda3/lib/python3.7/site-packages/stanfordnlp/models/common/conll.py", line 20, in __init__
    assert input_str is not None and len(input_str) > 0

statsmaths commented 4 years ago

Thank you for pointing out this error. Unfortunately, I can't merge the pull request directly because the current version on GitHub (3.0.1) is already ahead of CRAN (3.0.0), and some of the lines you removed are now needed when using the updated Python library (see issues #57 and #58).

I have pushed edits that fix the issue with empty strings: 8be4c0064f0563ae5ef074bf7cdba2cd67cf1e5d. Please let me know if you run into any other issues!
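
For anyone hitting the same assertion before updating, here is a minimal sketch of a caller-side guard in Python. It is not the code from the commit above: `parse_or_skip` and `fake_pipeline` are hypothetical names used only for illustration, and treating whitespace-only input the same as an empty string is an assumption.

```python
def parse_or_skip(nlp, text):
    """Call nlp(text), or return None when text is None, empty, or whitespace-only."""
    if text is None or not text.strip():
        return None
    return nlp(text)


def fake_pipeline(text):
    # Hypothetical stand-in for the real pipeline: it fails on empty input,
    # loosely mimicking the assertion shown in the traceback.
    assert text is not None and len(text) > 0
    return text.split()


if __name__ == "__main__":
    for sample in ["", "   ", "Hello world"]:
        print(repr(sample), "->", parse_or_skip(fake_pipeline, sample))
```

The guard simply never hands an empty document to the tokenizer, so the caller decides what an empty result should look like.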

bnicenboim commented 4 years ago

I don't really get the comment about the difference in versions. But with the lines I deleted, the latest GitHub version wasn't working: it complained about an extra argument when I was initializing corenlp. By the way, I installed cleannlp in Python only today. In any case, thanks for taking care of this so quickly. I'll check the GitHub version again soon and report back.

statsmaths commented 4 years ago

Sorry, I should have been clearer. There are two repositories: one for the R package and one for the Python package. The issue you were running into comes from my having updated both the R and the Python code; you would need to update both as well in order to run the code correctly at the moment.
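
A rough sketch for checking which version of the Python side is installed. It assumes the Python package is installed under the distribution name `cleannlp` and that Python 3.8 or later is available (for `importlib.metadata`); `installed_version` is a hypothetical helper, not part of either package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints None if the distribution is not installed in this environment.
    print("cleannlp (Python):", installed_version("cleannlp"))
```

On the R side, `packageVersion("cleanNLP")` reports the version of the R package for comparison.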