statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

Error with corenlp when text document contains underscores #63

Closed: maelick closed this issue 4 years ago

maelick commented 4 years ago

I upgraded to cleanNLP 3.0, and now my existing code is broken.

When running:

cleanNLP::cnlp_annotate("Test _somewhere_ test, test.", backend = "corenlp")

I get the following error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0
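The message itself comes from base R's data.frame(), which refuses to combine a length-1 column with a zero-length one. The "(function (..." prefix is what R prints when data.frame is invoked through do.call() with the function object rather than by name. The sketch below reproduces the same error in plain R, assuming (not confirmed by the thread) that a missing value from the backend reaches R as a zero-length vector. The helper pad_missing() is hypothetical and only illustrates a defensive guard on the R side; it is not the fix that was actually pushed, which was made in the Python backend.

```r
# Plain base R reproduction; this is NOT cleanNLP's internal code.
# Assumption: a missing (None) backend value arrives in R as a zero-length vector.
token <- "Test"
lemma <- character(0)  # stands in for the missing value

# Calling data.frame through do.call() mirrors the "(function (..." error header.
result <- tryCatch(
  do.call(data.frame, list(token = token, lemma = lemma)),
  error = function(e) conditionMessage(e)
)
print(result)

# Hypothetical guard: replace zero-length fields with NA so every column has one row.
pad_missing <- function(x) if (length(x) == 0L) NA_character_ else x
fixed <- do.call(data.frame, lapply(list(token = token, lemma = lemma), pad_missing))
print(fixed)
```

Padding with NA keeps the row count consistent, but it silently hides the missing value, which is one reason fixing the backend so it never returns None is the cleaner solution.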

statsmaths commented 4 years ago

Thanks for the information. For some reason, the new Python backend returns None (a NoneType) when given certain characters, such as an underscore. I just pushed an update that fixes the issue, but note that it requires updating both the R package AND the Python package to their current GitHub versions. You can do that by running this in R:

devtools::install_github("statsmaths/cleanNLP")

And these commands in a terminal:

git clone https://github.com/statsmaths/cleanNLP-python
cd cleanNLP-python
python -m pip install .

If the two versions are out of sync, you will probably get an error about the number of arguments. Please let me know if this does not solve the problem or if you find any other issues with the new version of the package.

maelick commented 4 years ago

Thanks, it now works :-)