statsmaths / cleanNLP

R package providing annotators and a normalized data model for natural language processing
GNU Lesser General Public License v2.1

cnlp_get_coreference() returns no rows. #32

aveekchoudhury01 closed this issue 5 years ago

aveekchoudhury01 commented 6 years ago

Hi Taylor, I am using the cleanNLP package for coreference resolution. Unfortunately, cnlp_get_coreference() returns 0 rows of data. These are the examples I ran:

Machine config: Ubuntu 16.04, 32 GB RAM

Steps followed:

Can you help me understand why I am getting these results?

Thanks, Aveek

statsmaths commented 6 years ago

The coreference function only works with the coreNLP backend, unfortunately. I think the spaCy people have a coreference resolution system in the works, and I'll port it over as soon as it seems usable. I'm not sure why coreNLP is not working, however. Could you provide a working example? Note that you need to manually set the anno_level parameter in cnlp_init_corenlp to 3 or greater for the coreference resolution annotator to be turned on.

I too have found that the coreNLP coreference resolution is incredibly slow. You may need to increase the amount of memory assigned to rJava if it constantly hangs.
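For illustration, the two settings mentioned above (anno_level of 3 or greater, and more Java memory) might be combined as in the sketch below. It assumes the cleanNLP 2.x function signatures and an installed coreNLP backend with its English models. The `mem` argument, the "8g" value, and the sample text are assumptions made for the example, not details taken from this thread.

```r
library(cleanNLP)

# Assumption: cleanNLP 2.x with the coreNLP backend and English models installed.
# anno_level = 3 is the lowest level that turns on the coreference annotator.
# `mem` is assumed to set the Java heap size; "8g" is only an illustrative value.
cnlp_init_corenlp(language = "en", anno_level = 3, mem = "8g")

# Hypothetical input, passed as a raw string rather than as a file path.
txt <- "Taylor wrote the package. He maintains it on GitHub."
anno <- cnlp_annotate(txt, as_strings = TRUE)

# This table is expected to have rows only if the coreference annotator ran.
coref <- cnlp_get_coreference(anno)
nrow(coref)
```

If the Java process still hangs, the general rJava approach of setting `options(java.parameters = "-Xmx8g")` before any Java-backed package is loaded may also be worth trying.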

ChengYJon commented 5 years ago

There is now a working coreference resolution system in spaCy that is quite effective. Are there plans to port it over to cleanNLP? Unfortunately, the CoreNLP coref is too slow unless you're working with a large computer cluster of some sort.