Open hexingren opened 6 years ago
Yes, try this:
from pynlp import StanfordCoreNLP
nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner')
document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")
for sentence in document:
    for token in sentence:
        print(token, token.ner)
This will give you token-level named entity recognition.
If you want entities that span multiple tokens, use entitymentions:
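nlp = StanfordCoreNLP(annotators='entitymentions')
# Re-annotate the text so the document is built with the entitymentions annotator
document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")
for entity in document.entities:
    print(entity)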
I will try to write up some docs soon.
For the first block of code, I run into #12 again if I add 'tokenize, ssplit, pos'. The code that works for now is:
from pynlp import StanfordCoreNLP
nlp = StanfordCoreNLP(annotators='ner', options={"ner.useSUTime": False})
# The code below throws CoreNLPServerError: Status code: [500]
# nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner', options={"ner.useSUTime": False})
document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")
for sentence in document:
    for token in sentence:
        print(token, token.ner)
This is probably a problem on the CoreNLP server side. Thanks!
Hello,
Does pynlp keep the original "O" tag, which marks the non-entity parts of a sentence?
For example, sentence = "Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife"
Expected result: [('Nora Jani', 'PERSON'), ('a single person', 'O'), ('Matt Jani', 'PERSON'), ('and', 'O'), ('Susan Jani', 'PERSON'), ('husband and wife', 'O')]
Thanks.
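One possible way to get output in that shape is to post-process the token-level tags yourself. The sketch below is not part of pynlp: group_spans is a hypothetical helper that takes plain (word, tag) pairs, merges neighbouring tokens that share a tag, and treats punctuation-only tokens as span boundaries that are dropped. The tags in the sample input are assumed from the expected result above, not taken from real CoreNLP output.

```python
import string


def group_spans(tagged_tokens):
    """Merge consecutive (word, tag) pairs that share a tag into (text, tag) spans.

    Punctuation-only tokens are dropped and always end the current span.
    This is a hypothetical helper, not a pynlp function.
    """
    spans = []
    current_words, current_tag = [], None
    for word, tag in tagged_tokens:
        is_punct = all(ch in string.punctuation for ch in word)
        # Close the running span at punctuation or when the tag changes.
        if is_punct or tag != current_tag:
            if current_words:
                spans.append((" ".join(current_words), current_tag))
            current_words, current_tag = [], None
        if not is_punct:
            current_words.append(word)
            current_tag = tag
    # Flush whatever is left after the last token.
    if current_words:
        spans.append((" ".join(current_words), current_tag))
    return spans


# Hand-written sample input; the tags are assumed for illustration only.
sample = [
    ("Nora", "PERSON"), ("Jani", "PERSON"), (",", "O"),
    ("a", "O"), ("single", "O"), ("person", "O"), (",", "O"),
    ("Matt", "PERSON"), ("Jani", "PERSON"), ("and", "O"),
    ("Susan", "PERSON"), ("Jani", "PERSON"), (",", "O"),
    ("husband", "O"), ("and", "O"), ("wife", "O"),
]
result = group_spans(sample)
print(result)
```

With the loop from the earlier comments, the input pairs could presumably be built as [(str(token), token.ner) for sentence in document for token in sentence]; that relies on str(token) giving the word text, which I have not verified. Note that this simple grouping would also merge two different people who appear back to back with no separator between them.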