Does pynlp keep the original tag type "O" which is the non-entity part?

sina-al / pynlp

A pythonic wrapper for Stanford CoreNLP.

MIT License


Does pynlp keep the original tag type "O" which is the non-entity part? #13

Open hexingren opened 6 years ago

hexingren commented 6 years ago

Hello,

Does pynlp keep the original tag type "O" which is the non-entity part?

For example, sentence = "Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife"

Expected result: [('Nora Jani', 'PERSON'), ('a single person', 'O'), ('Matt Jani', 'PERSON'), ('and', 'O'), ('Susan Jani', 'PERSON'), ('husband and wife', 'O')]

Thanks.

sina-al commented 6 years ago

Yes, try this:

from pynlp import StanfordCoreNLP

nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner')

document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for sentence in document:
    for token in sentence:
        print(token, token.ner)

This gives you token-level named entity recognition, including the "O" tag for non-entity tokens.

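If you want entities that span multiple tokens, use the entitymentions annotator:

nlp = StanfordCoreNLP(annotators='entitymentions')

# Re-annotate with the new pipeline so the document carries entity mentions.
document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for entity in document.entities:
    print(entity)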
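
To get a list shaped like the one in the question, with the "O" stretches kept as spans, one option is to group consecutive tokens that share a tag. The sketch below is not part of pynlp: `group_by_ner` is a hypothetical helper, and the sample tags are written by hand for illustration rather than taken from a real CoreNLP run. It assumes you first collect `(word, tag)` pairs, for example with `[(str(token), token.ner) for sentence in document for token in sentence]`.

```python
from itertools import groupby


def group_by_ner(tagged_tokens):
    """Merge consecutive (word, tag) pairs that share a tag into (phrase, tag) spans."""
    spans = []
    # groupby only merges adjacent items, so the original token order is preserved.
    for tag, group in groupby(tagged_tokens, key=lambda pair: pair[1]):
        phrase = " ".join(word for word, _ in group)
        spans.append((phrase, tag))
    return spans


# Hand-written sample tags (an assumption, not real CoreNLP output).
tagged = [
    ("Nora", "PERSON"), ("Jani", "PERSON"),
    (",", "O"), ("a", "O"), ("single", "O"), ("person", "O"), (",", "O"),
    ("Matt", "PERSON"), ("Jani", "PERSON"),
    ("and", "O"),
    ("Susan", "PERSON"), ("Jani", "PERSON"),
]

for span in group_by_ner(tagged):
    print(span)
```

Punctuation tokens tagged "O" end up inside the "O" spans, so filter them out beforehand if you want output without commas. Two distinct entities of the same type that sit directly next to each other would also be merged into one span; entity mentions avoid that, but they do not include the "O" parts.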

sina-al commented 6 years ago

I will try to write up some docs soon.

hexingren commented 6 years ago

For the first block of code, adding 'tokenize, ssplit, pos' to the annotators brings back the error from #12. The code that works for now is:

from pynlp import StanfordCoreNLP

nlp = StanfordCoreNLP(annotators='ner', options={"ner.useSUTime": False})
# The line below throws CoreNLPServerError: Status code: [500]
# nlp = StanfordCoreNLP(annotators='tokenize, ssplit, pos, ner', options={"ner.useSUTime": False})

document = nlp("Nora Jani, a single person, Matt Jani and Susan Jani, husband and wife")

for sentence in document:
    for token in sentence:
        print(token, token.ner)

This looks like a problem on the CoreNLP server side. Thanks!