I've found what looks like inconsistent part-of-speech tagging: words that should be tagged as nouns or proper nouns are tagged that way in some utterances, but as verbs, adjectives, or adverbs in others. The utterances I'm using are fairly straightforward English that I wouldn't expect the API to have much trouble with, and the mis-tagging is causing problems when we extract entities from them. Take the utterances below as examples.
For my purposes, each utterance starts with an uppercase letter and is otherwise forced to lowercase, so the API can't rely on capitalisation to decide whether a word is a proper noun. We take the API's output and run a further bit of code on our end to decide whether each word is an entity. (That code relies on the root tag and certain patterns of tags, so we could modify it on our end as a temporary workaround.)
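To make the set-up concrete, here is a minimal Python sketch of the two mechanical steps described above: forcing an utterance into the casing we send, and splitting a value from the Tags column into its root and word-level labels. The names `normalise_utterance`, `TagPath`, and `parse_tag_path` are hypothetical, and the sketch assumes the hyphen is only ever a separator in the Tags string. It is not our actual entity check, only an illustration of the data it works on.

```python
from typing import NamedTuple, Tuple


class TagPath(NamedTuple):
    root: str                # first label, e.g. "VP"
    leaf: str                # last label, the word-level tag, e.g. "NN"
    labels: Tuple[str, ...]  # the full path from root to leaf


def normalise_utterance(utterance: str) -> str:
    """Upper-case the first character and lower-case everything else."""
    return utterance.strip().capitalize()


def parse_tag_path(tags: str) -> TagPath:
    """Split a hyphen-joined tag path such as "VP-VP-S-NP-NN" (assumed format)."""
    labels = tuple(tags.split("-"))
    return TagPath(root=labels[0], leaf=labels[-1], labels=labels)


print(normalise_utterance("Call Edward Smith and Vincenzo De Campo"))
path = parse_tag_path("VP-VP-S-NP-NN")
print(path.root, path.leaf)
```

With the path split this way, the word-level tag (the last label) can be compared directly against the Penn Treebank tag list.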
"Call edward smith and vincenzo de campo":
| Phrase | Root Tag | Tags | Is Entity |
|---|---|---|---|
| Call | VP | VP-VP-VB | False |
| edward | VP | VP-VP-S-NP-NN | True |
| smith | VP | VP-VP-S-ADJP-JJ | True |
| and | VP | VP-CC | False |
| vincenzo | VP | VP-VP-VB | False |
| de | VP | VP-VP-PP-IN | True |
| campo | VP | VP-VP-PP-NN | True |
Here, "edward" is tagged as a noun (NN). Our system therefore picks it up as an entity along with "smith", as both are found in the right context. Note, though, that "vincenzo" is tagged as a verb (VB) and is not picked up.
"Call edith walker and edward smith":
| Phrase | Root Tag | Tags | Is Entity |
|---|---|---|---|
| Call | VP | VP-VP-VB | False |
| edith | VP | VP-VP-S-ADJP-RB | True |
| walker | VP | VP-VP-S-ADJP-JJ | True |
| and | VP | VP-CC | False |
| edward | VP | VP-VP-VB | False |
| smith | VP | VP-VP-JJ | True |
In this instance, the API doesn't seem to recognise that "and" extends the initial verb to a second object, so it tags "edward" as a verb (VB) rather than a noun, and only "smith" is picked up from the second name. This is understandable if coordination like this is something the API isn't yet built to handle, but it would be a lot more helpful if it could make the connection that "call edith walker and edward smith" really reads as "call edith walker and call edward smith".
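As a possible workaround on our side, the expansion described above could be done before the utterance is sent, so that each name arrives in its own simple command. The following is a naive sketch, not something the API offers: `expand_coordination` and its `prefix_words` parameter are hypothetical, and it assumes that "and" only ever joins the objects of the command and that the caller knows how many leading words form the command.

```python
from typing import List


def expand_coordination(utterance: str, prefix_words: int = 1) -> List[str]:
    """Repeat the leading command words in front of each "and"-joined object."""
    words = utterance.split()
    prefix = words[:prefix_words]   # e.g. ["Call"]
    rest = words[prefix_words:]

    clauses: List[List[str]] = []
    current: List[str] = []
    for word in rest:
        if word.lower() == "and":
            if current:             # close the clause collected so far
                clauses.append(current)
            current = []
        else:
            current.append(word)
    if current:
        clauses.append(current)

    return [" ".join(prefix + clause) for clause in clauses]


print(expand_coordination("Call edith walker and edward smith"))
print(expand_coordination(
    "Book a meeting with edward walker and edith smith", prefix_words=4))
```

This would break on names or phrases that legitimately contain "and", so it is only suitable where the command grammar is tightly controlled.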
"Book a meeting with edward walker and edith smith":
| Phrase | Root Tag | Tags | Is Entity |
|---|---|---|---|
| Book | VP | VP-NN | False |
| a | VP | VP-NP-NP-NP-DT | False |
| meeting | VP | VP-NP-NP-NP-NN | False |
| with | VP | VP-NP-NP-PP-IN | False |
| edward | VP | VP-NP-NP-PP-NP-JJ | True |
| walker | VP | VP-NP-NP-PP-NP-NN | True |
| and | VP | VP-NP-CC | False |
| edith | VP | VP-NP-NP-JJ | False |
| smith | VP | VP-NP-NP-NN | False |
Firstly, the API doesn't recognise that "book" is a verb in this context, like "call" in the previous examples; it tags it as a noun (NN). More importantly for entity extraction, it seems to interpret "edward" as an adjective (JJ) modifying the following noun "walker". That reading would only be grammatical English if a determiner such as the indefinite article were present, as in "Book a meeting with [a] blue Walker", so this looks like an error.
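The determiner argument can be turned into a rough check on our side: an adjective (JJ) directly followed by a noun, with no determiner (DT) directly before the adjective, is a candidate for a mis-tagged name. The sketch below is only an illustration of that heuristic. `suspect_name_pairs` is a hypothetical helper, the input is assumed to be (word, word-level tag) pairs taken from the last label of each Tags value, and the rule ignores cases such as possessives or stacked adjectives.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, word-level tag), e.g. ("edward", "JJ")


def suspect_name_pairs(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return adjective+noun pairs that are not introduced by a determiner."""
    suspects = []
    for i in range(len(tokens) - 1):
        word, pos = tokens[i]
        next_word, next_pos = tokens[i + 1]
        prev_pos: Optional[str] = tokens[i - 1][1] if i > 0 else None
        # JJ followed by any noun tag (NN, NNS, NNP, ...) without a DT before it
        if pos == "JJ" and next_pos.startswith("NN") and prev_pos != "DT":
            suspects.append((word, next_word))
    return suspects


book_meeting = [
    ("Book", "NN"), ("a", "DT"), ("meeting", "NN"), ("with", "IN"),
    ("edward", "JJ"), ("walker", "NN"), ("and", "CC"),
    ("edith", "JJ"), ("smith", "NN"),
]
print(suspect_name_pairs(book_meeting))
```

On the tags reported for this utterance, the check flags both "edward walker" and "edith smith", while a genuine adjective phrase such as "a new employee" is left alone because of its determiner.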
"Register edwin smith as a new employee":
| Phrase | Root Tag | Tags | Is Entity |
|---|---|---|---|
| Register | ADJP | ADJP-ADJP-RB | False |
| edwin | ADJP | ADJP-ADJP-RB | False |
| smith | ADJP | ADJP-ADJP-JJ | True |
| as | ADJP | ADJP-PP-IN | False |
| a | ADJP | ADJP-PP-NP-DT | False |
| new | ADJP | ADJP-PP-NP-JJ | True |
| employee | ADJP | ADJP-PP-NP-NN | True |
Firstly, the API wrongly tags the verb "register" as an adverb (RB), which presumably affects how it interprets the following word. The name "edwin" is then also interpreted as an adverb, rather than a noun or proper noun. Only "smith" is picked up as an entity, and even it is tagged as an adjective (JJ) rather than a noun.
Full Penn Treebank tag list: http://web.mit.edu/6.863/www/PennTreebankTags.html