soheil-zz / jatetoolkit

Java Automatic Term Extraction toolkit

Verb Phrase Counter #6

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run the TestFrequency counter for verb phrases.
2. I created a class analogous to NounPhraseExtractorOpenNLP and changed the
tag strings to B-VP and I-VP.
3. The algorithm still counts/extracts noun phrases.

What is the expected output? What do you see instead?
Expected: verb phrases with their counts. Instead: noun phrases are still
extracted and counted.

What version of the product are you using? On what operating system?
1.11

Please provide any additional information below.

What else must be changed so that only verb phrases are counted?
The OpenNLP parser supports this type of annotation.
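For reference, here is a minimal sketch of collecting chunks of one phrase type from BIO chunk tags. It is not JATE's actual `chunkNP` code; the class `ChunkCollector` and its `collect` method are hypothetical, and the tags are assumed to come from the OpenNLP chunker with one tag per token. Asking for "VP" returns verb chunks only, but on the example sentence "They have replaceable teeth ." each VP contains just the verb.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkCollector {

    /**
     * Collects the chunks of one phrase type (e.g. "NP" or "VP") from BIO chunk tags.
     * A chunk starts at "B-" + type and continues over consecutive "I-" + type tags.
     */
    public static List<String> collect(String[] tokens, String[] tags, String type) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        String begin = "B-" + type;
        String inside = "I-" + type;
        List<String> chunks = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].equals(begin)) {
                // A new chunk of the requested type closes any open one.
                if (current != null) {
                    chunks.add(current.toString());
                }
                current = new StringBuilder(tokens[i]);
            } else if (tags[i].equals(inside) && current != null) {
                current.append(' ').append(tokens[i]);
            } else if (current != null) {
                // Any other tag ends the open chunk of this type.
                chunks.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        String[] tokens = {"They", "have", "replaceable", "teeth", "."};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "O"};
        System.out.println(collect(tokens, tags, "NP")); // [They, replaceable teeth]
        System.out.println(collect(tokens, tags, "VP")); // [have]
    }
}
```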

Original issue reported on code.google.com by mihail.m...@gmail.com on 19 Sep 2013 at 4:55

GoogleCodeExporter commented 9 years ago
Hi,
JATE uses OpenNLP 1.51, and verb phrases are a little tricky to handle. You are
right to look at "B-NP" and "I-NP" in the "chunkNP" method of the
"NounPhraseExtractorOpenNLP" class, but I think you need to write a separate
method that implements a slightly different process.

Example:
Tokens = They have replaceable teeth .
Chunker output = B-NP,B-VP,B-NP,I-NP,O

Tokens = Humans kill around 26 to 73 million sharks every year ...
Chunker output = B-NP,B-VP,B-ADVP,B-NP,B-PP,B-NP,I-NP,I-NP,B-NP

As you can see, B-VP marks the beginning of a VP, but no "I-VP" tags follow to
mark the inside of the VP; the following tokens are tagged as noun phrases or as
adverb/prepositional phrases instead. So your code needs to handle these cases.
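One way to handle these cases, sketched under assumptions: the hypothetical `VerbPhraseExtractor` below starts a phrase at B-VP, keeps any I-VP tokens, then greedily attaches the chunks that follow if their type is NP, ADVP, ADJP, PP or PRT, stopping at "O" or the next B-VP. The set of attachable chunk types is a choice made for this sketch, not something JATE or OpenNLP defines. A `TreeMap` counter then gives the "verb phrase + counter" output asked for in the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class VerbPhraseExtractor {

    // Assumption of this sketch: chunk types that may follow the verb inside its phrase.
    private static final Set<String> ATTACHABLE =
            new HashSet<>(Arrays.asList("NP", "ADVP", "ADJP", "PP", "PRT"));

    /** Returns the chunk type of a BIO tag ("B-NP" -> "NP"), or null for "O". */
    private static String typeOf(String tag) {
        return (tag.length() > 2 && tag.charAt(1) == '-') ? tag.substring(2) : null;
    }

    /** Extracts verb phrases: B-VP, its I-VP tokens, then any following attachable chunks. */
    public static List<String> extract(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> phrases = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            if (!tags[i].equals("B-VP")) {
                i++;
                continue;
            }
            StringBuilder vp = new StringBuilder(tokens[i]);
            int j = i + 1;
            // Stop at "O", at the next B-VP, or at any chunk type not in ATTACHABLE.
            while (j < tokens.length
                    && (tags[j].equals("I-VP") || ATTACHABLE.contains(typeOf(tags[j])))) {
                vp.append(' ').append(tokens[j]);
                j++;
            }
            phrases.add(vp.toString());
            i = j;
        }
        return phrases;
    }

    /** Counts verb phrases over several sentences; tokenSeqs.get(k) pairs with tagSeqs.get(k). */
    public static Map<String, Integer> count(List<String[]> tokenSeqs, List<String[]> tagSeqs) {
        if (tokenSeqs.size() != tagSeqs.size()) {
            throw new IllegalArgumentException("need one tag array per token array");
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (int k = 0; k < tokenSeqs.size(); k++) {
            for (String vp : extract(tokenSeqs.get(k), tagSeqs.get(k))) {
                counts.merge(vp, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = {"They", "have", "replaceable", "teeth", "."};
        String[] tags = {"B-NP", "B-VP", "B-NP", "I-NP", "O"};
        System.out.println(extract(tokens, tags)); // [have replaceable teeth]
        // The same sentence twice, only to exercise the counter.
        System.out.println(count(Arrays.asList(tokens, tokens), Arrays.asList(tags, tags)));
        // {have replaceable teeth=2}
    }
}
```

Greedy attachment is simple but can over-generate, for example by absorbing adverbial noun phrases such as time expressions; a stricter variant might attach only the first NP or PP after the verb.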

This will be added in the next version of this tool.

Original comment by ziqizhan...@googlemail.com on 25 Sep 2013 at 1:39