Open mlpacheco opened 4 years ago
I think I found the problem: NLTK trees are indexed by equality, so parent detection in the chunk tree fails when the same tagged token occurs more than once under the same noun phrase. For now I have added a temporary workaround to allow further processing, but this is a more serious issue that might be addressed later.
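To illustrate the kind of failure described here, below is a minimal sketch that uses a plain Python list as a stand-in for a noun-phrase subtree (NLTK's `Tree` subclasses `list`, so equality-based lookups such as `index` behave the same way). The helper `index_by_identity` is hypothetical and is not part of NLTK or of this repository; it only shows one possible way to tell duplicate tagged tokens apart.

```python
# Stand-in for one noun-phrase chunk. The tuples are built at runtime so that
# the two ('Confirm', 'NNP') entries are equal but are distinct objects.
words = "Confirm L and Confirm R options".split()
tags = ["NNP", "NNP", "CC", "NNP", "NN", "NNS"]
np_chunk = [(word, tag) for word, tag in zip(words, tags)]

second_confirm = np_chunk[3]

# Equality-based lookup: returns the position of the FIRST equal element,
# so the second 'Confirm' is resolved to the first one.
equality_pos = np_chunk.index(second_confirm)


def index_by_identity(seq, item):
    """Hypothetical helper: find an element by object identity, not equality."""
    for i, element in enumerate(seq):
        if element is item:
            return i
    raise ValueError("item is not in the sequence")


# Identity-based lookup: returns the position of this exact object.
identity_pos = index_by_identity(np_chunk, second_confirm)

print(equality_pos, identity_pos)
```

Whether an identity-based lookup is the right fix for the chunker itself depends on how the parent search is implemented there; the sketch only demonstrates why duplicate tokens confuse an equality-based one.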
I've come across a couple of issues with the chunker: 1 - It can't handle underscore characters, which I've solved by replacing them.
However, I am getting this strange issue while trying to process this sentence:
Confirm L and Confirm R options complete feature negotiation and are sent in response to Change R and Change L options , respectively .
I can tag it and get:
[('Confirm', 'NNP'), ('L', 'NNP'), ('and', 'CC'), ('Confirm', 'NNP'), ('R', 'NN'), ('options', 'NNS'), ('complete', 'JJ'), ('feature', 'NN'), ('negotiation', 'NN'), ('and', 'CC'), ('are', 'VBP'), ('sent', 'VBN'), ('in', 'IN'), ('response', 'NN'), ('to', 'TO'), ('Change', 'NNP'), ('R', 'NN'), ('and', 'CC'), ('Change', 'NNP'), ('L', 'NNP'), ('options', 'NNS'), (',', ','), ('respectively', 'RB'), ('.', '.')]
But once I attempt to run the chunker I run into an issue:
I can't figure out why such a simple sentence would fail. There seems to be no parent at this line: https://github.com/paudan/opennlp_python/blob/master/nltk_opennlp/chunkers.py#L149