stanfordnlp / CoreNLP

CoreNLP: A Java suite of core NLP tools for tokenization, sentence segmentation, NER, parsing, coreference, sentiment analysis, etc.
http://stanfordnlp.github.io/CoreNLP/
GNU General Public License v3.0

constituency parse failure on verb references #581

Closed: natewatson999 closed this issue 6 years ago

natewatson999 commented 6 years ago

The constituency parser does not produce a sensible parse for "What is being very fat called?".

The tree it yields is (What (is (being (very fat called)))), when it should be (What (is ((being (very fat)) called))). "Called" is not part of the phrase "being very fat"; it should be an SBAR of the root verb.

This is the parse as of the latest version: [screenshot: parsefailure]

This is what it should be: [screenshot: thesbarbranchshouldbemoved]

J38 commented 6 years ago

Hi, thank you for identifying a flaw in the algorithm. Unfortunately, all of these systems are statistical parsers, so even state-of-the-art parsers will make errors because of limited training data and the limits of the algorithm. It is not immediately clear to me how to translate specific error cases into parser improvements, other than by adding training data that addresses the specific issue.

At this time we are not actively developing the constituency parser, though we may release a polished version of our latest dependency parser in Python at some point this year. Other groups have done some active work on neural constituency parsing, but to the best of my knowledge no one here is working on constituency parsing at the moment.