simongray / StatementAnnotator

Custom annotator for Stanford CoreNLP that annotates sentences with the underlying statements contained within them.

Overlapping components #54

Closed simongray closed 8 years ago

simongray commented 8 years ago

In some cases, components may overlap. This is a fairly big source of problems.

This happens because a word can be both, e.g., a dependent in a cop relation (direct object) and a dependent in an nmod relation (indirect object). For example,

I've been coming here since the very beginning and in my opinion it's better than ever.

produces overlapping components.

Solution?

My first instinct is always to fix it in the component-finding process, but in a lot of cases buggy parsing or other complexities of the relations make that impossible.

However, it is apparent that no components should ever overlap, so a simple way to hack around the problem would be to remove overlapping components. Deciding which of two overlapping components to remove can be accomplished by ranking components by type and removing the lower-ranked one, e.g. verb ranks above subject, which ranks above direct object, which ranks above indirect object.
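A minimal sketch of the rank-and-remove idea, under stated assumptions: `ComponentType`, `Component`, and `OverlapFilter` are hypothetical stand-ins rather than the project's actual classes, a component is reduced to a type plus the set of token positions it covers, and two components count as overlapping when they share at least one token position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical types for illustration only.
// Declaration order doubles as the ranking: VERB is ranked highest.
enum ComponentType { VERB, SUBJECT, DIRECT_OBJECT, INDIRECT_OBJECT }

final class Component {
    final ComponentType type;
    final Set<Integer> tokenIndexes; // positions of the component's words in the sentence

    Component(ComponentType type, Set<Integer> tokenIndexes) {
        this.type = type;
        this.tokenIndexes = tokenIndexes;
    }
}

final class OverlapFilter {
    /**
     * Visits components from highest to lowest rank and keeps a component
     * only if none of its tokens has been claimed by a component kept earlier.
     */
    static List<Component> removeOverlaps(List<Component> components) {
        List<Component> ranked = new ArrayList<>(components);
        // Enums compare by declaration order; List.sort is stable, so
        // components of the same type keep their original relative order.
        ranked.sort(Comparator.comparing((Component c) -> c.type));

        Set<Integer> claimed = new HashSet<>();
        List<Component> kept = new ArrayList<>();
        for (Component c : ranked) {
            if (Collections.disjoint(claimed, c.tokenIndexes)) {
                kept.add(c);
                claimed.addAll(c.tokenIndexes);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up token positions: the two objects share token 5.
        List<Component> input = Arrays.asList(
            new Component(ComponentType.INDIRECT_OBJECT, new HashSet<>(Arrays.asList(4, 5))),
            new Component(ComponentType.DIRECT_OBJECT, new HashSet<>(Arrays.asList(5, 6))),
            new Component(ComponentType.VERB, new HashSet<>(Arrays.asList(2))));
        for (Component c : removeOverlaps(input)) {
            System.out.println(c.type + " " + c.tokenIndexes);
        }
    }
}
```

In the made-up input, the direct and indirect object share a token, so only the indirect object is dropped. One consequence of this greedy approach is that a low-ranked component is removed entirely even if it overlaps by a single word; trimming the shared word instead would be a different design.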

simongray commented 8 years ago

Internal quality score

In addition to detecting overlap, it would also be great if a component had some internal way of detecting whether it makes sense. For example, this could be done by looking at the words it is composed of and the relations they come from. Certain relations could be red flags that signal a parsing error. In this way, statements could be evaluated based on parsing quality, and statements with higher quality should be ranked higher in a personal profile.

Rejecting based on score

Statements with a bad internal form should be rejected outright, or could be used during debugging to detect malformed patterns in the component-finding algorithm. In this way, developing this internal check would serve the goal of the project in two separate ways.
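The scoring and rejection ideas above could be sketched roughly as follows. Everything here is an assumption for illustration: `ComponentQuality` is a hypothetical class, a component is reduced to the list of dependency relation names its words came from, and the red-flag set contains only `dep` (the label for an unspecified dependency) as a placeholder. Which relations actually signal a parsing error, and what threshold is sensible, would have to be determined by experiment.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
final class ComponentQuality {
    // Assumption: relation names treated as signs of a parsing error.
    static final Set<String> RED_FLAGS = new HashSet<>(Arrays.asList("dep"));

    /**
     * Returns the fraction of relations that are NOT red flags,
     * so 1.0 means no red flags. An empty list scores 0.0.
     */
    static double score(List<String> relations) {
        if (relations.isEmpty()) {
            return 0.0;
        }
        long flagged = relations.stream().filter(RED_FLAGS::contains).count();
        return 1.0 - (double) flagged / relations.size();
    }

    /** Rejects anything scoring below the given threshold. */
    static boolean isAcceptable(List<String> relations, double threshold) {
        return score(relations) >= threshold;
    }

    public static void main(String[] args) {
        List<String> clean = Arrays.asList("nsubj", "dobj");
        List<String> suspect = Arrays.asList("nsubj", "dep");
        System.out.println(score(clean) + " " + isAcceptable(clean, 0.75));
        System.out.println(score(suspect) + " " + isAcceptable(suspect, 0.75));
    }
}
```

The same score could serve both purposes mentioned above: as a ranking key for accepted statements, and as a filter whose rejects are logged for debugging the component-finding algorithm.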