simongray / StatementAnnotator

Custom annotator for Stanford CoreNLP that annotates sentences with the underlying statements contained within them.

Bad parsing due to misinterpreted conjunctions #57

Closed simongray closed 7 years ago

simongray commented 7 years ago

There is a potentially huge source of errors in the way that conjunctions are sometimes interpreted by the dependency parser.

Example

The example sentence

I went to Copenhagen Business School myself and it's quite comforting to know that the place is also funding research into inequality.

Produces the following statements

I went to Copenhagen Business School myself and it's quite comforting to know that the place is also funding research into inequality.
  |_ statement: {S+V+DO+IO: "that the place is also funding research into inequality", density: 0,67}
  |  |_ component: {Verb: "is also funding", gaps: 1, local: "no"}
  |  |_ component: {Subject: "the place", gaps: 0, local: "no"}
  |  |_ component: {DirectObject: "research", gaps: 0, local: "no"}
  |  |_ component: {IndirectObject: "into inequality", gaps: 0, local: "no", labels: "ConjIndirectObject"}
  |_ statement: {S+V+DO+IO: "I went to Copenhagen Business School myself", density: 0,57}
  |  |_ component: {Verb: "went", gaps: 0, local: "no"}
  |  |_ component: {Subject: "I", gaps: 0, local: "no"}
  |  |_ component: {IndirectObject: "to Copenhagen Business School", gaps: 0, local: "no", labels: "ConjIndirectObject"}
  |  |_ component: {DirectObject: "myself", gaps: 0, local: "no", conjunction: "comforting"}
  |_ statement: {S+V+DO+IO: "I went to Copenhagen Business School quite comforting", density: 0,75}
     |_ component: {Verb: "went", gaps: 0, local: "no"}
     |_ component: {Subject: "I", gaps: 0, local: "no"}
     |_ component: {DirectObject: "quite comforting", gaps: 0, local: "no", conjunction: "myself"}
     |_ component: {IndirectObject: "to Copenhagen Business School", gaps: 0, local: "no", labels: "ConjIndirectObject"}

Causes

The parser interprets the "and" as a conjunction between "myself" and "comforting" rather than as a separator between two independent statements.

Because the statement-finding algorithm splits the components of a conjunction into a separate statement for each conjunct, the result includes the strange statement "I went to Copenhagen Business School quite comforting".
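
To make the splitting behaviour concrete, here is a minimal sketch of it. It is not the annotator's actual code: the class `ConjunctionSplitSketch`, the method `split`, and the use of plain strings for components are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; components are plain strings here,
// not the project's StatementComponent objects.
final class ConjunctionSplitSketch {

    /**
     * Builds one statement per conjunct by pairing the components shared by
     * the whole clause with each conjunct in turn.
     */
    static List<List<String>> split(List<String> shared, List<String> conjuncts) {
        List<List<String>> statements = new ArrayList<>();
        for (String conjunct : conjuncts) {
            List<String> statement = new ArrayList<>(shared);
            statement.add(conjunct);
            statements.add(statement);
        }
        return statements;
    }
}
```

Calling `split` with the shared components "I", "went", "to Copenhagen Business School" and the conjuncts "myself" and "quite comforting" yields two statements, the second of which corresponds to the unwanted one above.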

Another feature of the algorithm is the decision not to re-use components in different statements, which causes the "it's ..." part to be dropped entirely. Currently, the algorithm assigns components to statements on a first-come, first-served basis, which is why the component is only ever (wrongly) associated with the first part of the sentence.
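
The effect of first-come, first-served assignment can be sketched roughly as follows. Everything in this sketch is assumed for illustration: the class and method names, the string components, and in particular the `MIN_COMPONENTS` threshold of two, which only stands in for whatever rule the real algorithm uses to drop a statement.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only, not the annotator's implementation.
final class FirstComeAssignmentSketch {

    // Assumption: a statement needs at least two components to be kept.
    static final int MIN_COMPONENTS = 2;

    /**
     * Walks the candidate statements in discovery order. A component that an
     * earlier statement has claimed is removed from later candidates, and a
     * candidate left with too few components is dropped.
     */
    static List<Set<String>> assign(List<Set<String>> candidates) {
        Set<String> claimed = new HashSet<>();
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> candidate : candidates) {
            Set<String> remaining = new LinkedHashSet<>(candidate);
            remaining.removeAll(claimed);
            if (remaining.size() >= MIN_COMPONENTS) {
                kept.add(remaining);
                claimed.addAll(remaining);
            }
        }
        return kept;
    }
}
```

With the candidates {"I", "went", "quite comforting"} and {"it", "quite comforting"} in that order, only the first survives; in the reverse order both survive, which shows how much the outcome depends on discovery order.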

Possible solutions

Maximising information: this is in effect a parsing issue, so it is not really solvable or meant to be solved here. However, perhaps the algorithm can be modified to assign components to statements in a more intelligent way. One way to do this is to maximise the number of statements produced, so that a component moves to another statement if that statement would otherwise be dropped.
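
A rough sketch of that idea, under stated assumptions: `MaximiseStatementsSketch`, `bestOwner`, the string components, and the minimum size of two components per surviving statement are all hypothetical and not taken from the project.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only, not the annotator's implementation.
final class MaximiseStatementsSketch {

    // Assumption: a statement with fewer than two components is dropped.
    static final int MIN_COMPONENTS = 2;

    /**
     * Returns the index of the statement that should keep the shared
     * component so that as many statements as possible survive, or -1 if no
     * statement contains it. Ties go to the earliest statement.
     */
    static int bestOwner(List<Set<String>> statements, String shared) {
        int bestIndex = -1;
        int bestSurvivors = -1;
        for (int owner = 0; owner < statements.size(); owner++) {
            if (!statements.get(owner).contains(shared)) {
                continue;
            }
            int survivors = 0;
            for (int i = 0; i < statements.size(); i++) {
                int size = statements.get(i).size();
                // every statement except the owner gives up the shared component
                if (i != owner && statements.get(i).contains(shared)) {
                    size--;
                }
                if (size >= MIN_COMPONENTS) {
                    survivors++;
                }
            }
            if (survivors > bestSurvivors) {
                bestSurvivors = survivors;
                bestIndex = owner;
            }
        }
        return bestIndex;
    }
}
```

For the statements {"I", "went", "to Copenhagen Business School", "quite comforting"} and {"it", "quite comforting"}, the second statement is chosen as owner, because the first still has three components without "quite comforting" while the second would otherwise be dropped.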

simongray commented 7 years ago

Better solution

When finding statements, components are allowed to exist temporarily in several statements. The final step of each iteration of the statement-finding algorithm resolves where to place each such component definitively, by examining the number of duplicate components in each statement.

For each component that is contained in several statements, each possible combination of statements is tried out, i.e. the component is temporarily removed from all but one of the statements, resulting in a combination of statements where the component appears in only a single statement. Whichever combination has the smallest total number of duplicate components is chosen as the final combination.
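
One possible reading of this procedure is sketched here. The issue does not define "duplicate components", so the sketch assumes it means components of the same type (e.g. two direct objects) within one statement; that interpretation, the `Component` class, and all method names are assumptions, not the project's API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only, not the annotator's implementation.
final class DuplicateResolutionSketch {

    /** Hypothetical component: a type such as "Subject" plus its text. */
    static final class Component {
        final String type;
        final String text;

        Component(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Assumed definition: components beyond the first of each type. */
    static int duplicates(List<Component> statement) {
        Set<String> types = new HashSet<>();
        for (Component component : statement) {
            types.add(component.type);
        }
        return statement.size() - types.size();
    }

    /**
     * Tries each statement containing the shared component as its sole owner
     * and returns the index giving the fewest duplicates in total, or -1 if
     * no statement contains it. Ties go to the earliest statement.
     */
    static int chooseOwner(List<List<Component>> statements, Component shared) {
        int bestIndex = -1;
        int bestScore = Integer.MAX_VALUE;
        for (int owner = 0; owner < statements.size(); owner++) {
            if (!statements.get(owner).contains(shared)) {
                continue;
            }
            int score = 0;
            for (int i = 0; i < statements.size(); i++) {
                List<Component> copy = new ArrayList<>(statements.get(i));
                if (i != owner) {
                    copy.remove(shared); // temporary removal, on a copy only
                }
                score += duplicates(copy);
            }
            if (score < bestScore) {
                bestScore = score;
                bestIndex = owner;
            }
        }
        return bestIndex;
    }
}
```

Under this reading, if "quite comforting" is shared between a statement that already has the direct object "myself" and a second statement with no direct object, the second statement is chosen because that leaves no statement with two direct objects.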

simongray commented 7 years ago

Saving this bit for later (attempt yesterday that I did not commit):

    if (connections.isEmpty()) {
        logger.info("not preserving component: " + component);
    } else {
        connections.add(component);
        Statement statementToRemove = null;
        Set<Statement> resultingStatements = null;

        // move components from one statement to another if this action will preserve another statement
        // that would otherwise be removed from the pool
        // this is also done to counteract issues with misinterpreted conjunctions (issue #57)
        for (StatementComponent connectedComponent : connectedComponents) {
            if (connectedComponent instanceof Statement) {
                Statement statement = (Statement) connectedComponent;
                Set<StatementComponent> sharedComponents = new HashSet<>();

                // collect the components that the statement shares with the connections
                for (StatementComponent connection : connections) {
                    if (statement.contains(connection)) {
                        sharedComponents.add(connection);
                    }
                }

                // will the statement survive the move?
                if (statement.count() - sharedComponents.size() > 1) {
                    resultingStatements = new HashSet<>();
                    logger.info("moving components: " + sharedComponents);
                    statementToRemove = statement;

                    // the new statement
                    logger.info("made new statement from components: " + connections);
                    Statement newStatement = new Statement(connections);
                    logger.info("new statement: " + newStatement);
                    resultingStatements.add(newStatement);

                    // the modified statement
                    Set<StatementComponent> reducedComponents = new HashSet<>(statement.getComponents());
                    reducedComponents.removeAll(sharedComponents);
                    logger.info("modified statement to components: " + reducedComponents);
                    Statement reducedStatement = new Statement(reducedComponents);
                    logger.info("reduced statement: " + reducedStatement);
                    // assumed intent: keep the reduced statement alongside the new one
                    resultingStatements.add(reducedStatement);
                }
            }
        }
    }
} // closes an enclosing block that is not part of this fragment