Closed simongray closed 7 years ago
When finding statements, a component may temporarily belong to several statements. The final step of each iteration of the statement-finding algorithm decides definitively where to place each such component, by examining the number of duplicate components in each statement.
For each component contained in more than one statement, every possible placement is tried: the component is temporarily removed from all but one of the statements, yielding a combination of statements in which the component appears only once. Whichever combination has the smallest total number of duplicate components is chosen as the final combination.
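A rough sketch of that selection step, not the project's actual code. It assumes components are plain strings of the form `kind:text` and that a "duplicate" means a second component of the same kind inside one statement; both are assumptions made for illustration, and `PlacementSketch`, `duplicates` and `place` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PlacementSketch {
    // Assumed cost of one statement: every component beyond the first of its kind.
    static int duplicates(Set<String> statement) {
        Map<String, Integer> perKind = new HashMap<>();
        for (String component : statement) {
            String kind = component.substring(0, component.indexOf(':'));
            perKind.merge(kind, 1, Integer::sum);
        }
        int total = 0;
        for (int count : perKind.values()) {
            total += count - 1;
        }
        return total;
    }

    // Tries keeping `shared` in each statement that contains it (removing it from
    // the others) and returns the combination with the fewest total duplicates.
    static List<Set<String>> place(List<Set<String>> statements, String shared) {
        List<Set<String>> best = statements;
        int bestCost = Integer.MAX_VALUE;
        for (int keep = 0; keep < statements.size(); keep++) {
            if (!statements.get(keep).contains(shared)) {
                continue;
            }
            List<Set<String>> candidate = new ArrayList<>();
            int cost = 0;
            for (int i = 0; i < statements.size(); i++) {
                Set<String> copy = new LinkedHashSet<>(statements.get(i));
                if (i != keep) {
                    copy.remove(shared);
                }
                candidate.add(copy);
                cost += duplicates(copy);
            }
            if (cost < bestCost) { // ties keep the earliest statement
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }
}
```

With the statements `{subject:I, verb:went, subject:it}` and `{verb:is, subject:it}`, keeping `subject:it` in the first statement leaves two subjects there, so the sketch keeps it in the second statement instead.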
Saving this bit for later (attempt yesterday that I did not commit):
    if (connections.isEmpty()) {
        logger.info("not preserving component: " + component);
    } else {
        connections.add(component);
        Statement statementToRemove = null;
        Set<Statement> resultingStatements = null;

        // move components from one statement to another if this action will preserve another statement
        // that would otherwise be removed from the pool
        // this is also done to counteract issues with misinterpreted conjunctions (issue #57)
        for (StatementComponent connectedComponent : connectedComponents) {
            if (connectedComponent instanceof Statement) {
                Statement statement = (Statement) connectedComponent;

                // collect the connections that this statement already contains
                Set<StatementComponent> sharedComponents = new HashSet<>();
                for (StatementComponent connection : connections) {
                    if (statement.contains(connection)) {
                        sharedComponents.add(connection);
                    }
                }

                // will the statement survive the move?
                if (statement.count() - sharedComponents.size() > 1) {
                    resultingStatements = new HashSet<>();
                    logger.info("moving components: " + sharedComponents);
                    statementToRemove = statement;

                    // the new statement
                    logger.info("made new statement from components: " + connections);
                    Statement newStatement = new Statement(connections);
                    logger.info("new statement: " + newStatement);
                    resultingStatements.add(newStatement);

                    // the modified statement
                    Set<StatementComponent> reducedComponents = new HashSet<>(statement.getComponents());
                    reducedComponents.removeAll(sharedComponents);
                    logger.info("modified statement to components: " + reducedComponents);
                    Statement reducedStatement = new Statement(reducedComponents);
                    // fixed: log and keep the reduced statement (previously logged newStatement again)
                    logger.info("reduced statement: " + reducedStatement);
                    resultingStatements.add(reducedStatement);
                }
            }
        }
    }
} // closes the enclosing block (not shown)
There is a potentially huge source of errors in the way that conjunctions are sometimes interpreted by the dependency parser.
Example
The example sentence
Produces the following statements
Causes
This is caused by the fact that the "and" is interpreted as a conjunction between "myself" and "comforting" and not as a separator of two different logical statements.
Due to the nature of the statement-finding algorithm, components in a conjunction are split into separate statements, one for each component in the conjunction, which produces the strange statement "I went to Copenhagen Business School quite comforting".
Another feature of the algorithm is the decision not to re-use components in different statements, which causes the "it's ..." to be dropped entirely. Currently, the algorithm assigns components to statements on a first-come, first-served basis, which is why the component is only ever (wrongly) associated with the first part of the sentence.
Possible solutions
Maximising information: this is in effect a parsing issue, so it is not really solvable or meant to be solved. However, the algorithm could perhaps be modified to assign components to statements more intelligently. One way to do this is to maximise the number of statements produced: a component moves to a different statement if that statement would otherwise be dropped.
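A minimal sketch of the "maximise the number of statements" idea, under stated assumptions: statements are modelled as sets of strings, and a statement is dropped when fewer than two components remain (loosely mirroring the `statement.count() - sharedComponents.size() > 1` check in the snippet above). `MaximiseStatementsSketch`, `MIN_COMPONENTS`, `survivorsIfKeptIn` and `chooseOwner` are hypothetical names, not part of the project.

```java
import java.util.List;
import java.util.Set;

class MaximiseStatementsSketch {
    // Assumption: a statement with fewer components than this is dropped from the pool.
    static final int MIN_COMPONENTS = 2;

    // Counts the statements that survive if `shared` is kept only in statement `owner`.
    static int survivorsIfKeptIn(List<Set<String>> statements, String shared, int owner) {
        int survivors = 0;
        for (int i = 0; i < statements.size(); i++) {
            int size = statements.get(i).size();
            if (i != owner && statements.get(i).contains(shared)) {
                size--; // the shared component is removed from every non-owner
            }
            if (size >= MIN_COMPONENTS) {
                survivors++;
            }
        }
        return survivors;
    }

    // Returns the index of the statement that should keep `shared`,
    // or -1 if no statement contains it. Ties go to the earliest statement.
    static int chooseOwner(List<Set<String>> statements, String shared) {
        int bestOwner = -1;
        int bestSurvivors = -1;
        for (int i = 0; i < statements.size(); i++) {
            if (!statements.get(i).contains(shared)) {
                continue;
            }
            int survivors = survivorsIfKeptIn(statements, shared, i);
            if (survivors > bestSurvivors) {
                bestSurvivors = survivors;
                bestOwner = i;
            }
        }
        return bestOwner;
    }
}
```

For the statements `{a, b, shared}` and `{c, shared}`, a first-come, first-served assignment would keep `shared` in the first statement and reduce the second to a single component. The sketch picks the second statement as owner, so both statements survive.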