orientechnologies / orientdb

OrientDB is the most versatile DBMS supporting Graph, Document, Reactive, Full-Text and Geospatial models in one Multi-Model product. OrientDB can run distributed (Multi-Master), supports SQL, ACID Transactions, Full-Text indexing and Reactive Queries.
https://orientdb.dev
Apache License 2.0
4.72k stars 870 forks source link

Nested traversal (subquery) bug (2.1.7 -> 3.0 migration) #8683

Closed azakkerman closed 2 years ago

azakkerman commented 5 years ago

OrientDB Version: 3.0.11

Java Version: 1.8

OS: Linux

Expected behavior

We expected outer traversal based on the subquery to return one additional node (this is indeed the behavior we observed on 2.1.7)

Actual behavior

Outer traversal does not find the extra node

Workaround

'Shield' the inner traversal with nested SELECT by id.

Minimal code to reproduce the bug

import com.orientechnologies.orient.core.sql.executor.OResult;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.tinkerpop.gremlin.orientdb.OrientGraph;
import org.apache.tinkerpop.gremlin.orientdb.OrientGraphFactory;
import org.apache.tinkerpop.gremlin.orientdb.executor.OGremlinResult;
import org.apache.tinkerpop.gremlin.orientdb.executor.OGremlinResultSet;
import org.junit.Assert;
import org.junit.Test;

/**
 * This class is used to illustrate traversal bug we discovered with OrientDB 2.x upgrade to 3.0.11
 * If and when the bug is fixed, this test will start failing, allowing us to revert the workaround
 */
public class OrientDBBugReportTest {

    private static List<String> runTraversalQuery(OrientGraph graph, String query, Object... args) {
        OGremlinResultSet result = graph.executeSql(query, args);
        System.out.println("Trying query: \n" + query);
        List<String> resultNodes =
                result.stream()
                        .map(OGremlinResult::getRawResult)
                        .map(OResult::toJSON)
                        .collect(Collectors.toList());
        System.out.println("Result: \n" + String.join("\n", resultNodes));
        return resultNodes;
    }

    @Test
    public void traversalBugIsNotFixed() {
        OrientGraphFactory factory = new OrientGraphFactory("memory:test");
        try (OrientGraph graph = factory.getNoTx()) {
            // Create a new vertex class, called Node
            graph.executeSql("CREATE CLASS Node EXTENDS V");
            // Node has two properties, a integer id and a string tag
            graph.executeSql("CREATE PROPERTY Node.id INTEGER");
            graph.executeSql("CREATE PROPERTY Node.tag STRING");
            // Create an index by id field
            graph.executeSql("CREATE INDEX node_id_index IF NOT EXISTS ON Node (id) UNIQUE");

            // Create a new edge class, called Link
            graph.executeSql("CREATE CLASS Link EXTENDS E");

            graph.executeSql(
                    "INSERT INTO Node (id, tag) VALUES (1, 'inner'), (2, 'inner'), (3, 'outer')");

            // Link Node 1 -> Node 2
            graph.executeSql(
                    "CREATE EDGE Link "
                            + "FROM (SELECT FROM Node.node_id_index WHERE id=1)"
                            + " TO (SELECT FROM Node.node_id_index WHERE id=2)");
            // Link Node 2 -> Node 3
            graph.executeSql(
                    "CREATE EDGE Link "
                            + "FROM (SELECT FROM Node.node_id_index WHERE id=2)"
                            + " TO (SELECT FROM Node.node_id_index WHERE id=3)");

            String innerQuery =
                    "TRAVERSE OUT('Link') FROM (SELECT FROM Node.node_id_index WHERE id = ?) WHILE tag = ?";

            // Traverse all Nodes with "inner" tag value starting with 1
            List<String> innerNodes = runTraversalQuery(graph, innerQuery, 1, "inner");
            Assert.assertEquals("We have exactly two inner nodes", 2, innerNodes.size());

            String outerQueryOriginal =
                    "TRAVERSE OUT('Link') FROM ("
                            + innerQuery
                            + ") WHILE $depth <= 1 STRATEGY BREADTH_FIRST";
            List<String> outerQueryOriginalNodes =
                    runTraversalQuery(graph, outerQueryOriginal, 1, "inner");
            Assert.assertEquals(
                    "We should see all 3 nodes here but due to OrientDB bug we see only 2",
                    2,
                    outerQueryOriginalNodes.size());

            String outerQueryWorkaround =
                    "TRAVERSE OUT('Link') FROM ("
                            + "SELECT FROM Node.node_id_index WHERE id IN ("
                            + "SELECT id FROM ("
                            + innerQuery
                            + "))"
                            + ") WHILE $depth <= 1 STRATEGY BREADTH_FIRST";
            List<String> outerQueryWorkaroundNodes =
                    runTraversalQuery(graph, outerQueryWorkaround, 1, "inner");
            Assert.assertEquals(
                    "We should see all 3 nodes here and we see all 3",
                    3,
                    outerQueryWorkaroundNodes.size());
        }
    }
}
luigidellaquila commented 5 years ago

Hi @azakkerman

I'm not sure I got the point here (I didn't run the test yet, to be honest), you are executing this query

            String outerQueryOriginal =
                    "TRAVERSE OUT('Link') FROM ("
                            + innerQuery
                            + ") WHILE $depth <= 1 STRATEGY BREADTH_FIRST";

and you say that you expect three nodes in the result, but this is not the expected result: the third node has $depth = 2, so it will be discarded.

Am I missing something?

Thanks

Luigi

azakkerman commented 5 years ago

Hi, Luigi,

And thank you for looking into it.

Anatoly.

luigidellaquila commented 5 years ago

Hi @azakkerman

Ok, I got the point and I think I know where the problem is. Historically, the TRAVERSE statement does not re-traverse already traversed records, so the second time it reaches the node 2 it discards it, so it does not proceed to traverse node 3.

A possible solution is to replace the TRAVERSE with a MATCH with while condition (that does not discard duplicates), but you have to do some refactoring of the queries

Thanks

Luigi

azakkerman commented 5 years ago

Hi, @luigidellaquila,

These semantics don’t seem to be stable. It would imply that depending on the order in which the outer traversal runs on the FROM nodes, it will produce different results. That seems to violate basic principles of SQL. If outer traversal has a WHILE clause, it should not be pruning potential nodes the first time a node is visited and doesn't match the WHILE clause, since there are possible other paths to that node that do match the WHILE clause.

The workaround also seems to illustrate that if the nodes returned from the inner traversal are simply re-selected and fed into the outer traversal, then the outer traversal works as expected, not pruning the search prematurely? Thank you, Anatoly.