pietermartin / sqlg

TinkerPop graph over sql
MIT License
245 stars 51 forks source link

Lazy traversal and query execution clashes #125

Closed pietermartin closed 3 years ago

pietermartin commented 7 years ago

Sqlg executes many queries per traversal. It executes the queries lazily. If while looping the traversal elements are inserted or modified then a subsequent query might return the inserted or modified element. This breaks the traversal scope semantics.

TestVertexEdges.testBothEOnEdgeToSelf reveals this error. I have for now hacked SqlgVertex.edges to eagerly iterate the traversal but its not good enough.

Either loose laziness or execute all queries at once and then lazily iterate the results.

JPMoresmau commented 7 years ago

Are there tests showing the proper behavior in Tinkerpop? I'm afraid removing the laziness will cause memory usage to increase a lot. It's nearly as if the side effects should be done in a different transaction, but of course as soon as the traversal is complete they should be visible to a subsequent traversal in the same thread/transaction.

JPMoresmau commented 7 years ago

FYI just to confirm the two tests in TestVertexEdges pass with TinkerGraph-Gremlin, so you're right about the current behavior breaking the semantics.

pietermartin commented 7 years ago

Regarding doing it in a different transaction breaks transaction semantics. In particular the MVCC nature of it.

Currently the JDBC connection preloads all the data for a query. There is a property that can make the JDBC connection also lazy load. Perhaps if the lazy load parameter is set we execute the queries upfront and then lazily iterate it will resolve the issue. Needs investigation though.

JPMoresmau commented 7 years ago

Mhhh, not too sure what you mean, but anyway currently a JDBC level solution is not going to work. For example in one of the failing test, we have two edge types and so we have two different JDBC statements. I tried putting the result in a CachedRowSet but it didn't fix anything, all the results for the step have to be retrieved. So I removed the laziness in SqlgCompiledResultIterator and that solves the issue. I'll see what the performance impact on the full test suite is. Maybe if that's the fix we could control it via a flag? We could have a lazy/eager property on connection and even a runtime flag on the current transaction, so that most queries are still lazy and you can ask for eager mode when you want the stricter semantics.

pietermartin commented 7 years ago

Regarding getting the JDBC driver to lazy load the property is st.setFetchSize(50) https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor

So we could execute all queries for a gremlin upfront and still lazily iterate all of them.

I'll have a look at your PR later today.

pietermartin commented 7 years ago

I gave this some more thought and realized that executing the queries upfront won't work. There might be mid traversal VertexSteps or GraphSteps that can not be executed upfront.

This means we have to go with the eager load route as you have done.

For the lazy load flag, perhaps we should prevent any graph modifications if the lazy load flag is on? That way the system has consistent semantics.

pietermartin commented 7 years ago

A benefit of knowing that a transaction is read only is that we could then potentially execute queries in parallel then. It will require profiling to see if its worth it or not. For mid-traversal GraphSteps and VertexSteps that could potentially be 1000's of queries it could have a significant performance impact.

JPMoresmau commented 7 years ago

It's bit extreme, no? Basically we have to change the default to eager reads, which cause performance issues (memory usage on non trivial workloads). The fact that all tests so far have worked and that the issue wasn't reported earlier mean it's not too current an occurrence, which is why I set up the eager mode as the non default option. If you want to modify the same things you're traversing, then you set it.

pietermartin commented 7 years ago

Yeah, it bothers me though that the semantics changes and its the kind of change that causes very subtle bugs. I suppose seeing as we have the meta data build a set of labels/tables that may not be touched and throw an exception if they are touched during a traversal. It be interesting to see if the test suite still passes then. WDYT?

JPMoresmau commented 7 years ago

Are we always sure we know when we're out of a traversal?

pietermartin commented 7 years ago

The primary boundary will be transaction. For the same reason I needed to cache the jdbc statements on the transaction to ensure they get closed properly. The lazy semantics meant I had no guarantee that the ResultSet is fully iterated and closed.

Apart from that we will be able to know when the last hasNext() returns false.

So while transaction is open and hasNext() returned true, modifications are not allowed.

To complicate things, the first queries labels/tables do not need to be part of the 'may not touch' list of tables as its fully fetched already by jdbc.

Part of my concern is that TinkerPop are starting to discourage the use of Graph.addVertex preferring g.traversal().V().this().that().addV().this().that(). I have never used this and do not think in those terms but it probably increases the risk somewhat.

pietermartin commented 7 years ago

This one is still not quite right. The TestVertexEdges tests we added are failing on H2 and HSQLDB. The order in which they execute the bothE happens to be different to Postgresql and thus does not see the edge added in the forEachRemaining loop. I reckon there is actually nothing really wrong with the implementation but rather that the semantics of this scenario is open ended. I'll ask TinkerPop about it.

pietermartin commented 7 years ago

Ok asked them, looks like its a bit up in the air. The this.sqlgGraph.tx().setLazyQueries(lazy); will do.

pietermartin commented 7 years ago

Created https://issues.apache.org/jira/browse/TINKERPOP-1616 Reopening this one to not loose track of it.

pietermartin commented 3 years ago

This is not going to be resolved, its a feature now, not a bug.