Closed: pascalwhoop closed this issue 1 month ago
Hello, the problem is that the query you configure is interpolated into a larger query template before being run. A `SKIP`/`LIMIT` clause could have unintended consequences on the whole query or make it invalid (note that the validation is far from ideal, as it can easily run into false positives, but that's another story).
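To illustrate the failure mode, here is a minimal sketch. The wrapper template below is deliberately simplified and hypothetical, not the connector's actual one (the real validation logic lives in `Validations.scala`): if the connector appends its own paging clauses after the configured query, a user-supplied `LIMIT` ends up in an illegal position.

```python
# Sketch: why a user-supplied SKIP/LIMIT can break the composed query.
# The template here is hypothetical; the real connector's template is
# more involved, but the interpolation problem is the same.

def compose(user_query: str, skip: int, limit: int) -> str:
    # The connector interpolates the configured query into a larger
    # template and appends its own paging clauses for partitioned reads.
    return f"{user_query} SKIP {skip} LIMIT {limit}"

ok = compose("MATCH (n:Person) RETURN n", 0, 100)
# 'MATCH (n:Person) RETURN n SKIP 0 LIMIT 100'  -> valid Cypher

bad = compose("MATCH (n:Person) RETURN n LIMIT 10", 0, 100)
# 'MATCH (n:Person) RETURN n LIMIT 10 SKIP 0 LIMIT 100'
# -> invalid Cypher: duplicate LIMIT, and SKIP may not follow LIMIT
```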
If you want to test a sample first, I'd advise using the `limit` method on the DataFrame.
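For example, a configuration sketch assuming the connector's documented `org.neo4j.spark.DataSource` format and `query` option; the connection values are placeholders, and running this requires a live Neo4j instance plus the connector on the classpath:

```python
# Sketch of the suggested approach: keep the configured query clean and
# sample via DataFrame.limit(). Connection values are placeholders.
df = (
    spark.read.format("org.neo4j.spark.DataSource")
    .option("url", "neo4j://localhost:7687")
    .option("query", "MATCH (n:Person) RETURN n.name AS name")
    .load()
)
sample = df.limit(10)  # the limit lives in the Spark query plan
sample.show()
```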
That's a bit unfortunate, since we'd have to load the entire DB into memory before saying "give me only the first 10". Can you help me understand why there is no way to pass a `LIMIT` directly to the DB?
@pascalwhoop you can pass it, but through the DataFrame `limit` method. The limit is then pushed down to the Cypher layer as an optimization.
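A toy model of what "pushed down" means here (all names below are illustrative, not the connector's actual API): `df.limit(n)` does not fetch anything, it only records the limit in the query plan, and the connector appends the `LIMIT` clause itself at execution time, so the clause always lands in a syntactically valid position.

```python
# Toy model of limit pushdown; names are illustrative only.

class LazyNeo4jRead:
    def __init__(self, query: str):
        self.query = query          # user-configured query, sent as-is
        self.pushed_limit = None    # set when Spark pushes a limit down

    def limit(self, n: int) -> "LazyNeo4jRead":
        # df.limit(n) is lazy: it only records the limit in the plan,
        # which the data source can later push down.
        self.pushed_limit = n
        return self

    def to_cypher(self) -> str:
        # At execution time the connector appends its own LIMIT clause,
        # so it controls exactly where the clause is placed.
        cypher = self.query
        if self.pushed_limit is not None:
            cypher += f" LIMIT {self.pushed_limit}"
        return cypher

df = LazyNeo4jRead("MATCH (n:Person) RETURN n").limit(10)
print(df.to_cypher())  # MATCH (n:Person) RETURN n LIMIT 10
```

This is why the limit still reaches the database without loading everything into memory first: the restriction is only on writing `LIMIT` inside the configured query string, not on limiting the read itself.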
Ah, I see: because you do not deconstruct the query string back into an internal representation first, but send it as-is to the Neo4j instance. Now I get you!

This was a fallacy on my side, then. Spark treats `str` queries the same as queries articulated through the PySpark API: it always creates an internal representation first and then optimizes the entire query plan before executing. So as a Spark developer you assume "I can articulate my query in any way", but there are differences between the different APIs here. Got ya!
https://github.com/neo4j-contrib/neo4j-spark-connector/blob/98c6b9d0687a0f5dbb2b5559270ed387ac744c71/common/src/main/scala/org/neo4j/spark/util/Validations.scala#L278
It would be helpful to be able to test a query with 5-10 records before running the full thing. Is there any reasoning why this is not permitted? I couldn't find anything while searching your docs.