Open jexp opened 8 months ago
@jexp Do you want us in Cypher surface to look into this? In that case could you provide us with a setup so we can reproduce?
you can test it on https://demo.neo4jlabs.com:7473/browser/ username/password/database is recommendations
The original idea was to have an easy way to select which properties to return from a node to a client UI while keeping the functionality that you have when returning regular nodes. This is meant to exclude heavy or irrelevant properties like large texts, embeddings, large number of properties and focus on the relevant properties for captions and querying so that graphs can be visualized without overloading the UI.
https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/148
So wrapping a node .i.e. keeping it's identity and then limiting which properties are available.
As I recently discovered the "keeping the identity" part was not implemented, so that the wrapped nodes are not tied to the original node or its relationships, which would have to be awkwardly reconstructed.
Performance comparison for the query below, which returns roughly 34 nodes and 37 rels is 10 seconds vs 10ms for the approach excluding the huge properties. It fetches two embedding and one large text property per node.
// 10s+
call db.index.fulltext.queryNodes("movieFulltext","Forrest Gump", {limit:1}) yield node as n, score as s1
call db.index.vector.queryNodes("moviePlotsEmbedding",5, n.plotEmbedding) yield node as movie, score
match (person:Person)-[r1]->(movie)-[r2:IN_GENRE]->(genre)
return *
Here is the example of how it should have worked:
// doesn't work, nodes are disconnected from relationships
call db.index.fulltext.queryNodes("movieFulltext","Forrest Gump", {limit:1}) yield node as n, score as s1
call db.index.vector.queryNodes("moviePlotsEmbedding",5, n.plotEmbedding) yield node as movie, score
match (person:Person)-[r1]->(movie)-[r2:IN_GENRE]->(genre)
return genre, r1, r2, apoc.create.virtual.fromNode(movie, ["title","released","url"]) as movie,
apoc.create.virtual.fromNode(person, ["name","born","url"]) as person
and what you actually have to do now to make it work
// 10ms
call db.index.fulltext.queryNodes("movieFulltext","Forrest Gump", {limit:1}) yield node as n, score as s1
call db.index.vector.queryNodes("moviePlotsEmbedding",5, n.plotEmbedding) yield node as movie, score
match path = (person:Person)-[rp]->(movie)-[rg:IN_GENRE]->(genre)
with collect(path) as paths
call apoc.graph.fromPaths(paths,"results",{}) yield graph
with graph.nodes as nodes, graph.relationships as rels
with rels, apoc.map.fromPairs([n in nodes | [coalesce(n.tmdbId, n.name), apoc.create.vNode(labels(n),n {.*, plotEmbedding:null, posterEmbedding:null, plot:null, bio:null })]]) as nodes
return nodes, [r in rels | apoc.create.vRelationship(nodes[coalesce(startNode(r).tmdbId,startNode(r).name)], type(r), properties(r), nodes[coalesce(endNode(r).tmdbId,endNode(r).name)])] as rels
@gem-neo4j did some digging for me and it seems like this was implemented with positive node ids earlier but changed because of a bug where changes to the virtual node were updating the real one: https://github.com/neo4j-contrib/neo4j-apoc-procedures/issues/2252
Yeah but that was intentional that it is a wrapper of the real node so all operations on it should pass through. Perhaps the naming with "virtual" was not that good it should have been "wrapped".
e.g.
apoc.create.virtual.fromNode(movie, ["title","released","url"])
returns virtual nodes with negative ids but it shouldn't in this case.
Seems like an implementation bug that somehow crept in.
Need it for excluding node embeddings and large text properties for returning in neo4j tools