neo4j / apoc

Apache License 2.0

Using virtual nodes in Cypher #536

Open rickardoberg opened 11 months ago

rickardoberg commented 11 months ago

It appears that using virtual nodes in Cypher queries is either not possible or buggy. If a procedure/function returns an APOC VirtualNode (or any Node implementation, for that matter), I would expect to be able to use it like any other node.

Neo4j Version: 5.12.0
Operating System: Windows 11
API: Cypher

Steps to reproduce

Run this Cypher query:

WITH apoc.create.vNode(['vnode'], {name:'one'}) AS one
RETURN one.name

Expected behavior

Should return "one"

Actual behavior

Returns null

The reason I need this to work is to be able to implement field-level access control, which is not supported by Neo4j itself.

InverseFalcon commented 11 months ago

This one isn't a bug, but a limitation in virtual node property access, since they aren't actually real nodes.

You can use the apoc.any.property() function to access a property of any object, including a virtual node or relationship. Use it in your RETURN clause in place of direct property access to get the virtual node's property value.

...
RETURN apoc.any.property(one, 'name') as name

That said, I don't see this function mentioned anywhere in the APOC virtual nodes documentation. It really needs a mention there, along with a link to the function.

rickardoberg commented 11 months ago

It may not be a bug, but having "Map<String,Object>" being better supported than "Node" is a bit odd, wouldn't you say? Like.. why?

mnd999 commented 11 months ago

I'd be interested to hear what your needs for field-level access control are as well. That's something that we're thinking about at the moment.

rickardoberg commented 11 months ago

It's for an HR analytics service. For each node and each field, we need to calculate whether the user making the query is allowed to access the value of that particular field, using the same rules as the HR system itself. The rules are in the graph as well, so in a sense it's back to the original reason Neo4j was created in the first place, but on a much more granular level.

In the end, the model I went with was to translate nodes into Map<String, Object>, since the Cypher engine knows what to do with that. When converting a node to that structure I can apply the rules, and then allow the data to be used for aggregations and output, behind a GraphQL engine, which is what the user/UI uses to access the database.
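As a rough illustration of that conversion step (not the actual code from the service), the sketch below filters a node's properties into a plain Map<String, Object>. The class name FieldAccessFilter, the method visibleProperties, and the (userId, fieldName) rule are all hypothetical, and the input map stands in for whatever Node.getAllProperties() would return in embedded Neo4j.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical helper: copies only the fields a user is allowed to read.
final class FieldAccessFilter {

    // mayRead is an assumed rule of the form (userId, fieldName) -> allowed?
    // In the real service the rules live in the graph; here it is a lambda.
    static Map<String, Object> visibleProperties(
            Map<String, Object> nodeProperties,
            String userId,
            BiPredicate<String, String> mayRead) {
        Map<String, Object> visible = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : nodeProperties.entrySet()) {
            if (mayRead.test(userId, e.getKey())) {
                visible.put(e.getKey(), e.getValue());
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // Stand-in for the properties of a real node.
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("name", "one");
        props.put("salary", 1000);

        // Example rule: only "hr-admin" may read the salary field.
        BiPredicate<String, String> rule =
                (user, field) -> !field.equals("salary") || user.equals("hr-admin");

        System.out.println(visibleProperties(props, "analyst", rule));  // {name=one}
        System.out.println(visibleProperties(props, "hr-admin", rule)); // {name=one, salary=1000}
    }
}
```

The resulting map is an ordinary value as far as Cypher is concerned, so property access and aggregation behave as they would for any other map.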

We're also separating tenants within the graph by applying the tenant id to all nodes as a label, then using a custom AccessMode implementation that checks that each accessed node has the tenant label of the user, with tenant-aware compound indexes to make lookups fast.

All of this is with embedded Neo4j, using event sourcing projections to scale it.

On top of all of the above, each entity has full history, because we have all events with metadata of how state was changed, when, by whom, and why. That means we can also do time queries: any query can be run with a timestamp saying what state of the data should be used (this includes both properties and relationships). A time series, for example, is simply running the same query many times with different timestamps.
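To make the time-query idea concrete, here is a minimal sketch that assumes a simplified event shape (a timestamp, a field name, and a new value). It covers properties only, not relationships, and none of the names (TimeTravelSketch, PropertySet, stateAt, series) come from the actual system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TimeTravelSketch {

    // Hypothetical event: at `timestamp`, `field` was set to `value`.
    static final class PropertySet {
        final long timestamp;
        final String field;
        final Object value;

        PropertySet(long timestamp, String field, Object value) {
            this.timestamp = timestamp;
            this.field = field;
            this.value = value;
        }
    }

    // Rebuilds an entity's properties as of `asOf` by replaying events.
    // Events are assumed to be sorted by ascending timestamp.
    static Map<String, Object> stateAt(List<PropertySet> eventsInOrder, long asOf) {
        Map<String, Object> state = new LinkedHashMap<>();
        for (PropertySet e : eventsInOrder) {
            if (e.timestamp > asOf) {
                break; // later events are not part of the requested state
            }
            state.put(e.field, e.value);
        }
        return state;
    }

    // A time series: the same lookup evaluated at several timestamps.
    static List<Object> series(List<PropertySet> events, String field, long... timestamps) {
        List<Object> values = new ArrayList<>();
        for (long t : timestamps) {
            values.add(stateAt(events, t).get(field));
        }
        return values;
    }

    public static void main(String[] args) {
        List<PropertySet> events = Arrays.asList(
                new PropertySet(10, "title", "Engineer"),
                new PropertySet(20, "title", "Senior Engineer"),
                new PropertySet(30, "team", "Platform"));

        // Prints [null, Engineer, Senior Engineer]
        System.out.println(series(events, "title", 5, 15, 25));
    }
}
```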

mnd999 commented 11 months ago

Thanks. I'm not sure if you're on Community or Enterprise, but either way we didn't make the native RBAC available through public APIs for embedded, so it's interesting to hear about a use case for it. I think what we're considering for now is comparisons with static values rather than dynamic lookups against the graph.

rickardoberg commented 11 months ago

This is on Community. I should make it clear that I have no desire to have the database implement any of this. It's all very application specific. What I want, essentially, is to have the database not get in my way and help me get it done on top of it.

In this case, specifically, if a procedure/function returns a Node it should be usable for the rest of the query. That would allow me to take a raw Node and put a wrapper with the extra logic on top.

Also, if a procedure/function returns a Map<String,Object>, it would be nice if it wasn't immediately copied. Since Nodes didn't work, I was hoping to return a custom Map<String,Object> implementation that could do the access check on get(), but that also didn't work, because all properties are copied immediately rather than just the ones used. Instead, I now have to pre-calculate which fields the GraphQL query is going to use, create an access-control-checked Map<String, Object> with those fields, and return that to be used by Cypher. It feels like an unnecessary hassle.
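The difference between checking on get() and an eager copy can be sketched in plain Java. Everything here is an assumption for illustration: CheckedPropertyMap is a hypothetical class, the per-field rule is a simple predicate, and the engine's copy is modeled as a plain loop over entrySet(), since the thread does not show how Neo4j actually performs it.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical map that runs an access check when a field is read.
final class CheckedPropertyMap extends AbstractMap<String, Object> {
    private final Map<String, Object> raw;
    private final Predicate<String> mayRead; // assumed per-field rule
    int checks = 0;                          // rule evaluations, for illustration

    CheckedPropertyMap(Map<String, Object> raw, Predicate<String> mayRead) {
        this.raw = raw;
        this.mayRead = mayRead;
    }

    private boolean allowed(String field) {
        checks++;
        return mayRead.test(field);
    }

    // Lazy path: only the requested field is checked.
    @Override
    public Object get(Object key) {
        if (!(key instanceof String) || !raw.containsKey(key)) {
            return null;
        }
        return allowed((String) key) ? raw.get(key) : null;
    }

    // Eager path: anything that iterates the map forces a check per field.
    @Override
    public Set<Map.Entry<String, Object>> entrySet() {
        Set<Map.Entry<String, Object>> visible = new LinkedHashSet<>();
        for (Map.Entry<String, Object> e : raw.entrySet()) {
            if (allowed(e.getKey())) {
                visible.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("name", "one");
        raw.put("salary", 1000);
        raw.put("manager", "two");
        Predicate<String> rule = field -> !field.equals("salary");

        CheckedPropertyMap lazy = new CheckedPropertyMap(raw, rule);
        lazy.get("name");
        System.out.println("checks after one get(): " + lazy.checks);

        // Modeled eager copy: one pass over entrySet() touches every field.
        CheckedPropertyMap eager = new CheckedPropertyMap(raw, rule);
        Map<String, Object> copy = new HashMap<>();
        for (Map.Entry<String, Object> e : eager.entrySet()) {
            copy.put(e.getKey(), e.getValue());
        }
        System.out.println("checks after copy: " + eager.checks);
    }
}
```

With an eager copy, the number of rule evaluations scales with the number of properties on the node rather than with the fields the query reads, which is why pre-calculating the used fields becomes necessary.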

Then again I understand from other tickets that embedded is not a priority, at all, so I can see how it's perhaps a bit too niche.