spring-projects / spring-data-neo4j

Provide support to increase developer productivity in Java when using Neo4j. Uses familiar Spring concepts such as template classes for core API usage and lightweight repository style data access.
http://spring.io/projects/spring-data-neo4j
Apache License 2.0

[Performance Issue] Generic relationship does not honor type when query #2941

Open abccbaandy opened 3 months ago

abccbaandy commented 3 months ago

I have the following node classes:

@Data
@Node
public abstract class BaseNode {
    @Id
    @GeneratedValue
    private UUID id;
}

@EqualsAndHashCode(callSuper = true)
@Node
@Data
public class Child extends BaseNode{
}

@EqualsAndHashCode(callSuper = true)
@Node
@Data
@NoArgsConstructor
@AllArgsConstructor
@Builder
@ToString(callSuper = true)
public class Parent1 extends BaseNode{
    @Relationship(type = "Parent1_CONTAIN", direction = Relationship.Direction.OUTGOING)
    private List<BaseRelationship<BaseNode>> parent1Relationships;
}

@EqualsAndHashCode(callSuper = true)
@Node
@Data
@NoArgsConstructor
@ToString(callSuper = true)
public class Parent2 extends BaseNode {
    @Relationship(type = "Parent2_CONTAIN", direction = Relationship.Direction.OUTGOING)
    private List<BaseRelationship<BaseNode>> parent2Relationships;
}

When I get parent1 with find all:

        List<Parent1> all = parent1Repository.findAll();

The log shows

MATCH (parent1:`Parent1`:`BaseNode`) WITH collect(elementId(parent1)) AS __sn__ RETURN __sn__
MATCH (parent1:`Parent1`:`BaseNode`) OPTIONAL MATCH (parent1)-[__sr__:`Parent1_CONTAIN`]->(__srn__:`BaseNode`) WITH collect(elementId(parent1)) AS __sn__, collect(elementId(__srn__)) AS __srn__, collect(elementId(__sr__)) AS __sr__ RETURN __sn__, __srn__, __sr__
MATCH (baseNode:`BaseNode`) WHERE elementId(baseNode) IN $__ids__ OPTIONAL MATCH (baseNode)-[__sr__:`Parent2_CONTAIN`]->(__srn__:`BaseNode`) WITH collect(elementId(baseNode)) AS __sn__, collect(elementId(__srn__)) AS __srn__, collect(elementId(__sr__)) AS __sr__ RETURN __sn__, __srn__, __sr__
MATCH (baseNode:`BaseNode`) WHERE elementId(baseNode) IN $__ids__ OPTIONAL MATCH (baseNode)-[__sr__:`Parent1_CONTAIN`]->(__srn__:`BaseNode`) WITH collect(elementId(baseNode)) AS __sn__, collect(elementId(__srn__)) AS __srn__, collect(elementId(__sr__)) AS __sr__ RETURN __sn__, __srn__, __sr__
MATCH (rootNodeIds:`Parent1`) WHERE elementId(rootNodeIds) IN $rootNodeIds WITH collect(rootNodeIds) AS n OPTIONAL MATCH ()-[relationshipIds]-() WHERE elementId(relationshipIds) IN $relationshipIds WITH n, collect(DISTINCT relationshipIds) AS __sr__ OPTIONAL MATCH (relatedNodeIds) WHERE elementId(relatedNodeIds) IN $relatedNodeIds WITH n, __sr__ AS __sr__, collect(DISTINCT relatedNodeIds) AS __srn__ UNWIND n AS rootNodeIds WITH rootNodeIds AS parent1, __sr__, __srn__ RETURN parent1 AS __sn__, __sr__, __srn__

The query log shows that SDN also queries for Parent2_CONTAIN, but it shouldn't, because Parent2_CONTAIN is not declared on the Parent1 node. In a real case, if I have 10 node classes extending the base node, it ends up querying the relationships of all 10. I think this is a performance issue.

Changing the generic type to Child still has this issue:

    @Relationship(type = "Parent1_CONTAIN", direction = Relationship.Direction.OUTGOING)
    private List<BaseRelationship<Child>> parent1Relationships;

Also, I think this issue is related to https://github.com/spring-projects/spring-data-neo4j/issues/2933

meistermeier commented 3 months ago

The parent1Repository will query for :Parent1_CONTAIN nodes defined by parent1Relationships in Parent1. Those relationships are of type BaseNode. The next iteration will then query for :Parent2_CONTAIN and :Parent1_CONTAIN defined in Parent2 and Parent1 as the implementations of BaseNode. This behaviour is expected.

Looking at the other issue and the Child typed relationship, the same behaviour applies because the type definition is derived from the class public class BaseDbRelationship<T extends BaseNode> itself which is again "just" a BaseNode.
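
The fan-out described above can be modelled roughly as follows. This is a plain-Java illustration of the reasoning only, not SDN's actual implementation: the class RelationshipFanOutSketch, its two lookup tables, and the method relationshipTypesQueried are all hypothetical names invented for this sketch. It assumes that a relationship declared against BaseNode makes every concrete subclass a possible target, so the relationship fields of each subclass get followed in turn.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only; none of these names are part of Spring Data Neo4j.
public class RelationshipFanOutSketch {

    // Owner label -> (relationship type -> declared target label).
    static final Map<String, Map<String, String>> DECLARED = new LinkedHashMap<>();
    // Declared target label -> concrete labels that can stand in for it.
    static final Map<String, List<String>> IMPLEMENTATIONS = new LinkedHashMap<>();

    static {
        DECLARED.put("Parent1", Map.of("Parent1_CONTAIN", "BaseNode"));
        DECLARED.put("Parent2", Map.of("Parent2_CONTAIN", "BaseNode"));
        DECLARED.put("Child", Map.of());
        IMPLEMENTATIONS.put("BaseNode", List.of("Parent1", "Parent2", "Child"));
    }

    // Returns the distinct relationship types reached when loading rootLabel.
    public static Set<String> relationshipTypesQueried(String rootLabel) {
        Set<String> types = new LinkedHashSet<>();
        collect(rootLabel, types, new LinkedHashSet<>());
        return types;
    }

    private static void collect(String label, Set<String> types, Set<String> visited) {
        if (!visited.add(label)) {
            return; // already expanded, avoid cycling through the hierarchy
        }
        for (Map.Entry<String, String> field : DECLARED.getOrDefault(label, Map.of()).entrySet()) {
            types.add(field.getKey());
            // A target declared as the base type expands to every implementation.
            List<String> targets =
                    IMPLEMENTATIONS.getOrDefault(field.getValue(), List.of(field.getValue()));
            for (String target : targets) {
                collect(target, types, visited);
            }
        }
    }

    public static void main(String[] args) {
        // prints [Parent1_CONTAIN, Parent2_CONTAIN]
        System.out.println(relationshipTypesQueried("Parent1"));
    }
}
```

In this model, loading Parent1 reaches Parent2_CONTAIN only because the declared target is BaseNode. The sketch tracks distinct types, so it does not reproduce the repeated Parent1_CONTAIN query seen in the log.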

abccbaandy commented 3 months ago

I guessed as much, but I think the design is wrong. When I query Parent1, SDN should search only for the Parent1_CONTAIN relationship.

But I think the root cause is that SDN treats BaseNode as a node. In my domain, I don't really use BaseNode; it is just a base class for sharing common fields and for polymorphism. In the real case, I have many child types with the same relationship type:

(:Parent1)-[r:Parent1_CONTAIN]->(:Child1)
(:Parent1)-[r:Parent1_CONTAIN]->(:Child2)

So, I have to use a base class to get all of them.

    @Relationship(type = "Parent1_CONTAIN", direction = Relationship.Direction.OUTGOING)
    private List<BaseRelationship<BaseNode>> parent1Relationships;

@meistermeier Is there any other way to meet this requirement without running into the base class issue?

abccbaandy commented 6 days ago

Any update? @michael-simons