Recursive eager loading with self association not populating references correctly

volatiletech / sqlboiler

Generate a Go ORM tailored to your database schema.

BSD 3-Clause "New" or "Revised" License

6.66k stars 539 forks source link

Recursive eager loading with self association not populating references correctly #457

Open lucaslsl opened 5 years ago

lucaslsl commented 5 years ago

If you're having a generation problem please answer these questions before submitting your issue. Thanks!

What version of SQLBoiler are you using (`sqlboiler --version`)?

SQLBoiler v3.1.0

If this happened at generation time what was the full SQLBoiler command you used to generate your models? (if not applicable leave blank)

If this happened at runtime what code produced the issue? (if not applicable leave blank)

entities.Delimiters(qm.Load("Delimiters.Delimiters.Delimiters")).All(r.Context(), DB)

What is the output of the command above with the `-d` flag added to it? (Provided you are comfortable sharing this, it contains a blueprint of your schema)

Please provide a relevant database schema so we can replicate your issue (Provided you are comfortable sharing this)

PostgreSQL 10.4

CREATE TABLE delimiters (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v1mc(),
    "name" VARCHAR(255) NOT NULL,    
    created_at TIMESTAMP NOT NULL DEFAULT now(),
    updated_at TIMESTAMP NOT NULL DEFAULT now()
);

CREATE TABLE delimitations (
    delimiter_id UUID NOT NULL REFERENCES delimiters (id),
    delimited_id UUID NOT NULL REFERENCES delimiters (id),
    PRIMARY KEY (delimiter_id, delimited_id)
);

Further information. What did you do, what did you expect?

delimiters
A
B
C
D
D1
D2

delimiter	delimited
A	B
B	C
C	D
C	D1
C	D2

When fetching D, D1, D2 and eager loading 3 levels ("Delimiters.Delimiters.Delimiters") one would expect for the references to be set for all Dx in all levels, instead, only the first row has all levels set:

start	C	B	A
D	[x]	[x]	[x]
D1	[x]	[ ]	[ ]
D2	[x]	[ ]	[ ]

D.R.Delimiters = [ C ]; C.R.Delimiters = [ B ]; B.R.Delimiters = [ A ] D1.R.Delimiters = [ C ]; C.R.Delimiters = [ ]; D2.R.Delimiters = [ C ]; C.R.Delimiters = [ ];

Note: This problem occurs only for this specific association (self association/same table). In cases where I have different associations, e.g. Student.Course.Campus.School, all associations are set for all records in all levels (instead of only the first row).

Is this behaviour normal?

aarondl commented 5 years ago

Quick question. Are there foreign keys to the delimiters table from delimitations ids? Just curious if there's a "true join table" in sqlboiler's eyes or not.

aarondl commented 5 years ago

Having a think on this it may be because there's a naive break somewhere in the association setting loop. Most of those loops look at foreign key of the object they have, look for an object with the same id, found it? Assign and stop looking.

This could be complicated by the non-true join table though. Maybe some bad assumptions somewhere.

lucaslsl commented 5 years ago

hey @aarondl, thanks for taking a look.

My tables are defined exactly as the snippet I posted above, following sqlboiler rule: your join table must have a composite primary key that encompasses both foreign table foreign keys and no other columns in the table. A Delimitation model is not generated by sqlboiler (as it's a join table in the generator's eyes, so it's skipped, leaving me with only delimiters.go).

The below is generated for delimiters

// delimiterR is where relationships are stored.
type delimiterR struct {
    DelimitedDelimiters  DelimiterSlice
    Delimiters           DelimiterSlice
}

I use these "true join tables" all around my application, the exception being that this is the only one that references the same table on both keys.

aarondl commented 5 years ago

I actually totally missed the references part in your table definitions sorry. I didn't see the word 'foreign key' but sort of glossed over everything else.

Some interesting developments here. I removed the break that I thought would cause it and sure enough it started doing what we wanted... except there's a new bug.

4:d (0, 1)
   3:c (1, 1)
     2:b (3, 3)
       1:a (3, 0)
       1:a (3, 0)
       1:a (3, 0)
5:d1 (0, 1)
   3:c (1, 1)
     2:b (3, 3)
       1:a (3, 0)
       1:a (3, 0)
       1:a (3, 0)
6:d2 (0, 1)
   3:c (1, 1)
     2:b (3, 3)
       1:a (3, 0)
       1:a (3, 0)
       1:a (3, 0)

Notice the duplication of the 'a' entries. I think some deduplication optimization code has hosed us in a way here.

It used to be if you did a query like this it would do something similar to this:

select * from table where id in (1, 1, 1, 1)

Instead what we do now is de-dup the 1 so that the query is more efficient and we only bring back as much data as we need. This however produces a single object (B in this case) (great! efficiency) that then gets set on all the C. So &d.c.b = &d1.c.b = &d2.c.b which means when we append the a to each of them, it actually adds to all of them simultaneously for the cute effect above.

This is horrible in a number of ways. Unsure on a fix currently. Need to think more.

ghost commented 5 years ago

These sort of references to references and one times better solution a cqrs layer on top. Otherwise you end up boiling the ocean imho.

aarondl commented 5 years ago

This is still a problem, and I'm still unsure of the best way to fix it.

I think my only proposal would be pretty gross: Use context to be able to turn on a refcounter that will duplicate memory as needed to ensure that these references don't get changed on the same object. That way we keep the nice optimization for simple cases, but have a workaround for this terrible case.

e-nikolov commented 4 years ago

I think I get a similar behaviour in my case.

I have two tables - action and action_version which have a one-to-many relationship. I start from a list of versions and want to fetch all sibling versions for each version in the list via qm.Load(), .e.g. []action_version -> []action -> [][]action_version.

In the result, the versions I start from are listed twice for the version. Do you think this might be related?

Some example code:

v, err := orm.ActionVersions(
    qm.Load(qm.Rels(orm.ActionVersionRels.Action, orm.ActionRels.ActionVersions)),
    orm.ActionVersionWhere.ID.EQ(149),
).All(ctx, tx)
if err != nil {
    return err
}
for _, v := range v {
    for _, av := range v.R.Action.R.ActionVersions {
        fmt.Println(av.Version)
    }
    fmt.Println("----------------------")
}

The result is as follows:

v0.0.1
v0.0.1
v0.0.2
v0.0.3
----------------------

aarondl commented 4 years ago

@e-nikolov It's likely to be yes. I wonder if you could post the same example with --debug on so we can see the queries performed?

vladvelici commented 2 years ago

What you think of this approach? Sorry for the inline patch, I'm having trouble pushing to GitHub (ISP issues).

diff --git a/templates/main/09_relationship_to_many_eager.go.tpl b/templates/main/09_relationship_to_many_eager.go.tpl
index 0fcf036..0d065f0 100644
--- a/templates/main/09_relationship_to_many_eager.go.tpl
+++ b/templates/main/09_relationship_to_many_eager.go.tpl
@@ -23,29 +23,33 @@ func ({{$ltable.DownSingular}}L) Load{{$relAlias.Local}}({{if $.NoContext}}e boi
    }

    args := make([]interface{}, 0, 1)
+   argUpdateIdxIfMatch := make([][]int, 0, 1)
    if singular {
        if object.R == nil {
            object.R = &{{$ltable.DownSingular}}R{}
        }
        args = append(args, object.{{$col}})
+       argUpdateIdxIfMatch = append(argUpdateIdxIfMatch, []int{0})
    } else {
        Outer:
-       for _, obj := range slice {
+       for idx, obj := range slice {
            if obj.R == nil {
                obj.R = &{{$ltable.DownSingular}}R{}
            }

-           for _, a := range args {
+           for argsIdx, a := range args {
                {{if $usesPrimitives -}}
                if a == obj.{{$col}} {
                {{else -}}
                if queries.Equal(a, obj.{{$col}}) {
                {{end -}}
+                   argUpdateIdxIfMatch[argsIdx] = append(argUpdateIdxIfMatch[argsIdx], idx)
                    continue Outer
                }
            }

            args = append(args, obj.{{$col}})
+           argUpdateIdxIfMatch = append(argUpdateIdxIfMatch, []int{idx})
        }
    }

@@ -151,38 +155,44 @@ func ({{$ltable.DownSingular}}L) Load{{$relAlias.Local}}({{if $.NoContext}}e boi
    {{if .ToJoinTable -}}
    for i, foreign := range resultSlice {
        localJoinCol := localJoinCols[i]
-       for _, local := range slice {
+       for argsIdx, localID := range args {
            {{if $usesPrimitives -}}
-           if local.{{$col}} == localJoinCol {
+           if localID == localJoinCol {
            {{else -}}
-           if queries.Equal(local.{{$col}}, localJoinCol) {
+           if queries.Equal(localID, localJoinCol) {
            {{end -}}
-               local.R.{{$relAlias.Local}} = append(local.R.{{$relAlias.Local}}, foreign)
-               {{if not $.NoBackReferencing -}}
-               if foreign.R == nil {
-                   foreign.R = &{{$ftable.DownSingular}}R{}
+               for _, toUpdateIdx := range argUpdateIdxIfMatch[argsIdx] {
+                   local := slice[toUpdateIdx]
+                   local.R.{{$relAlias.Local}} = append(local.R.{{$relAlias.Local}}, foreign)
+                   {{if not $.NoBackReferencing -}}
+                   if foreign.R == nil {
+                       foreign.R = &{{$ftable.DownSingular}}R{}
+                   }
+                   foreign.R.{{$relAlias.Foreign}} = append(foreign.R.{{$relAlias.Foreign}}, local)
+                   {{end -}}
                }
-               foreign.R.{{$relAlias.Foreign}} = append(foreign.R.{{$relAlias.Foreign}}, local)
-               {{end -}}
                break
            }
        }
    }
    {{else -}}
    for _, foreign := range resultSlice {
-       for _, local := range slice {
+       for argsIdx, localID := range args {
            {{if $usesPrimitives -}}
-           if local.{{$col}} == foreign.{{$fcol}} {
+           if localID == foreign.{{$fcol}} {
            {{else -}}
-           if queries.Equal(local.{{$col}}, foreign.{{$fcol}}) {
+           if queries.Equal(localID, foreign.{{$fcol}}) {
            {{end -}}
-               local.R.{{$relAlias.Local}} = append(local.R.{{$relAlias.Local}}, foreign)
-               {{if not $.NoBackReferencing -}}
-               if foreign.R == nil {
-                   foreign.R = &{{$ftable.DownSingular}}R{}
+               for _, toUpdateIdx := range argUpdateIdxIfMatch[argsIdx] {
+                   local := slice[toUpdateIdx]
+                   local.R.{{$relAlias.Local}} = append(local.R.{{$relAlias.Local}}, foreign)
+                   {{if not $.NoBackReferencing -}}
+                   if foreign.R == nil {
+                       foreign.R = &{{$ftable.DownSingular}}R{}
+                   }
+                   foreign.R.{{$relAlias.Foreign}} = local
+                   {{end -}}
                }
-               foreign.R.{{$relAlias.Foreign}} = local
-               {{end -}}
                break
            }
        }

aarondl commented 2 years ago

@vladvelici can you describe the change a bit? What about this approach solves the problem at hand but also keeps the optimization where possible?

vladvelici commented 2 years ago

@aarondl to my understanding this is what the existing version does:

loops through all elements in slice and extracts unique IDs into args.
does a query for those IDs,
for row in each row in the query result:
- for element in each element in slice:
  - if element ID matches row ID, assign relations in R and continue with next row (therefore skipping duplicate IDs)

my proposal:

loops through all elements in slice and saves unique IDs into args and where they were found in argUpdateIdxIfMatch
- this is the same time complexity as original algorithm, but it does use more memory to store the index references
- example, if slice has elements with the following IDs: [1,2,1,2,3], then args would be [1,2,3] and argUpdateIdxIfMatch would be [[0,2], [1,3], [4]].
perform query to fetch IDs in args, like before
for row in each row in the query result:
- for idx, id := range args
  - if id match row ID then
  - update all elements in slice at indices found at argUpdateIdxIfMatch[idx] (e.g. first one is slice[argUpdateIdxIfMatch[idx][0]])

If I'm not wrong, this is about the same time complexity but it uses some extra memory for cross-referencing.

Edit: I'm assuming the optimisation you're talking about is the break on first match when looping through the query results and slice.

vincekieft commented 1 year ago

Is this issue solved? I am facing the same issue where the break causes further elements of the slice to not have populated relations.

volatiletech / sqlboiler

Recursive eager loading with self association not populating references correctly #457

What version of SQLBoiler are you using (sqlboiler --version)?

If this happened at generation time what was the full SQLBoiler command you used to generate your models? (if not applicable leave blank)

If this happened at runtime what code produced the issue? (if not applicable leave blank)

What is the output of the command above with the -d flag added to it? (Provided you are comfortable sharing this, it contains a blueprint of your schema)

Please provide a relevant database schema so we can replicate your issue (Provided you are comfortable sharing this)

Further information. What did you do, what did you expect?

What version of SQLBoiler are you using (`sqlboiler --version`)?

What is the output of the command above with the `-d` flag added to it? (Provided you are comfortable sharing this, it contains a blueprint of your schema)