Open rovalcor opened 4 years ago
Hmmm. I don't see anything that you're doing as incorrect. I'm quite busy at the moment with a project that's just going into production so apologies for slow replies. Can you recreate the error in a failing test?
@zachaysan It's OK, thank you for looking into this. I'm also busy at the time, but I'll get to replicating the test, hopefully later during the day.
Hey @rovalcor were you ever able to figure this out? I'm preparing to update the codebase to the latest releases and it would be good to handle this at the same time.
Hi there! I'm finding myself in a similar issue to the one reported here - let me know if this is better handled in a separate issue.
Using Ruby 2.7.2, Rails 6.0.3.4, Postgres 13.1, activerecord-hierarchical_query 1.2.3.
My schema is a little different, where I have a Relationships, using polymorphic associations to create a link between "Source -> Destination : with description", where both Source and Destination can be a number of types, including a self-link.
Schema:
create_table "relationships", force: :cascade do |t|
t.string "relater_type", null: false
t.bigint "relater_id", null: false
t.string "relatable_type", null: false
t.bigint "relatable_id", null: false
t.string "description"
t.index ["relatable_type", "relatable_id"], name: "index_relationships_on_relatable_type_and_relatable_id"
t.index ["relater_type", "relater_id"], name: "index_relationships_on_relater_type_and_relater_id"
end
The query:
Relationship.join_recursive do
start_with(relater_id: id)
.connect_by(relater_id: id)
.nocycle
.distinct
With a simple scaffolding of 4 relationships that includes a cycle, I expect to get a response of 4 rows, but currently get 12.
Here's the SQL generated:
SELECT "relationships".*
FROM "relationships"
INNER JOIN
(WITH RECURSIVE "relationships__recursive" AS
(SELECT "relationships"."id",
"relationships"."relater_id", ARRAY["relationships"."id"] AS __path
FROM "relationships"
WHERE "relationships"."relater_id" = 30
UNION ALL SELECT "relationships"."id",
"relationships"."relater_id",
"relationships__recursive"."__path"||"relationships"."id"
FROM "relationships"
INNER JOIN "relationships__recursive" ON "relationships__recursive"."relater_id" = 30
WHERE NOT ("relationships"."id" = ANY("relationships__recursive"."__path")) ) SELECT DISTINCT "relationships__recursive".*
FROM "relationships__recursive") AS "relationships__recursive" ON "relationships"."id" = "relationships__recursive"."id"
I'm able to get the rows that I expect by applying a .uniq
on the result, but that means that the DB is doing more work and returning more data over the wire, making Ruby/Rails do more work too. It also "breaks" the class from a ActiveRecord::HierarchicalQuery::Query
to be an Array
.
Right now with 4/12 it's pretty inconsequential, but I could see this being more problematic as the project scales.
I also tried removing the .distinct
call so see what SQL was generated, in case that makes a big difference:
SELECT "relationships".*
FROM "relationships"
INNER JOIN
(WITH RECURSIVE "relationships__recursive" AS
(SELECT "relationships"."id",
"relationships"."relater_id", ARRAY["relationships"."id"] AS __path
FROM "relationships"
WHERE "relationships"."relater_id" = 30
UNION ALL SELECT "relationships"."id",
"relationships"."relater_id",
"relationships__recursive"."__path"||"relationships"."id"
FROM "relationships"
INNER JOIN "relationships__recursive" ON "relationships__recursive"."relater_id" = 30
WHERE NOT ("relationships"."id" = ANY("relationships__recursive"."__path")) ) SELECT "relationships__recursive".*
FROM "relationships__recursive") AS "relationships__recursive" ON "relationships"."id" = "relationships__recursive"."id"
and all that does is change the order in which the 12 records are returned. I don't really care about the oder yet, which is why I am not using order_siblings
, but found it interesting that the using .distinct
changes the order, but not the count.
If there's anything I can do to help identify/resolve, please let me know!
Hey @miketheman so sorry for the delay, I'm scrambling to get a couple contracts done before the Christmas holidays. Once about Dec 17 hits I'll be able to devote the time to fix this issue.
The very best way you can speed this is up is to open a PR with a failing test that's sufficiently commented. That way I'll easily be able to step through the library code to figure out what's going on. It's been a while since I've had to fix a bug in this codebase so it will take me a bit to re-orientate myself, but I should be able to figure it out and fix it.
I'm trying to traverse a table with circular references. The table looks like this:
There are
umbrella_relationships
record pairs where the foreign keys permutate (i.e. for a record with{ umbrella_id: 1, underlying_id: 2 }
, it'd be usual to see another record with{ umbrella_id: 2, underlying_id: 1 }
)To retrieve
umbrella_relationships
that "stack" above a givenunderlying_id
, I wrote the following:However, for an example that expects to get 13 records back, this returns a 13310-row result set, with many duplications of the expected 13 records. My first guess is that the query would iterate multiple times through all relationships, using the
nocycle
method to detect where to stop, but, unless I'm not understanding something correctly, the call todistinct
should apply to CTEs, so I fail to see why am I receiving so many records. This scales to be a major problem when expecting longer stacks, where I keep running into OOM errors on the RDBMS's end. Could you please advise on how to minimize memory usage on a query like this?I have tested this in Postgres 9.4, 9.6 and 12.2, in case it helps. Thanks!