neo4jrb / activegraph

An active model wrapper for the Neo4j Graph Database for Ruby.
http://neo4jrb.io
MIT License
1.4k stars 276 forks source link

Why does method with_associations change order? #1495

Closed kopylovvlad closed 6 years ago

kopylovvlad commented 6 years ago

When I using method .with_associations, it changes order! O_o Why does this happen?

Code example (inline, gist, or repo)

Example 1:

# just take first 20 items and remember ids
arr1 = Post.all.limit(20).map(&:id)
#Post 
#  MATCH (n:`Post`)
#  RETURN n
#  LIMIT {limit_20} | {:limit_20=>20}
# HTTP REQUEST: 109ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
# => [21213, 21214, 21215, 21216, 21217, 25988, 25989, 25990, 25991, 25992, 25993, 25994, 25995, 25996, 25997, 25998, 28088, 28089, 28090, 28091] 

# and takes first 20 items with method .with_associations
arr2 = Post.all.with_associations(:post_type, :post_class, :post_subject).limit(20).map(&:id)
#Post 
#  MATCH (n:`Post`)
#  WITH n
#  LIMIT {limit_20}
#  OPTIONAL MATCH (n)-[`post_type_rel`:`post_type`]->(`post_type`)
#  WHERE (`post_type`:`PostType`)
#  WITH 
#    n, 
#    [collect(`post_type_rel`), collect(`post_type`)] AS `post_type_collection`
#  OPTIONAL MATCH (n)-[`post_class_rel`:`post_class`]->(`post_class`)
#  WHERE (`post_class`:`PostClass`)
#  WITH 
#    n, 
#    [collect(`post_class_rel`), collect(`post_class`)] AS `post_class_collection`, 
#    `post_type_collection` AS `post_type_collection`
#  OPTIONAL MATCH (n)-[`post_subject_rel`:`post_subject`]->(`post_subject`)
#  WHERE (`post_subject`:`PostSubject`)
#  WITH 
#    n, 
#    [collect(`post_subject_rel`), collect(`post_subject`)] AS `post_subject_collection`, 
#    `post_type_collection` AS `post_type_collection`,`post_class_collection` AS #`post_class_collection`
#  RETURN 
#    n, 
#    [`post_type_collection`,`post_class_collection`,`post_subject_collection`] | {:limit_20=>20}
# HTTP REQUEST: 72ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
# => [25990, 25994, 25998, 21214, 25991, 21217, 25996, 25992, 25988, 25995, 28091, 28088, 28089, 21216, 21213, 28090, 21215, 25993, 25989, 25997] 

# arrays are not equal ?
arr1 == arr2 
# false

Example 2:

# I have an array of ids
ids = [114, 182, 154, 74, 10, 304, 321]
# And want to collect items

Comment.find(ids).map(&:id) == ids
# Comment 
#  MATCH (n:`Comment`)
#  WHERE (ID(n) IN {ID_n})
#  RETURN n | {:ID_n=>[114, 182, 154, 74, 10, 304, 321]}
# HTTP REQUEST: 289ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
true 
# order the same

Comment.all.find(ids).map(&:id) == ids
# Comment 
#  MATCH (n:`Comment`)
#  WHERE (ID(n) IN {ID_n})
#  RETURN n | {:ID_n=>[114, 182, 154, 74, 10, 304, 321]}
# HTTP REQUEST: 9ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
true 
# order the same again

Comment.all.with_associations(:author).find(ids).map(&:id) == ids
# Comment 
#  MATCH (n:`Comment`)
#  WHERE (ID(n) IN {ID_n})
#  WITH n
#  OPTIONAL MATCH (n)-[`author_rel`:`author`]->(`author`)
#  WHERE (`author`:`User`)
#  WITH 
#    n, 
#    [collect(`author_rel`), collect(`author`)] AS `author_collection`
#  RETURN 
#    n, 
#    [`author_collection`] | {:ID_n=>[114, 182, 154, 74, 10, 304, 321]}
# HTTP REQUEST: 435ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
false 
# but now, order is different

Example 3

# try to get comments with order
arr1 = Comment.all.order(created_at: :asc).limit(20).map(&:id)
# Comment 
#  MATCH (n:`Comment`)
#  RETURN n
#  ORDER BY n.created_at ASC
#  LIMIT {limit_20} | {:limit_20=>20}
# HTTP REQUEST: 19ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
#[1508, 1378, 1385, 4053, 3042, 2467, 3872, 1418, 2459, 1954, 3412, 2625, 1450, 2774, 843, 3815, 2787, 2795, 2595, 1362] 

arr2 = Comment.all.with_associations(:post, :author).order(created_at: :asc).limit(20).map(&:id)
# Comment 
#  MATCH (n:`Comment`)
#  WITH n
#  LIMIT {limit_20}
#  OPTIONAL MATCH (n)-[`post_rel`:`post`]->(`post`)
#  WHERE (`post`:`Post`)
#  WITH 
#    n, 
#    [collect(`post_rel`), collect(`post`)] AS `post_collection`
#  OPTIONAL MATCH (n)-[`author_rel`:`author`]->(`author`)
#  WHERE (`author`:`User`)
#  WITH 
#    n, 
#    [collect(`author_rel`), collect(`author`)] AS `author_collection`, 
#    `post_collection` AS `post_collection`
#  ORDER BY n.created_at ASC
#  RETURN 
#    n, 
#    [`post_collection`,`author_collection`] | {:limit_20=>20}
# HTTP REQUEST: 16ms POST http://localhost:7575/db/data/transaction/commit (1 bytes)
#[74, 110, 243, 10, 154, 287, 272, 283, 291, 350, 114, 293, 382, 386, 375, 321, 304, 371, 390, 182] 

arr1 == arr2
#false 

Runtime information:

Neo4j database version: neo4j gem version: neo4j (9.2.1) neo4j-core gem version: 8.1.3

cheerfulstoic commented 6 years ago

Oh, gosh, this is a good catch.

When you don't specify order, I think that is up to Neo4j on how things are returned. And if you are making MATCHes (like with_associations does for you), then it could affect the order as it's looking for a subgraph pattern.

But when you do specify the order, I actually see the problem. The limit, for some reason is getting applied at the beginning of the query. That means that those first 20 are getting retrieved without respect for the order and then the ORDER BY is applied later, after it's been through a couple of WITH clauses

I don't have time to look at it right now, but if you'd like to take a stab I think the code is around here:

https://github.com/neo4jrb/neo4j/blob/1852714c573be96a049e62164ffaa4abc5efcae2/lib/neo4j/active_node/query/query_proxy_eager_loading.rb#L147

@klobuczek Might be able to help / give advice on this as he did a lot of refactoring on that file.

kopylovvlad commented 6 years ago

Hello. I did some research with cypher and found out that position of "ORDER BY" is very important for query result.

For example, I wrote 7 different queries and combinated different position for "ORDER BY". Here are queries:

# v1
MATCH (n:Comment)
  RETURN n
  ORDER BY n.created_at ASC

# v2
MATCH (n:Comment)
  WITH n
  OPTIONAL MATCH (n)-[post_rel:post]->(post)
  WHERE (post:Post)
  WITH
    n,
    [collect(post_rel), collect(post)] AS post_collection
  OPTIONAL MATCH (n)-[author_rel:author]->(author)
  WHERE (author:User)
  WITH
    n,
    [collect(author_rel), collect(author)] AS author_collection,
      post_collection AS post_collection
  ORDER BY n.created_at ASC
  RETURN n

# v3
MATCH (n:Comment)
  WITH n
  ORDER BY n.created_at ASC
  OPTIONAL MATCH (n)-[post_rel:post]->(post)
  WHERE (post:Post)
  WITH
    n,
    [collect(post_rel), collect(post)] AS post_collection
  OPTIONAL MATCH (n)-[author_rel:author]->(author)
  WHERE (author:User)
  WITH
    n,
    [collect(author_rel), collect(author)] AS author_collection,
    post_collection AS post_collection
  RETURN
    n

# v4
MATCH (n:Comment)
  WITH n
  ORDER BY n.created_at ASC
  RETURN n

# v5
MATCH (n:Comment)
  WITH n
  ORDER BY n.created_at ASC
  OPTIONAL MATCH (n)-[post_rel:post]->(post)
  WHERE (post:Post)
  WITH
    n,
    [collect(post_rel), collect(post)] AS post_collection
  OPTIONAL MATCH (n)-[author_rel:author]->(author)
  WHERE (author:User)
  WITH
    n,
    [collect(author_rel), collect(author)] AS author_collection,
    post_collection AS post_collection
  RETURN
    n, [post_collection,author_collection]

# v6
MATCH (n:Comment)
  WITH n
  OPTIONAL MATCH (n)-[post_rel:post]->(post)
  WHERE (post:Post)
  WITH
    n,
    [collect(post_rel), collect(post)] AS post_collection
  OPTIONAL MATCH (n)-[author_rel:author]->(author)
  WHERE (author:User)
  WITH
    n,
    [collect(author_rel), collect(author)] AS author_collection,
    post_collection AS post_collection
  ORDER BY n.created_at ASC
  RETURN
    n

# v7
MATCH (n:Comment)
  WITH n
  OPTIONAL MATCH (n)-[post_rel:post]->(post)
  WHERE (post:Post)
  WITH
    n,
    [collect(post_rel), collect(post)] AS post_collection
  OPTIONAL MATCH (n)-[author_rel:author]->(author)
  WHERE (author:User)
  WITH
    n,
    [collect(author_rel), collect(author)] AS author_collection,
    post_collection AS post_collection
  ORDER BY n.created_at ASC
  RETURN
    n, [post_collection,author_collection]

And this is a link to report on google docs: https://docs.google.com/spreadsheets/d/1UUgvmexi6hTxRjkp6gFElggidecZtSGPFb_xrOLQ9Zc/edit?usp=sharing Shortly: queries 3 and 5 have wrong order.

And I did mistake with verions of 'neo4j' gem. I have two projects with neo4j. In first project I use '9.0.7' and have bug with order. In second project, I use '9.2.1' and don't have bug.

klobuczek commented 6 years ago

Hello @kopylovvlad, could you confirm if you still see an ordering problem on the latest version (master)? If so could you please create a PR with a failing spec and I will fix the issue. Thanks for bringing this up.

cheerfulstoic commented 6 years ago

@kopylovvlad That seems potentially right. If the ORDER BY is followed by a (OPTIONAL) MATCH clause, I could see Neo4j not preserving the order.

@klobuczek I would guess that this problem is happening in master. If he's using 9.2.1 noted in the initial comment, it is not different in ways that I would expect to affect this. I think it's mainly Rubocop stuff:

https://github.com/neo4jrb/neo4j/compare/v9.2.1...master

klobuczek commented 6 years ago

@cheerfulstoic I am bit confused, it appears to me that @kopylovvlad is saying in his last comment that 9.2.1 does NOT have the ordering bug. That's why I wanted a confirmation. On my end after my last PR to QueryProxyEagerLoading the order clause is always placed after all clauses generated by with_associations regardless of the position of order in the query chain.

cheerfulstoic commented 6 years ago

Hrmm, I'm not sure ;) . He closed the issue, though, so maybe everything OK

mperice commented 5 years ago

@klobuczek Actually, I just ran into this bug ... it still affects 9.2.4 and also 9.3.0. It's actually very simple to reproduce ... use .order+.limit combined with .with_association. As @cheerfulstoic pointed out ORDER BY is for some reason applied last, even after LIMIT and OPTIONAL MATCHE-es:

But when you do specify the order, I actually see the problem. The limit, for some reason is getting applied at the beginning of the query. That means that those first 20 are getting retrieved without respect for the order and then the ORDER BY is applied later, after it's been through a couple of WITH clauses

A simple example:

irb(main):001:0> >> Events::Event.order(feed_timestamp: :desc).limit(1)
 CYPHER 
  MATCH (result_eventsevent:`Event`)
  RETURN result_eventsevent
  ORDER BY result_eventsevent.feed_timestamp DESC
  LIMIT {limit_1} | {:limit_1=>1}
=> #<QueryProxy  [#<Events::LaunchesEvent uuid: "32427119-9e5c-4fe0-9d31-75f812f0fab2", feed_timestamp: Thu, 31 Dec 2099 00:00:00 +0100]>

irb(main):002:0> >> Events::Event.order(feed_timestamp: :desc).limit(1).with_associations(:primary_company)
 CYPHER 
  MATCH (result_eventsevent:`Event`)
  WITH result_eventsevent
  LIMIT {limit_1}
  OPTIONAL MATCH (result_eventsevent)<-[`primary_company_rel`]-(`primary_company`)
  WHERE (`primary_company`:`Company`)
  WITH 
    result_eventsevent, 
    [collect(`primary_company_rel`), collect(`primary_company`)] AS `primary_company_collection`
  ORDER BY result_eventsevent.feed_timestamp DESC
  RETURN 
    result_eventsevent, 
    [`primary_company_collection`] | {:limit_1=>1}
=> #<QueryProxy  [#<Events::AcquiresEvent uuid: "a0849f9a-9559-44fa-97b4-1f6ad78d1d54", feed_timestamp: Mon, 09 Oct 2017 07:03:38 +0200]>

Consequently, the results of the two queries differ.