neo4jrb / activegraph

An active model wrapper for the Neo4j Graph Database for Ruby.
http://neo4jrb.io
MIT License

`skip` parameter shouldn't vary by the number skipped #928

Open cheerfulstoic opened 9 years ago

cheerfulstoic commented 9 years ago

If you have pagination with a skip of 50, the parameter shouldn't be skip_50: a parameter name that varies with the value produces a different query string for every page, so the query doesn't get cached.

subvertallchris commented 9 years ago

Could there be a situation where a valid query uses SKIP twice, maybe with a WITH somewhere in the middle? If so, you'd need to be careful that it doesn't reuse the parameter.

cheerfulstoic commented 9 years ago

Yeah, I expect that there would be ;) So this might need to be an indexed thing (skip_0, skip_1, etc...)
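
For illustration, a minimal sketch of how the parameter naming could be made stable: index by clause occurrence instead of embedding the skipped value, so the generated Cypher string (and the server's query cache entry) is identical across pages. This is a hypothetical helper, not the gem's actual code:

class ParamRegistry
  def initialize
    @counters = Hash.new(0)
  end

  # Returns a stable name like :skip_0, :skip_1, :limit_0, ... based on
  # how many times the clause occurs in the query, never on the value.
  def register(prefix, value, params)
    name = :"#{prefix}_#{@counters[prefix]}"
    @counters[prefix] += 1
    params[name] = value
    name
  end
end

params = {}
registry = ParamRegistry.new
skip_key  = registry.register(:skip, 4_136_680, params)
limit_key = registry.register(:limit, 20, params)
cypher = "MATCH (n:`Author`) RETURN n SKIP {#{skip_key}} LIMIT {#{limit_key}}"
# => "MATCH (n:`Author`) RETURN n SKIP {skip_0} LIMIT {limit_0}"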

levidamian commented 9 years ago

I switched pagination from Kaminari to will_paginate. No significant difference in response time. The first fetch of 20 authors takes about 17 seconds. I used the database for a while to warm up the caches. Total number of authors: 4.1 million. Both properties uuid and author_name are uniquely indexed.

Started GET "/authors" for 127.0.0.1 at 2015-08-20 10:30:28 -0400
Processing by AuthorsController#index as HTML
 CYPHER 6780ms MATCH (result_author:`Author`) RETURN count(result_author) AS result_author 
 CYPHER 9946ms MATCH (result_author:`Author`) RETURN result_author ORDER BY result_author.author_name SKIP {skip_0} LIMIT {limit_20} | {:skip_0=>0, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (5.3ms)
Completed 200 OK in 17100ms (Views: 370.4ms)
cheerfulstoic commented 9 years ago

Yeah, I've had problems with pagination myself in my database, which has 8 million nodes for one label. It comes down to Neo4j, I think; I don't know if there's a way to solve this from the gem. @jexp seemed to have some thoughts on ways it might be improved. There are also hopefully some changes coming down the pipeline (I don't know when) to improve this.

levidamian commented 9 years ago

The first Cypher query, the count, takes 6.6 seconds to execute in the database. Maybe an improvement here is to do the count using the unique index created on uuid. I am not an expert here; I am thinking based on my previous experience with relational databases, where a statement like select count(1) from ...; goes very fast if the counted table is uniquely indexed. The second query takes 10 seconds to complete. I don't think the problem is at the pagination level; I think it is how the database engine executes the query. That query can arrive from pagination, from another gem, or can simply be written by a developer. Having the pagination gem generate a different query that the database engine can execute faster may be another option here. I am not sure either what the best solution would be.

jexp commented 9 years ago

It is planned to have index supported ordering at some point, but it's tricky if you include arbitrary paths and expressions

levidamian commented 9 years ago

For comparison, here is the time required by a Postgres database to perform those two queries against the same number of authors. The queries are picked from the log and generated by the will_paginate gem. The Postgres database has the same data as the Neo4j database because it was used as a staging database to cleanse and normalize the data before loading it into the graph database. I set the offset to 4 million in order to "skip more". The Postgres database is hosted on the same AWS server as the Neo4j database, but it is usually stopped when the Neo4j database is running.

woka=# select count(1) from unique_authors;
  count
---------
 4136810
(1 row)

Time: 401.399 ms

woka=# select * from unique_authors LIMIT 20 OFFSET 4000000;
   id   |                        name
--------+----------------------------------------------------
 678533 | Champcommunal, Joseph
 678534 | Champe
 678535 | Champeau, Albert
 678536 | Champeau, Donna
 678537 | Champeau, Edmond
 678538 | Champeau, Jean-Louis-Dominique
 678539 | Champeau-L-D
 678540 | Champeau, Le R P
 678541 | Champeau, Louis
 678542 | Champeau, Louis Dominique
 678543 | Champeau, Louis-Dominique
 678544 | Champeau, Padre
 678545 | Champeau, R P
 678546 | Champeau, Serge
 678547 | Champeaux, A De
 678548 | Champeaux, Alfred De
 678549 | Champeaux, Claude
 678550 | Champeaux, Dennis De | Faure, Penelope | Lea, Doug
 678551 | Champeaux-E
 678552 | Champeaux, Ernest
(20 rows)

Time: 218.084 ms

levidamian commented 9 years ago

Same response time (0.2 seconds or so) from the Postgres database if I set the offset to 0, 10, 1,000, or 10,000.

jexp commented 9 years ago

It is not so much the skipping that's expensive as the ordering.

I just tried it with 1M entries: with a skip of 0 it takes 7ms; with a skip of 950,000 it takes 260ms.

That's not super fast, but not outlandish either; it's also an operation Neo4j is not (yet) optimized for.

neo4j-sh (?)$ match (p:Person) return p skip 950000 limit 25;
+--------------------------------------------+
| p                                          |
+--------------------------------------------+
| Node[950305]{id:950001,name:"name 950001"} |
| Node[950306]{id:950002,name:"name 950002"} |
| Node[950307]{id:950003,name:"name 950003"} |
| Node[950308]{id:950004,name:"name 950004"} |
| Node[950309]{id:950005,name:"name 950005"} |
| Node[950310]{id:950006,name:"name 950006"} |
| Node[950311]{id:950007,name:"name 950007"} |
| Node[950312]{id:950008,name:"name 950008"} |
| Node[950313]{id:950009,name:"name 950009"} |
| Node[950314]{id:950010,name:"name 950010"} |
| Node[950315]{id:950011,name:"name 950011"} |
| Node[950316]{id:950012,name:"name 950012"} |
| Node[950317]{id:950013,name:"name 950013"} |
| Node[950318]{id:950014,name:"name 950014"} |
| Node[950319]{id:950015,name:"name 950015"} |
| Node[950320]{id:950016,name:"name 950016"} |
| Node[950321]{id:950017,name:"name 950017"} |
| Node[950322]{id:950018,name:"name 950018"} |
| Node[950323]{id:950019,name:"name 950019"} |
| Node[950324]{id:950020,name:"name 950020"} |
| Node[950325]{id:950021,name:"name 950021"} |
| Node[950326]{id:950022,name:"name 950022"} |
| Node[950327]{id:950023,name:"name 950023"} |
| Node[950328]{id:950024,name:"name 950024"} |
| Node[950329]{id:950025,name:"name 950025"} |
+--------------------------------------------+
25 rows
265 ms
neo4j-sh (?)$ match (p:Person) return p skip 0 limit 25;     
+---------------------------------+
| p                               |
+---------------------------------+
| Node[305]{id:1,name:"name 1"}   |
| Node[306]{id:2,name:"name 2"}   |
| Node[307]{id:3,name:"name 3"}   |
| Node[308]{id:4,name:"name 4"}   |
| Node[309]{id:5,name:"name 5"}   |
| Node[310]{id:6,name:"name 6"}   |
| Node[311]{id:7,name:"name 7"}   |
| Node[312]{id:8,name:"name 8"}   |
| Node[313]{id:9,name:"name 9"}   |
| Node[314]{id:10,name:"name 10"} |
| Node[315]{id:11,name:"name 11"} |
| Node[316]{id:12,name:"name 12"} |
| Node[317]{id:13,name:"name 13"} |
| Node[318]{id:14,name:"name 14"} |
| Node[319]{id:15,name:"name 15"} |
| Node[320]{id:16,name:"name 16"} |
| Node[321]{id:17,name:"name 17"} |
| Node[322]{id:18,name:"name 18"} |
| Node[323]{id:19,name:"name 19"} |
| Node[324]{id:20,name:"name 20"} |
| Node[325]{id:21,name:"name 21"} |
| Node[326]{id:22,name:"name 22"} |
| Node[327]{id:23,name:"name 23"} |
| Node[328]{id:24,name:"name 24"} |
| Node[329]{id:25,name:"name 25"} |
+---------------------------------+
25 rows
6 ms
cheerfulstoic commented 9 years ago

I just thought of something that a colleague said to me a while back:

Pagination isn't the best user experience. The chance that your users are going to want the first 20 items sorted by ID (or whatever) is small. What you want is search/filter functionality so that users can go right to where they want to go.

It should be pretty fast to filter without an ORDER and see how many results you're going to have. If that count is below a reasonable number (determined by benchmarking), you can show a pagination interface, because Neo4j is then ordering a smaller set of results.

As for what to show when the page first loads, you have various options. You could show a blank page with your search/filter UI. You could show a cached set of results. If pagination is important to you, you could cache the first few pages of each set of results you might have.

A lot depends on what you're comfortable with, but hopefully this is good general advice. And hopefully we'll see some performance improvements from Neo4j around simple ordering soon.
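
A rough sketch of that flow, assuming ActiveNode's chainable where/order/skip/limit API; the regex filter and the threshold value are placeholders you'd want to benchmark:

# Only offer ordered pagination when the filtered set is small enough
# for Neo4j to sort cheaply. PAGINATION_THRESHOLD is an assumed value.
PAGINATION_THRESHOLD = 10_000

def paged_authors(filter, page, per_page = 20)
  scope = Author.where(author_name: /(?i)#{Regexp.escape(filter)}.*/)
  total = scope.count
  # Too many hits: ask the user to narrow the search instead of sorting
  # millions of rows just to render page links.
  return { total: total, authors: [] } if total > PAGINATION_THRESHOLD

  authors = scope.order(:author_name)
                 .skip((page - 1) * per_page)
                 .limit(per_page)
                 .to_a
  { total: total, authors: authors }
end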

cheerfulstoic commented 9 years ago

Oh! And if you want to implement search, one good way to do it is to use Elasticsearch with the searchkick gem. It indexes into a second database that can be searched very fast (and supports full-text / fuzzy search). If you decide to go that way, be aware that to index your whole model it uses find_in_batches, which ActiveNode supports, but which uses ORDER BY/LIMIT/SKIP, so it might not be the fastest thing. I'd suggest a batch size of at least 1,000 (I think that's the default), which for 4.1 million nodes means 4,100 batches. If the performance that you're seeing is an indication, 4,100 batches at 17s each would take 19.3 hours. Not great, but not horrible either.

When I indexed my 8 million nodes I "batched" them by finding another node that a smaller group of nodes could be linked to uniquely (in my case it was residents in geographical sections, and each batch of residents would come from one geographical section). Since searchkick doesn't let you index a whole model without using find_in_batches straight up, I put together a hacky solution that lets me do it with the custom batch approach. Let me know if you're interested and I can share it.
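
A sketch of that batching idea; Section, Resident, the residents association, and index_batch are all invented for illustration, standing in for whatever grouping node and bulk-indexing call your app has:

# Batch by traversal from a linking node instead of ORDER BY/LIMIT/SKIP,
# so no batch pays the cost of a multi-million-row SKIP.
Section.all.each do |section|
  batch = section.residents.to_a  # small, traversal-backed batch
  index_batch(batch)              # e.g. a bulk import into Elasticsearch
end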

levidamian commented 9 years ago

OK, I removed the ORDER clause and now paginating through 4.1 million authors takes about half the time: somewhere between 6 and 9 seconds instead of 17 to 19 seconds. Below are the details:

Started GET "/authors" for 127.0.0.1 at 2015-08-21 10:43:18 -0400
Processing by AuthorsController#index as HTML
 CYPHER 5802ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 429ms MATCH (n:Author) RETURN n SKIP {skip_0} LIMIT {limit_20} | {:skip_0=>0, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (6.7ms)
Completed 200 OK in 8573ms (Views: 414.8ms)

Started GET "/authors?page=2&per_page=20" for 127.0.0.1 at 2015-08-21 10:43:56 -0400
Processing by AuthorsController#index as HTML
  Parameters: {"page"=>"2", "per_page"=>"20"}
 CYPHER 6080ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 290ms MATCH (n:Author) RETURN n SKIP {skip_20} LIMIT {limit_20} | {:skip_20=>20, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (5.7ms)
Completed 200 OK in 6757ms (Views: 382.9ms)

Started GET "/authors?page=9&per_page=20" for 127.0.0.1 at 2015-08-21 10:44:07 -0400
Processing by AuthorsController#index as HTML
  Parameters: {"page"=>"9", "per_page"=>"20"}
 CYPHER 6191ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 296ms MATCH (n:Author) RETURN n SKIP {skip_160} LIMIT {limit_20} | {:skip_160=>160, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (6.3ms)
Completed 200 OK in 6886ms (Views: 395.1ms)

Started GET "/authors?page=206835&per_page=20" for 127.0.0.1 at 2015-08-21 10:45:02 -0400
Processing by AuthorsController#index as HTML
  Parameters: {"page"=>"206835", "per_page"=>"20"}
 CYPHER 6381ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 2485ms MATCH (n:Author) RETURN n SKIP {skip_4136680} LIMIT {limit_20} | {:skip_4136680=>4136680, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (5.6ms)
Completed 200 OK in 9254ms (Views: 384.8ms)

Started GET "/authors?page=206836&per_page=20" for 127.0.0.1 at 2015-08-21 10:45:14 -0400
Processing by AuthorsController#index as HTML
  Parameters: {"page"=>"206836", "per_page"=>"20"}
 CYPHER 6092ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 2354ms MATCH (n:Author) RETURN n SKIP {skip_4136700} LIMIT {limit_20} | {:skip_4136700=>4136700, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (4.9ms)
Completed 200 OK in 8844ms (Views: 395.6ms)

Started GET "/authors?page=206830&per_page=20" for 127.0.0.1 at 2015-08-21 10:45:26 -0400
Processing by AuthorsController#index as HTML
  Parameters: {"page"=>"206830", "per_page"=>"20"}
 CYPHER 6004ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 2326ms MATCH (n:Author) RETURN n SKIP {skip_4136580} LIMIT {limit_20} | {:skip_4136580=>4136580, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (6.5ms)
Completed 200 OK in 8745ms (Views: 411.8ms)

Started GET "/authors?page=206826&per_page=20" for 127.0.0.1 at 2015-08-21 10:45:35 -0400
Processing by AuthorsController#index as HTML
  Parameters: {"page"=>"206826", "per_page"=>"20"}
 CYPHER 6534ms MATCH (n:Author) RETURN count(n) AS n
 CYPHER 2391ms MATCH (n:Author) RETURN n SKIP {skip_4136500} LIMIT {limit_20} | {:skip_4136500=>4136500, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (6.1ms)
Completed 200 OK in 9340ms (Views: 411.0ms)

subvertallchris commented 9 years ago

The most expensive part of that process is still the count. It seems silly to do that again and again, since the number of records probably isn't changing so quickly that it needs to be rerun on each request. If you could maintain a cache of that total, we could modify the gem to accept that number as an argument instead of constantly recounting. I wouldn't be surprised if someone has already written an unmanaged extension to do this, and if they haven't, I bet it wouldn't be too hard to write.
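
A minimal sketch of that caching idea using Rails.cache; the key name and expiry are assumptions, and you'd invalidate the entry whenever a CSV batch load runs:

# Reuse a cached total instead of recounting 4.1M nodes on every request.
def cached_author_count
  Rails.cache.fetch('authors/total_count', expires_in: 12.hours) do
    Author.count
  end
end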

levidamian commented 9 years ago

True, other technologies have already done so. For example, when using will_paginate with Postgres, the count is not recalculated each time the pages change. My large counts, like those 4.1 million authors and the 19 million other nodes, are not changing often. In other cases, when a couple of reviews are written for a particular book, the count changes, but I noticed that anything below a couple thousand is counted very fast anyway. I can count all the large sets when the application starts and provide them as params when required. But I think we also need to preserve the original method in the gem for small counts.

levidamian commented 9 years ago

I can also remove .order when no search param is used, because my large data volumes were loaded into the database from CSV files that Postgres sorted alphabetically at extraction time. If the gem is modified as described above I will use it, rather than using Elasticsearch.

levidamian commented 9 years ago

Also, we should not forget the main issue here: skipping to 10, or 10,000, or 100,000, or 1,000,000 needs to happen in relatively constant time.

jexp commented 9 years ago

I think anything after page 3, aka skip 75, doesn't make sense anyhow from a UX perspective.

subvertallchris commented 9 years ago

The performance when skipping large numbers of records isn't something the gem will be able to help with.

I'll need to think a bit about how to modify the Paginated class to build results with an external total. The existing create_from class method is doing too much as is; I can't just add another parameter, so it will need a different approach. Where that total comes from won't be a part of this right now, so you might want to think about how you can maintain it within your app for now, @levidamian.

levidamian commented 9 years ago

I can pass the count any time I call the method. My data is completely static in this area: books and authors, 4.1 million nodes and 19 million nodes. My only methods in the controller are :index and :show. Any additions or changes to the data are done from small CSV files in batches. However, in other areas the data will be changed by users, i.e. reviews or comments about the books. But there I am not expecting more than a couple dozen new nodes per book, as comments and reviews. And I don't see (over time) more than 2 million of the 19 million books being reviewed and commented on.

For sure the user will be looking for a particular author and/or book, so I will also need the index with the :search parameter to work properly. What concerns me is when the user enters part of a name and I need to use .text. Not sure how this will work, since I am aware that for the moment regex supports only searches like this: text.*

jexp commented 9 years ago

Regexp supports all patterns. What you probably want is LIKE search in 2.3 with an index, or FTS with Lucene indexes.
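
To make the distinction concrete, a sketch of the two query shapes as raw Cypher in Ruby strings (the Camus example is borrowed from the log further down); a regex contains-search cannot use the schema index, while a prefix search can be index-backed from Neo4j 2.3 on (LIKE in the 2.3 milestones, STARTS WITH by GA):

# Contains-search via regex: scans every Author node.
CONTAINS_QUERY =
  "MATCH (a:Author) WHERE a.author_name =~ '(?i).*camus.*' RETURN a LIMIT 20"

# Prefix search: can use the unique index on author_name in Neo4j 2.3+.
PREFIX_QUERY =
  "MATCH (a:Author) WHERE a.author_name STARTS WITH 'Camus' RETURN a LIMIT 20"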

levidamian commented 9 years ago

I may need a little bit of help with these. Not sure if I am able to use them properly.

levidamian commented 9 years ago

However, storing the count and not recalculating it every time will be only half of the solution. The problem of skip taking longer the closer the fetched pages are to the end also needs to be solved. While the timing is OK for the first pages, for the last pages it increases by an order of magnitude.

CYPHER 366ms MATCH (n:Author) RETURN n SKIP {skip_0} LIMIT {limit_20} | {:skip_0=>0, :limit_20=>20}
CYPHER 361ms MATCH (n:Author) RETURN n SKIP {skip_20} LIMIT {limit_20} | {:skip_20=>20, :limit_20=>20}
CYPHER 548ms MATCH (n:Author) RETURN n SKIP {skip_40} LIMIT {limit_20} | {:skip_40=>40, :limit_20=>20}
...
CYPHER 2458ms MATCH (n:Author) RETURN n SKIP {skip_4136700} LIMIT {limit_20} | {:skip_4136700=>4136700, :limit_20=>20}
CYPHER 2384ms MATCH (n:Author) RETURN n SKIP {skip_4136680} LIMIT {limit_20} | {:skip_4136680=>4136680, :limit_20=>20}
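
One workaround for that growth, not something the gem generates, is keyset pagination: remember the last sorted value on the current page and seek past it with an indexed range predicate instead of an ever-larger SKIP. A sketch assuming the unique index on author_name; note jexp's caveat above that without index-backed ordering the ORDER BY still sorts whatever survives the WHERE:

# Keyset (cursor) pagination sketch, raw Cypher in a Ruby heredoc.
KEYSET_PAGE = <<-CYPHER
  MATCH (n:Author)
  WHERE n.author_name > {last_seen_name}  // last value on the previous page
  RETURN n
  ORDER BY n.author_name
  LIMIT 20
CYPHER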

cheerfulstoic commented 9 years ago

Do you expect users to click through 206,835 pages? And even if your pagination gives them the ability to click on those last couple of pages, what value do they get from viewing them? With millions of records people don't just browse pages, they do searches.

levidamian commented 9 years ago

I expect better or similar response times when implementing basic REST methods with Neo4j compared to a Postgres database loaded with exactly the same data volumes and using the same gems: will_paginate or Kaminari.

cheerfulstoic commented 9 years ago

You should change your expectations then. They are different databases with different strengths and weaknesses. It seems like in the near-ish future Neo4j might have some of its weaknesses shored up, but for now you have what you have.

levidamian commented 9 years ago

Too bad I learned this after 8 months spent with this technology :)

cheerfulstoic commented 9 years ago

Sorry, that's how it goes sometimes.

There are a number of workarounds that we've discussed, but I want to clarify that even if I were using Postgres (and don't get me wrong, I love Postgres for what it is), I wouldn't recommend pagination as a UI, especially for millions of records. It's not a good user experience.

subvertallchris commented 9 years ago

I'm closing this issue. You seem to know how to fix this -- it is a "basic REST implementation," after all -- so please open an issue at https://github.com/neo4j/neo4j and instruct them. I'm sure they'll be eager to apply your fix.

cheerfulstoic commented 9 years ago

Actually, the crazy thing is that this whole conversation was tangential to the original issue that I opened. We still need to fix the {skip} parameter issue ;)

subvertallchris commented 9 years ago

Ha! Sorry, I'm sure you can see why I lost track of that... ;-)

levidamian commented 9 years ago

Let me be blunt guys.

If your product can't do decent REST for large data sets used in BI solutions, it will matter very little what else your product does better than other products. Did you forget what you claimed in your white papers, that increasing the volume of the database increases the processing time by only a logarithmic delta? What will a company like Facebook, or others with tens of millions of nodes and relationships, do if they attempt to use your database? Change their modus operandi and tell their users what to do and what not to do?

Here is my advice to you: fix the database and the related gems, and stop patronizing their users.

Greta, you are cc'ed on this email because I think "Houston, we have a problem". You may also want to let other senior people in your organization know about this problem.

http://stackoverflow.com/questions/32098900/neo4j-rb-very-slow-pagination-with-kaminari/32208834#32208834
https://github.com/neo4jrb/neo4j/issues/928

Best, Levi Damian.

subvertallchris commented 9 years ago

You seem confused. We are not the AUTHORS of Neo4j. That would be https://github.com/neo4j/neo4j. We are the maintainers of the RUBY GEM that provides ActiveRecord-like integration between Ruby classes and the database. As a user of Neo4j, I'd love for big sort and skip operations to be faster. Michael already indicated that this is on the roadmap. I'm extremely unclear about what you expect this gem to do to help you.

subvertallchris commented 9 years ago

You can open an issue with Neo4j at https://github.com/neo4j/neo4j/issues/.

subvertallchris commented 9 years ago

And for that matter, I am unaware of Neo4j claiming any performance benefits when returning records that are NOT part of a relationship match. Performance benefits come from the linked nature of connected data. "Get 4 million records with label N and skip to page Y" is not a graph traversal the way MATCH (n:Label1)-[r:TYPE1]->(n2)-[r2:TYPE2]->(n3)<-[r3:TYPE3]-(n4) RETURN n4 is. Neo4j is optimized for graph traversals. It's not worth arguing at this point, though, so let's please move on.

levidamian commented 9 years ago

Here are the solutions I suggested earlier, no matter who is the author or maintainer of what:

  1. Change the queries generated by the gems so they are executed faster and more efficiently.
  2. Change the database engine to execute such queries faster.
  3. Change both if necessary. If this requires opening an issue with Neo4j, please be my guest and open one; the queries are not written by me, they are generated by your gem, and it is not my responsibility to optimize or tune them.

For the same volume of data I expect a basic Neo4j query to return sooner than, or at the same time as, a Postgres db using half the memory of the Neo4j db. Which queries are generated by default by REST operations and how they are executed is not my concern, as long as they perform acceptably.

levidamian commented 9 years ago

Fair enough: my company and I will move on to use something else instead of Neo4j.

subvertallchris commented 9 years ago

  1. The only query that can be changed to operate more efficiently is the count, which we discussed changing and which I can patch if you can provide the total.
  2. That's not something we have the power to do.
  3. We cannot be responsible for opening issues for you. We could have directed you there in the first place instead of offering suggestions to work around the issue. I'm sorry that you find that offensive, but we volunteer a lot of our time to help users and have never had someone find feedback so objectionable.
cheerfulstoic commented 9 years ago

To make this absolutely clear because I'm not sure if you saw it:

We are not Neo Technology (the creators of Neo4j). We are volunteers maintaining the neo4j.rb Ruby gems because we happen to really like Neo4j. We spend our free time working on this gem. We enjoy responding to issues to the best of our ability and fixing them as quickly as our schedules allow. Anything that Chris or I have said is offered as personal advice and should not be taken as coming from Neo Technology as a company.

We've tried to offer suggestions and advice on what we would do in your shoes. If those aren't sufficient I'm sorry that we couldn't be of more help. If you contact Neo Technology through their official channels they may have other suggestions.

As far as we know, as outsiders to the product development of Neo4j, this is an issue that the company is aware of and interested in fixing. It sounds like various remedies are on the roadmap for Neo4j.

jexp commented 9 years ago

I just want to say thank you to @subvertallchris and @cheerfulstoic for spending your time maintaining this great gem. Without you, many Ruby developers would have a much harder time using Neo4j.

@levidamian If you need exactly this use-case then you should use a database that is built for this use-case. Neo4j has many applications with connected data where it excels because it was built for that. At some point we will cover more use-cases, but right now we focus our capacity on the things that benefit the majority of our users and customers. Sorry if that's not enough for you.

levidamian commented 9 years ago

My problem is that I need exactly this use-case plus the Neo4j features. Thank you for solving my dilemma.

subvertallchris commented 9 years ago

If your index isn't changing, can you cache the views? You could write a task to rebuild the cache as needed so users aren't inconvenienced.
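
A sketch of that with low-level Rails caching in the controller; the key scheme and expiry are assumptions, and the rebuild task would simply rewrite or delete these entries:

# Cache the rendered index page per page number, since the data only
# changes when a CSV batch loads.
def index
  page = params.fetch(:page, 1).to_i
  html = Rails.cache.fetch("authors/index/page-#{page}", expires_in: 1.day) do
    @authors = Author.order(:author_name).skip((page - 1) * 20).limit(20).to_a
    render_to_string :index
  end
  render html: html.html_safe
end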

levidamian commented 9 years ago

Here is an attempt to use a parameter to lower the response time. No improvement: the total response time is still over 25 seconds. 12.3 seconds to get the count (without the param this was around 6 seconds) and the rest of the time, another 12.5 seconds, to fetch the 20 nodes.

Started GET "/authors" for 127.0.0.1 at 2015-08-26 13:03:37 -0400
Processing by AuthorsController#index as HTML
 CYPHER 12396ms MATCH (result_author:Author) WHERE (result_author.author_name =~ {result_author_author_name}) RETURN count(result_author) AS result_author | {:result_author_author_name=>"(?i).*camus.*"}
 CYPHER 12520ms MATCH (result_author:Author) WHERE (result_author.author_name =~ {result_author_author_name}) RETURN result_author ORDER BY result_author.author_name SKIP {skip_0} LIMIT {limit_20} | {:result_author_author_name=>"(?i).*camus.*", :skip_0=>0, :limit_20=>20}
  Rendered home/_main_links.html.erb (0.3ms)
  Rendered authors/index.html.erb within layouts/application (5.3ms)
Completed 200 OK in 25312ms (Views: 392.1ms)

cheerfulstoic commented 9 years ago

I don't think anybody was suggesting that using a parameter would fix the issue... This issue is about fixing the way the {skip_#} parameter is formed so that queries perform a little better. It wasn't intended to fix your issue; it was just a note I made about a small but important performance improvement, and the other discussion just ended up happening here...

levidamian commented 9 years ago

Correct, this attempt wasn't about fixing the "skip" issue. It was a suggestion for how to try to work around those long response times, like the one about removing the order clause and the one suggesting caching the count. I am just trying them one by one.

jexp commented 9 years ago

Best to ask these questions on SO, not here :)