rethinkdb / rethinkdb

The open-source database for the realtime web.
https://rethinkdb.com

Count Queries Painfully Slow #1271

Closed. dbettin closed this issue 10 years ago.

dbettin commented 11 years ago

I talked to @jdoliner about this issue on IRC yesterday. But, I thought it would be worth my time to create an issue regarding the problem.

I have a large table of ~30,000 documents, each ~150 KB, and when I run a straightforward count query

r.table('mytable').count().run(conn)

it either takes a long time to complete or errors out.

Evidence:

http://snapnote.io/0o4zZT http://snapnote.io/dw3pRh

In my mind this is a major issue and should be addressed before RethinkDB is production-ready. Thoughts?

dbettin commented 11 years ago

It is no longer "painfully slow"; it just doesn't work anymore. It returns the error mentioned previously after every query execution. Note: my document count is now around 60,000 for this table.

Let me know if you need further info.

srh commented 11 years ago

It seems like you might be seeing the same thing as in #1064 (which is badly titled).

srh commented 11 years ago

I'm currently looking into #1064. I will include your kind of workload in the testing. It's plausible that .count() could take a long time in general, because it's a naive O(n) function, but it should not fail on such a small table.

dbettin commented 11 years ago

Ok, thanks. This is running on a Digital Ocean 32 GB / 320 GB SSD VM.

Also, it is my understanding that the naive count query will be addressed in the near term too -- is that a correct understanding?

srh commented 11 years ago

I'm not sure. An efficient count involving a range query that uses the index or primary key, or a count of the entire table, is feasible. It's been something we've meant to do for a while (at least it would improve automatic resharding). This is tracked in issue #152.
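
For concreteness, the kinds of queries that could plausibly be made efficient are along these lines (a sketch only; the table name, index name, and bounds here are hypothetical):

import rethinkdb as r

conn = r.connect('localhost', 28015)

# Count of the entire table.
r.table('mytable').count().run(conn)

# Count over a primary-key range.
r.table('mytable').between(1000, 2000).count().run(conn)

# Count over a secondary-index range (assumes an index named 'score' exists).
r.table('mytable').between(10, 20, index='score').count().run(conn)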

coffeemug commented 11 years ago

I'd like to get better performance for count in, but I'm not sure of the scope yet (i.e. we can obviously do it on the full table, but what can we do for more sophisticated queries). We'll see what we can do here as part of 1.9/2.0 planning.

dbettin commented 11 years ago

Thanks @coffeemug. This is a pressing issue for us. It would be great to see this resolved in 1.9.

Also, note that our counts are not that sophisticated; the most complex count query we do is against different values in top level document attributes.

jdoliner commented 11 years ago

I'm a bit confused about what you mean by "the most complex count query we do is against different values in top level document attributes" -- would you mind clarifying?

dbettin commented 11 years ago

Sure. We have a requirement to understand the number of documents which match certain states. The table contains a "status" field which the count query would use.

Ex:

r.table('table').count(r.row['status'].contains('deployed')).run(conn)

jdoliner commented 11 years ago

So in order to get that type of query working efficiently we would need to implement #1118. I'm actually starting to think we should just do that for 1.9.

coffeemug commented 11 years ago

I feel like #1118 might be a bit of overkill here for a few reasons:

An alternative solution might be to implement constant-time count for two cases:

This seems relatively easy to do, and the way to speed up the type of queries mentioned by @dbettin (and many other queries) is to just create a secondary index, and do count on that.

@jdoliner -- would this be hard? Is this schedulable for 1.9?

Also, I'm going to go to CVS and buy a surgical mask so I don't get people sick, so we may be able to hash this out in person.

coffeemug commented 11 years ago

On second thought what I said may not actually work (since counting the total number of docs in an index doesn't help with @dbettin's query).

coffeemug commented 11 years ago

And by that I mean it could work, but would require some gymnastics during index creation on the part of the user.

jdoliner commented 11 years ago

Having r.table('foo').between(null, null, index='bar').count() be efficient is probably several orders of magnitude harder than implementing incremental map reduce. I say probably because we've spent a long time trying to figure out a way to do it and don't even have a really viable idea. Furthermore, everything semi-viable looks like it would require massive additions to the btree structure. LMR (live map reduce) is a bit of work with all of the UI stuff, but I think it's very doable for 1.9. I can say with decent certainty that it's the only viable way to support queries like this efficiently before 2.0. I think the UI also winds up being a lot nicer, because you'd write

>>> r.table('table').count(r.row['status'].contains('deployed')).run(conn, cache=True)
{"created" : 1}
>>> r.table('table').count(r.row['status'].contains('deployed')).run(conn)
# runs in constant time
5

Rather than:

>>> r.table('table').index_create("contains_deployed", r.row['status'].contains('deployed')).run(conn)
{"created" : 1}
>>> r.table('table').between(True, True, index="contains_deployed").count().run(conn)

Which seems a bit less intuitive to me.

Also, having to create an index to do an efficient count is very memory- and disk-inefficient. Indexes require an entire btree for the data, whose size is proportional to the number of rows in the table, whereas an LMR only requires space proportional to the size of the output, which in this case is a single integer.


srh commented 11 years ago

Having r.table('foo').between(null, null, index='bar').count() be efficient is probably several orders of magnitude harder than implementing incremental map reduce.

We can implement it by storing subtree rowcount estimates with upper and lower bounds on a per-node basis in memory, storing exact subtree rowcount values in the LBA (alongside the repli_timestamp_t value), and having a count query block until the count estimates converge.
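
To illustrate just the cached-count part of that idea (a toy Python order-statistics tree, not RethinkDB's actual btree code, and without the in-memory estimate/convergence machinery): if every node stores the size of its subtree, a range count only has to walk two root-to-boundary paths instead of visiting every key.

# Toy sketch: each node caches the number of keys in its subtree, so a range
# count can skip whole subtrees instead of touching every document.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1                       # keys in this subtree (the cached count)

def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size += 1                          # maintain the cached count on the way up
    return node

def count_less_than(node, key):
    # Number of keys strictly less than `key`, in O(tree height).
    if node is None:
        return 0
    if key <= node.key:
        return count_less_than(node.left, key)
    left_size = node.left.size if node.left else 0
    return left_size + 1 + count_less_than(node.right, key)

def count_in_range(root, lo, hi):
    # Count keys in [lo, hi) without visiting them individually.
    return count_less_than(root, hi) - count_less_than(root, lo)

root = None
for k in [5, 3, 8, 1, 4, 7, 9]:
    root = insert(root, k)
print(count_in_range(root, 3, 8))           # -> 4 (keys 3, 4, 5, 7)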

jdoliner commented 11 years ago

Besides when there are no operations in the tree, when will the row counts converge? And how can this be used to compute r.table('foo').filter(...).count() efficiently?

srh commented 11 years ago

You have to block incoming operations on the tree, by holding the lock on the high-level blocks covering the range you're getting the count for. Or, hypothetically, you could let the query get "pushed down" by other queries (if we had the other kind of snapshotting, we would already have code that does this). But without that, a count query just has to block other queries and let any transactions beneath it drain.

srh commented 11 years ago

Well, actually, you don't have to block other queries, if you're willing to make a snapshot. And we'd need to update snapshot blocks' estimates of subtree query counts, unless the way we take snapshots also involves draining transactions. I don't know exactly how we make snapshots.

jdoliner commented 11 years ago

Yeah, I really don't like the idea of blocking the entire tree just to do a count. And making a snapshot and then updating block counts within the snapshot (but not block contents) seems really questionable to me. I also think that technique really sucks if you're doing the types of queries that @dbettin is describing here. The query he had was:

r.table('table').count(r.row['status'].contains('deployed')).run(conn)

which means he already has to create an entire index just to do this count efficiently. But looking at the data, it seems as if the "status" value is an array of tags, and he might well want to count those too. So if he also wants to efficiently do:

r.table('table').count(r.row['status'].contains('deployed')).run(conn)
r.table('table').count(r.row['status'].contains('spawned')).run(conn)
r.table('table').count(r.row['status'].contains('crashed')).run(conn)
r.table('table').count(r.row['status'].contains('running')).run(conn)

he's going to need a different index for each one of those, which is going to become a big overhead really quickly. Also, with this method there's no way to efficiently do something like:

r.table('table').concat_map(r.row["articles"]).count()

but this is very easy to do with LMR.

srh commented 11 years ago

What we're going to do for this issue, specifically for .count() and some other count operations involving range queries (ones that don't have to apply filter functions to every value), is to improve the behavior of count() by not having it load the entire document off disk before adding 1 to an accumulator. This would help performance a lot.
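
Conceptually, the change is the difference between the two loops below (a toy Python model, not the server's actual btree code): the slow path deserializes every document just to add 1, while the fast path only looks at the keys stored in the leaf nodes.

# Toy model: each "leaf" holds keys plus large serialized documents.

class Leaf:
    def __init__(self, docs):
        self._docs = docs                   # key -> serialized document

    def keys(self):
        return list(self._docs)

    def load_document(self, key):
        return self._docs[key]              # stands in for a ~150 KB disk read

def count_slow(leaves):
    total = 0
    for leaf in leaves:
        for key in leaf.keys():
            _doc = leaf.load_document(key)  # loads the whole document...
            total += 1                      # ...only to add 1 to an accumulator
    return total

def count_fast(leaves):
    # The keys alone are enough to count; the documents are never touched.
    return sum(len(leaf.keys()) for leaf in leaves)

leaves = [Leaf({i: 'x' * 150000}) for i in range(3)]
assert count_slow(leaves) == count_fast(leaves) == 3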

coffeemug commented 11 years ago

@srh -- did you confirm that it loads the entire document?

Assuming you did, while this improvement would be really awesome and we should by all means do it, I don't think it would help with @dbettin's use case because the moment there is a filter before the count, we have to load the document into RAM.

srh commented 11 years ago

@coffeemug: yes.

A different fix can be added to improve the filtered count query, or any query, really -- loading documents in parallel from leaf nodes instead of iterating sequentially.
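
As a rough illustration of that second fix (a Python sketch, not the server's actual I/O path), the idea is to have the per-document reads in flight concurrently instead of issuing them one at a time, so the total latency is closer to that of a single batch:

import concurrent.futures
import time

def load_document(key):
    time.sleep(0.01)                        # stands in for one random disk read
    return {'id': key}

keys = range(100)

# Sequential: total time is roughly the sum of the individual read latencies.
docs = [load_document(k) for k in keys]

# Parallel: many reads are in flight at once, so wall-clock time drops to
# roughly (number of keys / number of workers) * per-read latency.
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    docs = list(pool.map(load_document, keys))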

srh commented 11 years ago

Removing myself as assignee. I'm the assignee on #1295 now. I don't know if I'll be working on #1296.

coffeemug commented 11 years ago

Thanks @srh, I'll talk to @jdoliner and maybe @mlucy when 1.8 is out and we'll see what we want to do about this.

srh commented 11 years ago

#1295 is now done, so at least .count() will be zippy. .count(function (row) { ... }) use cases, and other operations that don't benefit from indexes, would benefit from #1296 being fixed.

danielmewes commented 10 years ago

Count is still not really fast. Doing a count() on a large table (a couple million documents) takes a lot of CPU and a long time to finish: in the range of 30 minutes for ~100M documents.

mlucy commented 10 years ago

@danielmewes -- see my comment at https://github.com/rethinkdb/rethinkdb/issues/1733#issuecomment-30391857 .

wojons commented 10 years ago

@danielmewes Regarding it taking a long time and using a lot of CPU: I can confirm this from some testing I was doing last night. I had the db running with --no-direct-io, so there was no disk usage (everything was read from cache), but it still took about 30 sec to count 1 million objects. I noticed it used about 30-40% CPU, of which 80-90% was in user space and the rest in system calls.

coffeemug commented 10 years ago

@wojons what's the output of rethinkdb --version on the test machine? There was a regression in count speed on a table, which was fixed in 1.11.3.

wojons commented 10 years ago

@coffeemug I spoke with Daniel today; he explained it.

minibikini commented 10 years ago

r.table('table').count() takes 15 seconds to count a table with ~300k records. Is there anything I can do to speed it up?

rethinkdb 1.11.3 (CLANG 5.0 (clang-500.2.79))

coffeemug commented 10 years ago

@minibikini -- a couple of questions:

We recently had a regression of count being slow (which was fixed in 1.11.3), and separate reports of count being slow on OS X. I'd like to get to the bottom of OS X performance, but first we have to confirm that you're encountering an OS X-specific issue.

Thanks for reporting and helping us get to the bottom of this!

wojons commented 10 years ago

@coffeemug I was just about to ask the exact questions you asked.

@minibikini From what I remember from a while ago, the OS X versions were 32-bit only, which affected performance in some ways. If you're still awake you can jump into IRC and I can help you work this out; I'll be under "wojons" or "lexi2".

coffeemug commented 10 years ago

Hmm, I'm pretty sure the OS X build is 64 bit.

wojons commented 10 years ago

@coffeemug I could be totally wrong; I'm just trying to remember something @jdoliner said during a conference talk, and I feel like he said it was 32-bit, which was why it ran slower on OS X.

wojons commented 10 years ago

@coffeemug I am sure it's 64-bit, but something about this came up during the demo as the reason the count was slow on his laptop, which also had an SSD.

minibikini commented 10 years ago

@coffeemug, @wojons - thanks for the replies!

  1. OS X 10.9.1, 16 GB RAM
  2. My disk is a "Fusion Drive" (128 GB SSD + 320 GB HDD), which is a kind of RAID 0.
  3. It gets faster after the first run, but it's still about 7 sec.
  4. Not yet, will try today.

wojons commented 10 years ago

@coffeemug http://www.youtube.com/watch?v=3038H1Ui_HE between 24:35 - 24:50

coffeemug commented 10 years ago

@minibikini -- ok, thanks! Please let me know what you get on Linux. We'll try to do our own tests shortly.

@wojons -- ah, thanks! I'm not entirely sure if Joe misspoke or is right. I'll find out.

wojons commented 10 years ago

@minibikini When I run

r.db('').table('').limit(300000).count()

on a pure HDD it takes 49.00 sec after a few runs.

@coffeemug I am not sure; I just remembered that when he brought up the question.

coffeemug commented 10 years ago

@wojons -- I'm not entirely sure about this, but I think a count after a limit is implemented very differently from a pure count. With a limit it has to process every document, but not when it's run on a table directly.
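
In ReQL terms (the table name is hypothetical), the two forms below are not equivalent in cost: the limit forces the documents through a stream before counting, while the plain count on the table can use the cheaper code path discussed earlier in this thread.

import rethinkdb as r

conn = r.connect('localhost', 28015)

# Streams up to 300,000 documents through the limit and counts them one by one.
r.table('mytable').limit(300000).count().run(conn)

# Counts the table directly, with no limit in between.
r.table('mytable').count().run(conn)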

wojons commented 10 years ago

@coffeemug Yeah, I am seeing that. I am going to use that to import 300k docs into another table and will get back to you guys shortly.

wojons commented 10 years ago

@coffeemug & @minibikini

This is run on a build with the count performance regression: RethinkDB 1.11.2-223-g8ca1a4

Cached, with --no-direct-io:

r.table('300k').count()

5.33 s round-trip time, 5.13 s server time

Running the normal config with a cold cache: when I tried this it would simply lock up the server with high load due to iowait.

minibikini commented 10 years ago

@coffeemug, @wojons:

Just tried r.table('table').count() for the same data set on Ubuntu 13.10 (Digital Ocean 2 GB, RethinkDB with the default config) -- I get 450 ms to 1 s server time. Much better than OS X but still quite slow :-(

wojons commented 10 years ago

@minibikini Yeah, that sounds much closer to what it should be. I am guessing it has to do with OS X. Considering you are using 1.11.3 on both, which includes the patch, I don't know what would have happened if you had tried it before the patch on OS X, since it locks up my Linux server. The only other difference I can think of is that your Fusion Drive setup is not working with RethinkDB correctly. I am not sure if it's hitting the SSD cache at all, since RethinkDB uses its own I/O subsystem by default, which may be bypassing the caching system.

If you run it with the --no-direct-io flag the data will end up in pure memory, which would let us compare whether the issue is OS X or the Fusion Drive, though there could be other issues in the way.

nickpoorman commented 10 years ago

+1

I'm on OS X as well, and when I run count() on a 6-million-document table my CPU spikes to 100% across all cores and the query never finishes, or doesn't appear to anyway.

neumino commented 10 years ago

@nickpoorman -- do you run count on a table? or after a filter?

nickpoorman commented 10 years ago

r.db('ml').table('sample_compounds').count()

And it did finish after 1 minute and 33.17 seconds. However I had to restart the rethinkdb server first (after inserting all the documents).

nickpoorman commented 10 years ago

Ahh, the Data Explorer returns an error:

RqlRuntimeError: Query interrupted.  Did you shut down the server? in:
r.db("ml").table("sample_compounds").count()

However, the query seems to continue on the server... (screenshot attached, 2014-07-01 15:22)