Closed trengrj closed 6 years ago
Thanks for bringing this up and sorry for the late response. Indeed, COUNT(*) is very slow on large data sets so having a hybrid approach would solve the issue (partially). Ideally, we could first check the estimated rows count and if its below 100k (just an example) rows we use COUNT, otherwise use the estimated rows count for pagination. I would also prefer keeping the pagination where possible instead of removing it altogether, i know this could be a tricky feature, but still.
It's also worth noting that OFFSET pagination will get slower the further you paginate, because postgres has to first fetch all rows preceding the offset and throw them away. A faster way is to use keyset pagination, something like WHERE id > (last row id from previous page) LIMIT n
. Not sure if that's practical to implement here, because it requires knowledge of the underlying table.
Also see We need tool support for keyset pagination for more info.
Does anyone have any interest in tackling the problem?
Coming back to this. So im thinking of having a map (per established connection) that will hold information about tables that are somewhat large (> 100k rows). For table browse requests we will first check if that map contains the table and if it does we will fetch the rows stats using count estimates. On database switch that map will be cleared out.
Any thoughts?
The DZone article Faster PostgreSQL Counting has some useful tips on getting good estimate counts on large tables.
Instead of keeping a map of large tables in memory you could just look at the table stats in postgres on the fly, I don't think caching that info is worth the trouble.
When a table is viewed the function
TableRowsCount
is called to get the size of the table for pagination. This unfortunately is a very slow operation on large tables as it callsSELECT COUNT(1) FROM TABLE
which is slow in Postgres and will cause the application to hang on tables in the many gigabytes range.I've got a working commit here which uses the estimated table size (if it is available). I removed the "Page X of Y" at top right and replaced with just "Page X" because sometimes estimated size is not available (i.e. when it is a view).
https://github.com/trengrj/pgweb/commit/b8cc628e4190f5c1c7d43cf776a4e3f5146e3f56
I think it is important to be able to preview any table quickly, but agree the pagination is useful. What are you views on fixing this issue? Maybe some sort of hybrid approach?