Collection count slow - Githubissues

In the constructor for SQLAResultCollection, the instance variable _count is set by issuing a call to .count() on the specced query object for the given resource.

The problem is - at least with MySQL - if you have rather a large number of rows in the table, regardless of the filter spec applied, .count takes a loooong time to return before the code moves on to issue the query for the actual collection.

I made the following change

# self._count = self._query.count()
self._count = query.with_entities(func.count(1)).filter(*self._spec).scalar()

And the response time for a massive collection improved dramatically (in my example, the spec I'm filtering on returns 1,534,781 rows. Using _query.count() I get an average collection endpoint response time of ~19000 ms for each request. With the query.with_entities() call, the response time drops to roughly ~700 ms.

Some anecdotal output from running explain on the two different queries:

query.with_entities()

mysql> explain select count(1) from tablet.customers c where c.site_id = 3;
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows    | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-------------+
|  1 | SIMPLE      | c     | ref  | site_id       | site_id | 4       | const | 1493791 | Using index |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

query.count()

mysql> explain select count(c.customer_id) from tablet.customers c where c.site_id = 3;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | c     | ALL  | site_id       | NULL | NULL    | NULL | 1906190 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

I cut down the number of fields that were actually queried by SQLA for the second example, but the second is close enough and indicative enough to show that the culprit is actually down to the query ultimately using an index vs just a where clause.

I'm not suggesting my late-night bleary-eyed hack will always work - I wasn't sure what I had to play with inside the constructor, so the with_entities approach may not be the best or fastest, but if it's possible to use an entirely separate query instance (with the same filter spec) just for the count, it may help.

Thanks for making this in the first place - so far it's awesome :D

pyeve / eve-sqlalchemy

Collection count slow #104