pyeve / eve-sqlalchemy

SQLAlchemy data layer for Eve-powered RESTful APIs
http://eve-sqlalchemy.readthedocs.io
Other
232 stars 70 forks source link

Collection count slow #104

Closed frostiebot closed 8 years ago

frostiebot commented 8 years ago

In the constructor for SQLAResultCollection, the instance variable _count is set by issuing a call to .count() on the specced query object for the given resource.

The problem is - at least with MySQL - if you have rather a large number of rows in the table, regardless of the filter spec applied, .count takes a loooong time to return before the code moves on to issue the query for the actual collection.

I made the following change

# self._count = self._query.count()
self._count = query.with_entities(func.count(1)).filter(*self._spec).scalar()

And the response time for a massive collection improved dramatically (in my example, the spec I'm filtering on returns 1,534,781 rows. Using _query.count() I get an average collection endpoint response time of ~19000 ms for each request. With the query.with_entities() call, the response time drops to roughly ~700 ms.

Some anecdotal output from running explain on the two different queries:

query.with_entities()

mysql> explain select count(1) from tablet.customers c where c.site_id = 3;
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key     | key_len | ref   | rows    | Extra       |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-------------+
|  1 | SIMPLE      | c     | ref  | site_id       | site_id | 4       | const | 1493791 | Using index |
+----+-------------+-------+------+---------------+---------+---------+-------+---------+-------------+
1 row in set (0.00 sec)

query.count()

mysql> explain select count(c.customer_id) from tablet.customers c where c.site_id = 3;
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | c     | ALL  | site_id       | NULL | NULL    | NULL | 1906190 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.00 sec)

I cut down the number of fields that were actually queried by SQLA for the second example, but the second is close enough and indicative enough to show that the culprit is actually down to the query ultimately using an index vs just a where clause.

I'm not suggesting my late-night bleary-eyed hack will always work - I wasn't sure what I had to play with inside the constructor, so the with_entities approach may not be the best or fastest, but if it's possible to use an entirely separate query instance (with the same filter spec) just for the count, it may help.

Thanks for making this in the first place - so far it's awesome :D

frostiebot commented 8 years ago

You can totally ignore all this - late night delusions of a man going slowly insane.

It's really a problem with the crummy tables I have to work with (MyISAM, "weird" indexes and such) - If my predecessors knew what a database was, a simple filtered count wouldn't take twenty seconds.

It really wouldn't matter if you altered the code to use the with_entities pattern - after more experimentation I found that adding on another column to filter on caused the execution time to rocket back to 20 seconds again, so yeah, my problems are self-inflicted.

Sorry for rambling and/or wasting time :)