man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Pymongo 3.11.0 avoid slow mongo_count #926

Closed dunckerr closed 2 years ago

dunckerr commented 2 years ago

I rebased this branch to pull in the fixed tests. The actual issue this PR fixes is in arctic._util.mongo_count() in _util.py, which used to run very slowly with filter == {} and pymongo 3.11.0.

dunckerr commented 2 years ago

@shashank88 @bmoscon @jamesblackburn can you guys check this please? Note that I rebased on master to pull in all the py3.6 test changes. The actual change here is in arctic/_util.py, where we avoid calling count_documents() with filter={} on pymongo drivers > 3.6.0. That scenario was killing the database with COLLSCANs when the caller really just wanted to know how many docs are in the collection. The new code path calls estimated_document_count(), which is super fast.
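
For readers skimming the thread, here is a minimal sketch of the dispatch described above. It is not the actual code in arctic/_util.py: the helper name `count_docs` and the `FakeCollection` stand-in are hypothetical, introduced only so the example runs without a database. The two collection methods it calls, `estimated_document_count()` and `count_documents()`, are the real pymongo names; the sketch assumes the collection object exposes both.

```python
def count_docs(collection, filter=None, **kwargs):
    """Hypothetical helper illustrating the idea; not arctic's mongo_count()."""
    if not filter:
        # No filter (None or {}): ask for the metadata-based estimate
        # instead of counting documents one by one.
        return collection.estimated_document_count()
    # A real filter: fall back to an exact count of matching documents.
    return collection.count_documents(filter, **kwargs)


class FakeCollection:
    """In-memory stand-in (an assumption for this example) that records calls."""

    def __init__(self, docs):
        self.docs = docs
        self.calls = []

    def estimated_document_count(self):
        self.calls.append("estimated_document_count")
        return len(self.docs)

    def count_documents(self, filter, **kwargs):
        self.calls.append("count_documents")
        # Count documents whose fields equal every key/value in the filter.
        return sum(
            all(doc.get(key) == value for key, value in filter.items())
            for doc in self.docs
        )


coll = FakeCollection([{"symbol": "A"}, {"symbol": "B"}, {"symbol": "A"}])
total = count_docs(coll)                      # takes the estimate path
matching = count_docs(coll, {"symbol": "A"})  # takes the exact-count path
```

With a real pymongo collection the empty-filter result is an estimate taken from collection metadata, so callers that need an exact figure should still pass a filter.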