man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Util to get data size in mongo per symbol #803

Open shashank88 opened 5 years ago

shashank88 commented 5 years ago

Getting the size of the data stored for each symbol in VersionStore is a fairly common request. Currently I just use a Mongo JS script to get it, but it would be nice to have a util in VersionStore (or elsewhere) that does the same.
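One possible shape for such a util, as a rough sketch rather than anything that exists in arctic today: push the summing to the server with an aggregation pipeline. This assumes MongoDB 4.4+ (for the `$bsonSize` operator), a pymongo-style collection object, and documents that carry a `symbol` field. The names `symbol_size_pipeline` and `get_symbol_sizes` are made up for illustration.

```python
def symbol_size_pipeline(symbols=None):
    """Build an aggregation pipeline that sums BSON document sizes per symbol.

    Assumption: every document in the collection has a 'symbol' field.
    """
    pipeline = []
    if symbols is not None:
        # Restrict the scan to the requested symbols only.
        pipeline.append({"$match": {"symbol": {"$in": list(symbols)}}})
    # $bsonSize (MongoDB 4.4+) returns the size in bytes of the whole document.
    pipeline.append(
        {"$group": {"_id": "$symbol", "size": {"$sum": {"$bsonSize": "$$ROOT"}}}}
    )
    return pipeline


def get_symbol_sizes(collection, symbols=None):
    """Return {symbol: total_bytes} using any object with an aggregate() method."""
    rows = collection.aggregate(symbol_size_pipeline(symbols))
    return {row["_id"]: row["size"] for row in rows}
```

This only counts the documents in the one collection it is given, so sizes for any companion collections would have to be added separately.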

shashank88 commented 5 years ago

Was using this to play with motor and asyncio; it's actually not bad. It still needs to be plugged into the auth hook, as motor only supports auth passed in the URI.

import asyncio
import logging
import bson
import six
from motor.motor_asyncio import AsyncIOMotorClient

logger = logging.getLogger(__name__)

async def get_size_sym(coll, sym):
    # Sum the BSON-encoded size of every document stored for this symbol.
    return sym, sum([len(bson.BSON.encode(document)) async for document in coll.find({'symbol': sym})])

async def get_symbol_sizes(db_name, coll_name):
    if six.PY2:
        logger.error("This function is for py3 only")
        return

    # TODO: get params from arctic
    mc = AsyncIOMotorClient()

    coll = mc[db_name][coll_name]
    symbols = ['a', 'b', 'c']
    res = await asyncio.gather(*[get_size_sym(coll, symbol) for symbol in symbols])
    ret = {r[0]: r[1] for r in res}
    print(ret)

# asyncio.run (Python 3.7+) creates the event loop and closes it afterwards.
asyncio.run(get_symbol_sizes('arctic_skhare', 'test2'))

{'a': 49529, 'b': 49529, 'c': 230}
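On the auth point: if the credentials have to travel in the URI, they need to be percent-encoded first. A minimal sketch of building such a URI, assuming plain username/password auth. The helper name `build_mongo_uri` and its defaults are hypothetical, and where the credentials come from (e.g. the auth hook) is left open.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host, username, password, auth_db="admin", port=27017):
    """Return a mongodb:// URI with percent-encoded credentials.

    Characters such as '@', ':' and '/' in the username or password must be
    escaped, otherwise the URI is parsed incorrectly.
    """
    return "mongodb://{}:{}@{}:{}/?authSource={}".format(
        quote_plus(username), quote_plus(password), host, port, quote_plus(auth_db)
    )


# The resulting string would then be passed to the client, e.g.
# AsyncIOMotorClient(build_mongo_uri("localhost", "user", "p@ss/word"))
```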