man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

[Question] - how to design data (store_type/chunk_size) #969

Closed. jemshit closed this issue 2 years ago.

jemshit commented 2 years ago

I couldn't find an Arctic-related Stack Overflow tag, so I'm asking here.

I have read through the whole documentation, but it is still not clear to me which store type I should use or how to chunk the data. Requirements:

Questions:

  1. There can be different variants of the data for the same symbol (Exchange1-SymbolA-Spot, Exchange1-SymbolA-Perpetual, Exchange2-SymbolA-Perpetual). From the Arctic library's point of view, what should the library be? ("exchange1", "exchange1-symbolA" or "exchange1-symbolA-spot"?)
  2. Is VersionStore or ChunkStore more suitable for OHLCV data? I want to read data in these ways: "read minute candles for the last x minutes/hours/days", "read hourly candles for the last x hours/days/weeks/months", "read daily candles for the last x weeks/months/years".
  3. If I use ChunkStore, how do I choose chunk_size for the read scenarios above? I couldn't find more information in the documentation.
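The read patterns in question 2 all reduce to "give me the candles from the last N units of time". As a minimal sketch in plain pandas (independent of Arctic), assuming the candles sit in a DataFrame with a sorted DatetimeIndex, such a lookback can be written as below. `last_window` and the sample data are hypothetical. In Arctic itself, as far as I understand the docs, the filtering would be pushed into the read call instead (e.g. `date_range` on VersionStore's `read`, `chunk_range` on ChunkStore's `read`) rather than done after loading the whole symbol.

```python
import pandas as pd


def last_window(df: pd.DataFrame, lookback: str) -> pd.DataFrame:
    """Return the rows whose timestamp falls within `lookback` of the newest row.

    Hypothetical helper; assumes `df` has a sorted DatetimeIndex (one row per candle).
    """
    end = df.index.max()
    start = end - pd.Timedelta(lookback)
    # Strictly after `start`, so a 90-minute lookback on minute data
    # returns exactly 90 candles (the newest one included).
    return df.loc[df.index > start]


# Hypothetical minute candles: one close price per minute for 3 hours.
idx = pd.date_range("2021-01-01 00:00", periods=180, freq="min")
minute = pd.DataFrame({"close": range(180)}, index=idx)

recent = last_window(minute, "90min")
print(len(recent))
```

The same helper covers the hourly and daily cases by passing a longer lookback (e.g. `"14D"`) to a frame of coarser candles.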
jemshit commented 2 years ago

Anyone?

luongjames8 commented 2 years ago

I agree there could be a lot more practical examples of how people use arctic, to help newcomers get started.

To your questions: (1) I've found that it's easier to keep all data of the same timeframe together. At the most basic level, I have one library for all daily data and another for intraday data. (I'm not sure why you would need to keep hourly data if you keep minute data, as you can always resample to a higher timeframe.) I also generally keep all data from the same source together.

So in your nomenclature: exchange1-intraday, exchange2-intraday, exchange1-daily, exchange2-daily.
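To illustrate the point that hourly candles can be derived from minute data, here is a minimal pandas sketch (not Arctic-specific) that resamples hypothetical 1-minute OHLCV bars into hourly bars. The column names and the per-column rules (first open, max high, min low, last close, summed volume) follow the usual OHLCV convention and are assumptions of this sketch.

```python
import pandas as pd

# Hypothetical 1-minute OHLCV candles covering two hours.
idx = pd.date_range("2021-01-01 00:00", periods=120, freq="min")
minute = pd.DataFrame(
    {
        "open": [float(i) for i in range(120)],
        "high": [float(i) + 1 for i in range(120)],
        "low": [float(i) - 1 for i in range(120)],
        "close": [float(i) + 0.5 for i in range(120)],
        "volume": [10] * 120,
    },
    index=idx,
)

# How each column is aggregated when moving to a coarser bar.
ohlcv_rules = {
    "open": "first",
    "high": "max",
    "low": "min",
    "close": "last",
    "volume": "sum",
}

# "60min" rather than "H"/"h" to stay valid across pandas versions.
hourly = minute.resample("60min").agg(ohlcv_rules)
print(hourly)
```

The trade-off is that every hourly or daily read then has to load and aggregate the underlying minute data, so storing a pre-resampled series may still be worthwhile if those reads are frequent.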

(2) Frankly, I've never used ChunkStore. The default store is VersionStore, which I use for all OHLC data. IMO the versioning can come in handy if the data gets messed up... which eventually happens.

(3) Again, I've never used ChunkStore.

jemshit commented 2 years ago

Thank you.

Could you shed some light on this? @dunckerr @bmoscon

bmoscon commented 2 years ago

https://github.com/man-group/arctic/blob/master/docs/chunkstore.md

jemshit commented 2 years ago

I read it, hence the questions

jemshit commented 2 years ago

So far, this is my summary:

  1. A library is used for bucketing, and it consists of multiple collections in MongoDB. So the options are:
     a) "exchange1-symbolA" is the library, and "spot-minute", "spot-hour", ... are symbols.
     b) "exchange1-minute" is the library, and "symbolA-spot", "symbolA-perp", ... are symbols.

  2. I still don't fully understand the motivation for ChunkStore, but according to "chunkstore is super dependent your chunksize, and writing is slower than reading, unless you have a specific reason to use it, you probably want to use versionstore", VersionStore seems to be the go-to solution.

  3. Not sure; just a few quotes:

    it is the minimum amount of data that you have to read if you're only reading a subset of the data, but needs to be big enough that the compression is effective.

    ideal for use cases when very large datasets need to be accessed by 'chunk'
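To make the first quote concrete, here is a rough model (my own, not Arctic's actual chunker) of how chunk size trades off against the read patterns from the question. It assumes date-based chunks that each hold one calendar period ('D', 'M' or 'Y', i.e. daily, monthly or yearly) of 1-minute candles, and that a read must load every chunk overlapping the requested range. `chunk_cost` and the query dates are hypothetical.

```python
from typing import Tuple

import pandas as pd

ONE_MINUTE = pd.Timedelta("1min")


def chunk_cost(start: str, end: str, chunk_size: str) -> Tuple[int, int]:
    """Return (chunks touched, minute candles loaded) for a read of [start, end].

    Hypothetical model: each chunk holds one calendar period of 1-minute
    candles, and every chunk overlapping the range is loaded in full.
    """
    first = pd.Period(start, freq=chunk_size)
    last = pd.Period(end, freq=chunk_size)
    periods = pd.period_range(first, last, freq=chunk_size)
    # Length of each period in minutes = number of candles stored in that chunk.
    rows = sum(((p + 1).start_time - p.start_time) // ONE_MINUTE for p in periods)
    return len(periods), rows


queries = {
    "last 2 hours": ("2021-03-31 22:00", "2021-03-31 23:59"),
    "last 90 days": ("2021-01-01", "2021-03-31 23:59"),
}
for name, (start, end) in queries.items():
    for size in ("D", "M", "Y"):
        print(name, size, chunk_cost(start, end, size))
```

Under these assumptions, yearly chunks make a 2-hour read load a full year of minute candles, while daily chunks turn a 90-day read into 90 separate chunk fetches; monthly chunks sit in between. As a heuristic rather than official guidance, this suggests sizing chunks near the typical read window for each timeframe, while keeping them large enough for compression to be effective.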