man-group / arctic

High performance datastore for time series and tick data
https://arctic.readthedocs.io/en/latest/
GNU Lesser General Public License v2.1

Question about Arctic's speed #852

Closed rbdm-qnt closed 3 years ago

rbdm-qnt commented 4 years ago

Hi!

Not really an issue, more of a question. I'm looking into Arctic; I'm currently using InfluxDB. I work with financial tick data and need to run big queries to process and study the data I have. I store rows with 7 fields. Below, the number on the left is the number of rows per query and the number on the right is the query time in hours, minutes and seconds.

4731436 rows, 1:00:28
5668090 rows, 1:12:59
9812999 rows, 2:06:23
11417387 rows, 2:31:08
9480222 rows, 2:07:46
12839124 rows, 3:06:02
17256737 rows, 4:19:54

Can Arctic query faster than this? And how does it handle big queries? Influx loads the entire query in RAM, so you have to split a big query into like 50 smaller ones, iterating through it in your code. Does Arctic handle it better?

I'm running Python 3.7 on macOS 10.12.6 (MacBook with 16 GB RAM).

Thanks!

jzay commented 4 years ago

Hi, I’m working on a similar problem with the same system specs. Let me know if you’d like to collaborate. Also trying to decide between using this and influxDB

rbdm-qnt commented 4 years ago

> Hi, I’m working on a similar problem with the same system specs. Let me know if you’d like to collaborate. Also trying to decide between using this and influxDB

Sure! Let's hope somebody answers this. I can help you with influx if you want

jzay commented 4 years ago

That would be great, how would you like to communicate?

I was going to at least download and learn how to use influx because it seems interesting and pretty useful.

I'm working with crypto exchanges pulling tick data from this repo.

bmoscon commented 4 years ago

It's definitely faster than this, but you'll need a MongoDB cluster to get real performance from Arctic.

rbdm-qnt commented 4 years ago

> its definitely faster than this, but you'll need a mongodb cluster to get real performance from arctic.

Thanks Bryant, I'll look into it!

jamesblackburn commented 4 years ago

> the number on the left is the amount of rows per query, the number on the right is the query time in hours, minutes and seconds.
>
> 4731436, 1:00:28

Arctic python should be able to read at 1-4 million rows per second. So your query above should take a few seconds and should be doable in one go!
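A quick back-of-the-envelope check of that claim, using the first row count and timing from the question and the 1 million rows/second lower bound quoted above:

```python
# Rough speedup estimate: InfluxDB timing from the question vs. the
# 1M rows/s lower bound quoted for Arctic (both figures from this thread).

rows = 4_731_436
influx_seconds = 1 * 3600 + 0 * 60 + 28   # 1:00:28 -> 3628 s
arctic_seconds = rows / 1_000_000         # ~4.7 s at 1M rows/s

speedup = influx_seconds / arctic_seconds
print(f"InfluxDB: {influx_seconds} s, Arctic (est.): {arctic_seconds:.1f} s, "
      f"~{speedup:.0f}x faster")
```

So even at the conservative end of that range, the hour-long query would come down to seconds.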

The difference between Arctic and other stores is that we store and retrieve the data in compressed columnar form. So there's no aggregation or maths you can do in the Mongo database; instead, we are optimised for moving data to the clients, where the computation is done using pandas, numpy and other numeric libraries.
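To make that division of labour concrete: once the columns are on the client, the aggregation is just ordinary numeric code. A minimal sketch, using a tiny hand-made tick batch and pure Python as a stand-in for the numpy/pandas step (the values are illustrative):

```python
# Once the columns arrive on the client, aggregation is plain numeric code.
# Example: volume-weighted average price (VWAP) over a batch of ticks.
prices = [100.00, 100.01, 99.99, 100.02]
sizes  = [5.0, 2.0, 3.0, 1.0]

vwap = sum(p * s for p, s in zip(prices, sizes)) / sum(sizes)
print(f"VWAP: {vwap:.4f}")
```

In practice you would do the same thing vectorised over millions of rows with numpy or pandas; the point is that this work happens client-side, not inside MongoDB.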

rbdm-qnt commented 4 years ago

Wow! That sounds like a dream in comparison for my application. So I'm assuming Arctic also doesn't load the entire query into RAM?

jamesblackburn commented 4 years ago

It loads as much data as you request (by specifying a date_range= on the query). Generally you would query for an appropriate amount of data based on the memory on your client, and the sampling frequency of the data.
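That pattern, walking a long history in client-sized windows, can be sketched like this. The windowing helper is pure Python; the commented-out read is what the per-window Arctic call would look like, with the symbol name and 30-day window size purely illustrative:

```python
from datetime import datetime, timedelta

def date_windows(start, end, days):
    """Yield (window_start, window_end) pairs covering [start, end)."""
    step = timedelta(days=days)
    cur = start
    while cur < end:
        yield cur, min(cur + step, end)
        cur += step

# Walk a year of history in 30-day slices so each read fits in RAM.
windows = list(date_windows(datetime(2019, 1, 1), datetime(2020, 1, 1), 30))
for lo, hi in windows:
    # With Arctic this would be, per window (DateRange from arctic.date):
    #   df = lib.read('BTC-TICKS', date_range=DateRange(lo, hi)).data
    #   ... process df ...
    pass

print(f"{len(windows)} windows covering the year")
```

Each window is read, processed, and released before the next one, so peak memory is bounded by the window size rather than the full query.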

shashank88 commented 3 years ago

Closing due to inactivity.