Hi, I’m working on a similar problem with the same system specs. Let me know if you’d like to collaborate. Also trying to decide between using this and influxDB
Sure! Let's hope somebody answers this. I can help you with influx if you want
That would be great, how would you like to communicate?
I was going to at least download and learn how to use influx because it seems interesting and pretty useful.
I'm working with crypto exchanges pulling tick data from this repo.
It's definitely faster than this, but you'll need a MongoDB cluster to get real performance from Arctic.
Thanks Bryant, I'll look into it!
> the number on the left is the number of rows per query, the number on the right is the query time in hours, minutes and seconds.
> 4731436, 1:00:28
Arctic python should be able to read at 1-4 million rows per second. So your query above should take a few seconds and should be doable in one go!
The difference between Arctic and other stores is that we store and retrieve the data in compressed columnar form. So there's no aggregation or maths you can do inside the Mongo database. Instead we are optimised to move data to the clients for computation there, using pandas, numpy and other numeric libraries.
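A minimal sketch of that workflow, assuming a local MongoDB instance; the `ticks` library name and `BTC_USD` symbol are just placeholders:

```python
import pandas as pd
from arctic import Arctic

store = Arctic('localhost')            # connect to a local MongoDB instance
store.initialize_library('ticks')      # one-off: create a versioned library
library = store['ticks']

# Write a tick DataFrame; Arctic stores it as compressed columnar chunks.
df = pd.DataFrame(
    {'price': [100.0, 100.5], 'size': [1, 2]},
    index=pd.to_datetime(['2019-01-01 00:00:00', '2019-01-01 00:00:01']),
)
library.write('BTC_USD', df)

# Read it back as a DataFrame and do the maths client-side with pandas/numpy.
ticks = library.read('BTC_USD').data
vwap = (ticks['price'] * ticks['size']).sum() / ticks['size'].sum()
```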
Wow! That sounds like a dream in comparison for my application. So I'm assuming arctic also doesn't load the entire query onto the RAM?
It loads as much data as you request (by specifying a `date_range` on the query). Generally you would query for an appropriate amount of data based on the memory on your client and the sampling frequency of the data.
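A minimal sketch of that pattern, reusing the hypothetical `ticks` library and `BTC_USD` symbol from above and reading one day at a time so each chunk fits in client memory:

```python
from datetime import datetime, timedelta
from arctic import Arctic
from arctic.date import DateRange

library = Arctic('localhost')['ticks']   # hypothetical library, as above

start, end = datetime(2019, 1, 1), datetime(2019, 2, 1)
step = timedelta(days=1)                 # chunk size sized to the client's RAM

cursor = start
while cursor < end:
    chunk_end = min(cursor + step, end)
    # Pull only one day of ticks at a time instead of the whole range.
    chunk = library.read('BTC_USD', date_range=DateRange(cursor, chunk_end)).data
    # ... process/aggregate the chunk here, then discard it ...
    cursor = chunk_end
```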
Closing due to inactivity.
Hi!
Not really an issue, more of a question. I'm looking into Arctic; I'm currently using InfluxDB. I work with financial tick data and need to run big queries to process and study the data I have. I store rows with 7 fields. In the table below, the number on the left is the number of rows per query and the number on the right is the query time in hours, minutes and seconds.
| Rows per query | Query time (h:mm:ss) |
| --- | --- |
| 4731436 | 1:00:28 |
| 5668090 | 1:12:59 |
| 9812999 | 2:06:23 |
| 11417387 | 2:31:08 |
| 9480222 | 2:07:46 |
| 12839124 | 3:06:02 |
| 17256737 | 4:19:54 |
Can Arctic query faster than this? And how does it handle big queries? Influx loads the entire query into RAM, so you have to split a big query into something like 50 smaller ones and iterate through them in your code (roughly the loop sketched below). Does Arctic handle this better?
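For reference, that split-query workaround looks roughly like this with the `influxdb` Python client; the `ticks` database, `trades` measurement and window size are hypothetical:

```python
from datetime import datetime, timedelta
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='ticks')

start, end = datetime(2019, 1, 1), datetime(2019, 2, 1)
step = timedelta(hours=12)               # ~50 windows over the full range

cursor = start
while cursor < end:
    window_end = min(cursor + step, end)
    # Query one time window at a time so no single result set exhausts RAM.
    result = client.query(
        "SELECT * FROM trades WHERE time >= '{}' AND time < '{}'".format(
            cursor.isoformat() + 'Z', window_end.isoformat() + 'Z')
    )
    for point in result.get_points():
        pass  # process each row here
    cursor = window_end
```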
I'm running Python 3.7 on macOS 10.12.6 (MacBook with 16GB RAM).
Thanks!