ranaroussi / qtpylib

QTPyLib, Pythonic Algorithmic Trading
http://qtpylib.io
Apache License 2.0

hdf5 storage for blotter #20

Open · post2web opened this issue 7 years ago

post2web commented 7 years ago

At this time MySQL is very tightly integrated into the blotter. Do you have plans to support other storage options such as HDF5? I just tried to insert 11 months of 1-min bars (≈40M rows) for the S&P 500 into MySQL and it took a whole day, whereas writing the same data to HDF5 takes minutes. Since there is only one blotter and nothing else communicating with the DB, using a database server seems like overkill.
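
For context, a minimal sketch of the kind of HDF5 write being compared against, using pandas' `HDFStore` (requires PyTables); the file name, key, and columns are illustrative only, not anything QTPyLib actually uses:

```python
import pandas as pd

# one new 1-minute bar, indexed by datetime (columns are illustrative)
bars = pd.DataFrame(
    {"open": [2875.00], "high": [2876.25], "low": [2874.50],
     "close": [2875.75], "volume": [1250]},
    index=pd.DatetimeIndex(["2018-06-01 09:31:00"], name="datetime"),
)

# format="table" makes the node appendable and queryable on disk
with pd.HDFStore("bars.h5", mode="a") as store:
    store.append("ES", bars, format="table", data_columns=True)

# read back a slice without loading the whole file
with pd.HDFStore("bars.h5", mode="r") as store:
    recent = store.select("ES", where="index >= '2018-06-01'")
```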

jkleint commented 7 years ago

I noticed the blotter issues a commit with every row written; perhaps there is some way to commit every N seconds or every N rows instead, which would alleviate the immediate problem.
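
A rough sketch of what per-batch commits might look like, assuming a standard DB-API (pymysql-style) connection; the table layout, `BATCH_SIZE`, and `log_bar` helper are hypothetical, not QTPyLib's actual code:

```python
# commit every BATCH_SIZE rows instead of once per row; `conn` is assumed
# to be a DB-API connection (e.g. pymysql) and the table layout is made up
BATCH_SIZE = 500
_pending = 0

def log_bar(conn, bar):
    global _pending
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO bars (datetime, symbol, open, high, low, close, volume) "
        "VALUES (%s, %s, %s, %s, %s, %s, %s)",
        (bar["datetime"], bar["symbol"], bar["open"], bar["high"],
         bar["low"], bar["close"], bar["volume"]),
    )
    _pending += 1
    if _pending >= BATCH_SIZE:
        conn.commit()  # one commit per batch instead of per row
        _pending = 0
```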

I agree a database seems a bit heavyweight as a solution, since the access patterns are write-once, append-only, and sequential reads; I don't think we need ACID, transactions, multiple writers, indexing, or complex queries. Personally I'd be happy with a system that writes to a directory of CSV files (one per instrument). Such a system could later be adapted to HDF5, Blaze, or something more efficient.
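
A rough sketch of the per-instrument CSV idea; the directory layout, column names, and `append_bar` helper are made up purely for illustration:

```python
import csv
import os

DATA_DIR = "bars_csv"                 # one CSV per instrument lives here
FIELDS = ["datetime", "open", "high", "low", "close", "volume"]

def append_bar(symbol, bar):
    """Append a single bar (dict) to <DATA_DIR>/<symbol>.csv."""
    os.makedirs(DATA_DIR, exist_ok=True)
    path = os.path.join(DATA_DIR, "%s.csv" % symbol)
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({k: bar[k] for k in FIELDS})
```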

ranaroussi commented 7 years ago

MySQL was never intended to be the final solution; it was simply the quickest way to get this done. I've actually been thinking of replacing it with Man AHL's Arctic, which would allow for easier management of data clusters if need be, something I'm not sure is possible with flat-file formats (HDF5, CSV, etc.). I will take a look at the Blaze ecosystem, though.
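
For anyone curious, a minimal sketch of what storing bars with Arctic might look like (the `arctic` package, backed by MongoDB); the library and symbol names are illustrative:

```python
from arctic import Arctic
import pandas as pd

store = Arctic("localhost")                      # MongoDB host
if "qtpylib.bars" not in store.list_libraries():
    store.initialize_library("qtpylib.bars")
library = store["qtpylib.bars"]

bars = pd.DataFrame(
    {"close": [2875.75], "volume": [1250]},
    index=pd.DatetimeIndex(["2018-06-01 09:31:00"]),
)

library.write("ES", bars)           # versioned write of the whole frame
# library.append("ES", new_bars)    # or append new rows to an existing symbol
history = library.read("ES").data   # .data holds the DataFrame
```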

rterbush commented 7 years ago

I think a database abstraction would be a great upgrade here. FWIW, I ran across the following comparison, which might be of interest in this discussion.

https://docs.google.com/spreadsheets/d/1sMQe9oOKhMhIVw9WmuCEWdPtAoccJ4a-IuZv4fXDHxM/edit#gid=0

Using something like InfluxDB would also enable Grafana as a metrics dashboard, which might allow offloading some of the dashboard development to existing tools.
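
A minimal sketch of writing bars to InfluxDB 1.x with the `influxdb` Python client, which Grafana could then query directly; the database, measurement, and tag names are illustrative:

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="qtpylib")
client.create_database("qtpylib")    # no-op if it already exists

points = [{
    "measurement": "bars_1min",
    "tags": {"symbol": "ES"},
    "time": "2018-06-01T09:31:00Z",
    "fields": {"open": 2875.00, "high": 2876.25, "low": 2874.50,
               "close": 2875.75, "volume": 1250},
}]
client.write_points(points)          # Grafana can then chart this measurement
```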

sabman commented 7 years ago

Take a look at http://www.timescale.com/ as well

electricmomo commented 5 years ago

May I suggest more decoupling: create a journaling process (or whatever you want to call it) that listens on a ZMQ topic and writes records to the DB. This would certainly alleviate the "slow DB" issue, at least for writes. As for reads, the less I/O a strategy does, the better.
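
A rough sketch of such a journaling process using pyzmq's SUB socket; the address, message format, and `persist()` hook are hypothetical, since QTPyLib's actual broadcast format may differ:

```python
import json
import zmq

context = zmq.Context()
sock = context.socket(zmq.SUB)
sock.connect("tcp://127.0.0.1:12345")        # blotter's publish address (made up)
sock.setsockopt_string(zmq.SUBSCRIBE, "")    # subscribe to all topics

while True:
    raw = sock.recv_string()
    record = json.loads(raw)                 # assumes JSON-encoded records
    # hand the record to whatever storage backend is configured
    # (batched DB inserts, HDF5 appends, per-instrument CSVs, ...)
    persist(record)                          # hypothetical persistence hook
```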