Closed: michaeljohnbennett closed this issue 2 years ago
Hi there,
I have been trialling this out and it looks like a great framework for storage and retrieval. Are there any plans for a Java API, or are you looking for contributors to help? I'm testing this out as data storage for the zipline/quantopian backtester and also for a JVM-based project, and was wondering what stage that work is at, if any.
Hi, we do have a Java implementation internally (TickStore reader/writer and VersionStore ndarray reader) that we'd like to open-source, but it needs a bit more work to disentangle it from our internal libraries and build infrastructure.
Are you using TickStore or VersionStore, and what sort of interface would you want to the data: tick-by-tick, or something DataFrame-like? Sorry I can't be more specific on the timing.
Hi Richard,
It would be a VersionStore reader. DataFrame-like would be fantastic.
I'm just getting to grips with VersionStore; I've worked through the example and am now looking at loading in data.
Do you have any best practices for loading in data, e.g. bulk-loading historical data from CSV and then appending new rows as they arrive?
Really looking forward to learning and using this system a bit more.
Loading data from CSV via Pandas is fine; you might just want to check that Pandas picks up the right datatypes for the columns when you read the CSV.
For appending new rows, library.append() should do what you want, e.g.: https://github.com/manahl/arctic/blob/master/tests/integration/store/test_pandas_store.py#L369
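A minimal sketch of both steps might look something like this (the library, symbol, file and column names are just placeholders, not anything prescribed by Arctic):

```python
# Rough sketch: bulk-load a CSV into a VersionStore library, then append new rows.
# 'EOD', 'AAPL' and the file/column names are hypothetical placeholders.
import pandas as pd
from arctic import Arctic

store = Arctic('localhost')
store.initialize_library('EOD')      # VersionStore is the default library type
library = store['EOD']

# Be explicit about the datetime index and column dtypes so Pandas doesn't guess wrongly
dtypes = {'open': 'float64', 'high': 'float64', 'low': 'float64',
          'close': 'float64', 'volume': 'int64'}
history = pd.read_csv('AAPL_history.csv', index_col='date',
                      parse_dates=['date'], dtype=dtypes)
library.write('AAPL', history)

# Later, append newly arrived rows (their index should follow on from the stored data)
latest = pd.read_csv('AAPL_latest.csv', index_col='date',
                     parse_dates=['date'], dtype=dtypes)
library.append('AAPL', latest)
```

Reading it back with `library.read('AAPL').data` is a quick way to check that the dtypes survived the round trip.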
Thanks @richardbounds, that's really helpful. I'll keep going with the Python side of things.
I will be interested to see how you can get Java and Python interfaces to work on the same dataset.
I'll get some prototype code working to load it in. I already have loader code that takes CSI Data into a DataFrame with the correct dtypes and writes it out via MySQL/SQLAlchemy, so I just need to create a new loader implementation for Arctic.
Hi @richardbounds, that worked a treat; now on to querying the data.
Is there a way in the API (sorry, noob question) to return just a subset of the data? I know we can slice by time range, but can I also pull back just certain columns?
Sometimes I need a DataFrame covering a number of symbols where I only want the close for each, to do analysis like universe selection/filtering for a TAA system I'm working on. The DataFrame store seems really useful, but having the data stored in a binary format doesn't make it easy to inspect or query it any other way.
Any ideas/suggestions?
TickStore stores columns separately, so you can query them individually, but our existing VersionStore pandas implementation stores all the columns packed together, so you need to retrieve them all and slice out the ones you want. For EOD data with a handful of columns it should only take a few milliseconds to retrieve the whole thing anyway.
For systems which work with a single column (e.g. a close price) across hundreds or thousands of symbols at once, we tend to build and store a derived dataframe with just that column for the whole universe together in one symbol, e.g. something like 'US_EQUITY_CLOSE_PRICES'.
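Roughly, in code (the library, symbol and column names here are placeholder examples, not anything from the thread):

```python
# Sketch of both approaches; 'EOD', the tickers and the 'close' column are placeholders.
import pandas as pd
from arctic import Arctic

library = Arctic('localhost')['EOD']

# 1) VersionStore hands back the whole packed DataFrame; slice columns after reading
prices = library.read('AAPL').data      # full OHLCV frame
close = prices[['close']]               # keep just the column you care about

# 2) Build a derived universe-wide frame, one column per symbol, stored as one symbol
universe = ['AAPL', 'MSFT', 'GOOG']
closes = pd.DataFrame({sym: library.read(sym).data['close'] for sym in universe})
library.write('US_EQUITY_CLOSE_PRICES', closes)

# Analysis code can then pull the whole universe's closes in a single read
closes = library.read('US_EQUITY_CLOSE_PRICES').data
```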
I want to share that I'm writing a Scala adapter for Arctic, and will put it on GitHub when it's a bit more mature. I use it as a tick-logger and for simulation, so the API is somewhat different from the Python one: it's designed for streaming asynchronously, tick-by-tick, rather than returning a block of data. Currently it's pretty basic: TickStore-only, with no support for bitmask or reading particular columns, etc., but it works. I'll post here when it's in a presentable state.
It would be great to have the TickStore API available in Java, to be able to collect tick data from InteractiveBrokers, whose API doesn't officially support Python.
Don't know if this helps at all, but I recently published a Scala driver that supports TickStore. It has limitations though: the MongoDB driver it uses is written from scratch and currently only supports a single-server architecture and no authentication.
@lJoublanc I'll hopefully have time over Christmas to open-source the Java implementation we use internally. But great to see a Scala implementation in the open.
cc @richardbounds
I would love for there to be an open-source version that uses the official MongoDB driver! What's the progress, @yschimke?
@thesmartwon Thanks for the prod; I spoke two weeks ago with @jamesblackburn about this work.
It's close to ready for an alpha. It will essentially be an extraction of our internal Java version, trimmed down to the core functionality; to start with we won't include our Kafka-to-TickStore functionality. Rather than blocking on that, I'll get it out in its current shape (in the next week) and we can iterate on it; I can add you as a collaborator. There are a lot of things to improve in it, e.g. auth and the dependencies on an internal SEDA library.
FWIW, if I'm spending any personal time on it, I'll probably focus my own efforts on getting out a second, more modern Kotlin implementation using reactive streams. But that's a secondary target that will steal liberally from the first, and for me it's a lot more interesting.
Awesome! Can I be added as a contributor, @yschimke?
Has there been any progress over the past month, @yschimke?
Unfortunately the work to open source the Java version of this has been down-prioritised and isn't currently in progress.
Legacy issue. A Java version will not be provided in the near future.